
¹Fred Hutchinson Cancer Research Center, Seattle, WA, and ²Rosetta Inpharmatics, LLC, Merck Research Laboratories, Kirkland, WA
Requests for reprints: John D. Potter, Fred Hutchinson Cancer Research Center, 1100 Fairview Avenue N, MP-900, PO Box 19024, Seattle, WA 98109. E-mail: jpotter{at}fhcrc.org
Abstract
Key Words: microarray, gene expression, leukocytes, human, smoking, cotinine
Introduction
Environmental exposures influence a variety of biological processes in quite distinct ways (e.g., enzyme induction, oxidation, and signal transduction). Many of these responses also influence gene expression. Therefore, given a sufficiently large set of biological data to interpret, signatures or patterns of gene expression might emerge that would allow the identification and even quantitation of specific exposures. These complex patterns of gene expression can be measured with DNA microarrays (1). One of the most readily obtainable biological materials from the general population is blood. Although peripheral leukocytes encounter only some environmental exposures and have a limited repertoire of responses, they are nonetheless exposed to many of the same environmental agents as target tissues. Thus, blood is a useful and convenient biological material in which to seek exposure signatures.
We hypothesized that peripheral leukocyte mRNA expression could be used as a sensor to detect environmental exposures in observational studies. To test this hypothesis, we measured the mRNA signatures of individuals exposed and unexposed to tobacco smoke, on the basis that: (a) we could verify smoking exposure both by self-report and by measuring plasma cotinine concentrations; and (b) if tobacco smoking did not yield a detectable and repeatable signature, other, more subtle environmental exposures would be even less likely to do so. We also reasoned that if we could detect a signature for tobacco, other exposures, behaviors, and characteristics could also be explored for their own signatures.
Methods
Sample and Data Collection
Each participant was scheduled for two blood draws, 1 week apart. Blood samples were collected in the morning, between 7 and 10 a.m., after a 10-h fast. Two 10-ml blood samples were collected into heparin-containing tubes for mRNA, and additional samples were collected into EDTA-containing tubes for plasma cotinine and WBC differential count. Data on demographics, diet and exercise, and smoking history were also obtained. Blood samples collected at visit 1 were analyzed for plasma cotinine and nicotine concentrations by gas chromatography with nitrogen-phosphorus detection (National Medical Services, Willow Grove, PA). Cotinine, the primary metabolite of nicotine, has a longer circulating half-life than the parent compound and is widely used as a marker of tobacco use (2). WBC differential counts were determined on a Sysmex NE-8000 hematology analyzer.
RNA Isolation
Blood samples for mRNA were processed within 3 h of being drawn. Following lysis of erythrocytes and removal of cell debris, leukocytes were isolated from whole blood after dextran separation. Total RNA was isolated using TRIzol reagent (Invitrogen, Carlsbad, CA) according to the manufacturer's protocol, with one adjustment: following addition of isopropanol, RNA was precipitated on ice (4°C) for at least 10 min instead of at 15 to 30°C for 10 min. The pellet was resuspended in 100 µl of RNase/DNase-free water.
RNA was purified with RNase-free DNase Sets and RNeasy Kits (Qiagen, Valencia, CA). All spins were performed at 8000 rpm for 1 min (Eppendorf centrifuge, model 5415C, Brinkmann Instruments, Westbury, NY) unless noted otherwise. Total RNA was mixed with 350 µl of RLT buffer containing β-mercaptoethanol (Sigma, St. Louis, MO) and 250 µl of 100% ethanol (Aaper Alcohol and Chemical Co., Shelbyville, KY). This mixture was transferred to an RNeasy Mini column and centrifuged. The flowthrough was reloaded onto the column and centrifuged again. The column was washed with 350 µl of RW1 buffer. DNase I mixture (10 µl of DNase I and 70 µl of RDD buffer per sample) was added directly onto the membrane, and the column was left at room temperature for 15 min. Following DNase I treatment, additional RW1 buffer (350 µl) was added to the column, which was centrifuged and then washed with 500 µl of 80% ethanol. The column with bound RNA was then centrifuged for 2 min at 14,000 rpm to remove any traces of ethanol. Total RNA was eluted into a collection tube with two successive 50-µl aliquots of DEPC-treated water (centrifugation at 14,000 rpm for 1 min). Total RNA concentration was determined from the absorbance at 260 nm (A260) on an Eppendorf BioPhotometer (Brinkmann Instruments).
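Converting an A260 reading into an RNA concentration conventionally uses a factor of about 40 µg/ml per absorbance unit for single-stranded RNA in a 1-cm light path. The protocol above does not state the factor or any dilution used, so the sketch below assumes the conventional factor and a caller-supplied dilution; the function names and the example reading are hypothetical.

```python
def rna_concentration_ug_per_ml(a260: float, dilution_factor: float = 1.0,
                                ug_per_ml_per_a260: float = 40.0) -> float:
    """Estimate RNA concentration (µg/ml) from absorbance at 260 nm.

    Assumes a 1-cm path length and the conventional factor for
    single-stranded RNA (an A260 of 1.0 corresponds to ~40 µg/ml).
    """
    if a260 < 0 or dilution_factor <= 0:
        raise ValueError("A260 must be non-negative and the dilution factor positive")
    return a260 * ug_per_ml_per_a260 * dilution_factor


def total_rna_ug(concentration_ug_per_ml: float, volume_ul: float) -> float:
    """Total RNA yield (µg) contained in an eluate of the given volume (µl)."""
    return concentration_ug_per_ml * volume_ul / 1000.0


# Hypothetical reading: A260 = 0.25 on a 1:10 dilution of the 100-µl eluate (2 x 50 µl)
conc = rna_concentration_ug_per_ml(0.25, dilution_factor=10)
print(conc, total_rna_ug(conc, volume_ul=100))  # 100.0 10.0
```

In this hypothetical case the eluate holds 10 µg of total RNA, enough for the 5-µg input used for cRNA labeling.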
cRNA Labeling and Expression Profiling
cDNA was produced from 5 µg of total RNA by reverse transcription (RT) using Moloney murine leukemia virus (MMLV) reverse transcriptase and then transcribed into cRNA by in vitro transcription (IVT) using T7 RNA polymerase. 5-(3-Aminoallyl)uridine 5'-triphosphate (Sigma) was incorporated into cRNA in the IVT reaction. For cRNA labeling, the allylamine-derivatized cRNA products were reacted with N-hydroxysuccinimide esters of Cy3 or Cy5 dyes (Amersham Pharmacia Biotech, Piscataway, NJ) as described (3). Five micrograms of Cy-labeled cRNA from one leukocyte sample were mixed with the same amount of cRNA, labeled with the opposite dye, from a reference pool consisting of equal amounts of cRNA from each of seven individuals (men and women; smoking status unknown) unrelated to the study participants. Arrays were run on samples collected at visit 1 and on a subset of samples collected from 17 individuals at visit 2. The resulting labeled probes were hybridized to hu25k oligonucleotide microarrays. All hybridizations were done in duplicate with fluor reversal on two microarrays to compensate for potential biases due to the different chemical properties of the Cy3 and Cy5 dyes. The arrays were scanned to measure the expression levels of 21,000 genes as described previously (4).
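Why fluor reversal helps can be seen in a minimal sketch that combines one dye-swap pair by simply averaging the two log10(sample/pool) ratios. This plain average is an assumption for illustration; the error-weighted combination used in the actual processing (4) is not reproduced here. On array A the sample is in Cy5, so log10(Cy5/Cy3) is the sample/pool ratio; on array B the sample is in Cy3, so the ratio is flipped before averaging, and a gene-specific dye bias that inflates Cy5 by the same factor on both arrays cancels.

```python
import math


def sample_vs_pool_log10(cy5_a: float, cy3_a: float, cy5_b: float, cy3_b: float) -> float:
    """Combine a dye-swap pair into one log10(sample/pool) ratio.

    Array A: sample labeled Cy5, pool labeled Cy3.
    Array B: sample labeled Cy3, pool labeled Cy5 (fluor reversal).
    Inputs are assumed to be background-corrected, normalized intensities.
    """
    ratio_a = math.log10(cy5_a / cy3_a)  # sample/pool on array A
    ratio_b = math.log10(cy3_b / cy5_b)  # sample/pool on array B, dyes flipped back
    return (ratio_a + ratio_b) / 2.0


# Hypothetical gene: true sample/pool ratio of 2 and a 1.5-fold Cy5 bias on both arrays
bias = 1.5
combined = sample_vs_pool_log10(cy5_a=2.0 * bias, cy3_a=1.0, cy5_b=1.0 * bias, cy3_b=2.0)
print(round(combined, 6))  # 0.30103, i.e. log10(2): the dye bias cancels
```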
Data Analysis
For the analyses, we used gene expression data from 65 individuals (32 smokers and 33 nonsmokers) as a training set to select and optimize a set of reporter genes. We used the array data from 20 other participants (10 smokers and 10 nonsmokers, equally distributed by sex) as a test set (TEST1) and the follow-up visit 2 samples from 17 of these 20 individuals as a separate test set (TEST2). Thus, TEST1 and TEST2 were derived from the same individuals at two time points, a design chosen to determine whether the signature was stable over time.
This study relied on a cRNA pool from seven unrelated individuals rather than a pool of all of the study participants. The latter approach is often used in array work and affords greater power to discriminate between groups; however, it requires that sufficient RNA be available from every participant to contribute to the pool. In our study, the power to discriminate smokers from nonsmokers could be limited by the gene expression signature of the unrelated pool itself. To reduce this effect, we preprocessed the three sets (training set, TEST1, and TEST2) by subtracting from each gene its average expression value across the individuals in the training set (n = 65). Because the data are log ratios, this preprocessing is similar to centering, or re-ratioing the samples against a virtual pool defined as the geometric mean of all individual samples in the training set. This established the same baseline across the three sets.
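The re-baselining step can be sketched as follows: each gene's mean log ratio is computed over the training set only and then subtracted from every profile in all three sets, so the test sets never influence their own baseline. Subtracting a mean log ratio is the same as dividing each ratio by the geometric mean of the training-set ratios, which is why the result behaves like re-ratioing against a virtual pool. The list-of-lists layout (one row per individual, one column per gene), the use of None for missing values, and the fallback mean of 0.0 for an all-missing gene are assumptions for illustration.

```python
from typing import List, Optional

Profile = List[Optional[float]]  # log10 ratios for one individual, one entry per gene


def training_means(training: List[Profile]) -> List[float]:
    """Per-gene mean log ratio across training individuals, skipping missing values."""
    n_genes = len(training[0])
    means = []
    for g in range(n_genes):
        values = [p[g] for p in training if p[g] is not None]
        means.append(sum(values) / len(values) if values else 0.0)
    return means


def recenter(profiles: List[Profile], means: List[float]) -> List[Profile]:
    """Subtract the training-set mean of each gene from every profile."""
    return [[None if x is None else x - m for x, m in zip(p, means)] for p in profiles]


# Hypothetical two-gene example; the same means would be applied to TEST1 and TEST2
training = [[0.2, -0.1], [0.4, None], [0.0, 0.3]]
mu = training_means(training)
print(recenter([[0.5, 0.1]], mu))  # approximately [[0.3, 0.0]]
```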
Using data from the training set, we first selected a set of signature genes that satisfied the following criteria: |xdev| > 2.5 (P value ≈ 0.01) and abs(log10 ratio) > 0.3 in more than three individuals. The resulting 861 genes were pared down further to 857 by requiring more than 95% valid (nonmissing) entries. These genes were ranked in descending order of the absolute value of their correlation coefficient with plasma cotinine concentration. We began optimization of the reporter set with the top 5 reporters and then added one reporter at a time from the top of the list. At each step, one profile was left out of the training set. The remaining 64 profiles were used to compute smoker and nonsmoker expression templates by averaging gene expression values across the smokers and nonsmokers, respectively, in the remaining set. The left-out profile was then classified as smoker or nonsmoker according to whether it correlated more strongly with the smoker or the nonsmoker template. This procedure was carried out for all 65 profiles in the training set, and the total misclassification was computed as the sum of Type 1 and Type 2 errors. The optimization selected the set of reporters for which the total misclassification was minimal.
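The leave-one-out optimization described above can be sketched as below. Assumptions not stated in the text: Pearson correlation is used both to rank genes against cotinine and to compare a profile with the templates, profiles are complete (no missing values), a tie in the template comparison is called "nonsmoker", and among equally good reporter-set sizes the smallest is kept. All function names are hypothetical.

```python
import math
from typing import List, Optional, Sequence, Tuple


def pearson(a: Sequence[float], b: Sequence[float]) -> float:
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb) if va > 0 and vb > 0 else 0.0


def template(profiles: List[List[float]], genes: List[int]) -> List[float]:
    """Mean expression of the selected genes across a group of profiles."""
    return [sum(p[g] for p in profiles) / len(profiles) for g in genes]


def loocv_errors(X: List[List[float]], is_smoker: List[bool], genes: List[int]) -> int:
    """Total misclassifications (Type 1 + Type 2) under leave-one-out."""
    errors = 0
    for i in range(len(X)):
        rest = [X[j] for j in range(len(X)) if j != i]
        rest_labels = [is_smoker[j] for j in range(len(X)) if j != i]
        sm = template([p for p, s in zip(rest, rest_labels) if s], genes)
        nsm = template([p for p, s in zip(rest, rest_labels) if not s], genes)
        x = [X[i][g] for g in genes]
        # Call "smoker" only if the left-out profile correlates better with the smoker template
        predicted_smoker = pearson(x, sm) > pearson(x, nsm)
        errors += int(predicted_smoker != is_smoker[i])
    return errors


def select_reporters(X: List[List[float]], is_smoker: List[bool],
                     cotinine: List[float], start: int = 5) -> Tuple[List[int], Optional[int]]:
    """Rank genes by |correlation with cotinine|, grow the reporter set from the
    top of the list one gene at a time, and keep the size with the fewest
    leave-one-out errors."""
    n_genes = len(X[0])
    ranked = sorted(range(n_genes),
                    key=lambda g: abs(pearson([p[g] for p in X], cotinine)),
                    reverse=True)
    best_k, best_err = start, None
    for k in range(start, n_genes + 1):
        err = loocv_errors(X, is_smoker, ranked[:k])
        if best_err is None or err < best_err:
            best_k, best_err = k, err
    return ranked[:best_k], best_err
```

Applied to the 857 candidate genes and 65 training profiles, `select_reporters` would return the reporter indices and their leave-one-out error count; the same templates are then reused, unchanged, to classify TEST1 and TEST2.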
The reporters were then used to predict smoking status in the entire training set and in the test sets (TEST1 and TEST2), using the smoker and nonsmoker expression templates computed from the training set as described above. We determined the sensitivity (i.e., the proportion of actual smokers that the test correctly identifies) and specificity (i.e., the proportion of actual nonsmokers that the test correctly identifies) of the gene profile, using cotinine as the gold standard to define smoking status. Because smoking is often associated with other behaviors (e.g., higher alcohol intake, lower exercise) that can themselves influence gene expression, we attempted to determine whether these other exposures affected expression of the identified cotinine-associated genes. Given the small sample size, we used two approaches. First, we stratified individuals by sex, exercise, and aspirin use and determined whether we could discriminate between smokers and nonsmokers within the subgroups. Second, we determined the power of the reporter genes to predict differences in sex, exercise frequency, and aspirin use. (Additional data analysis details are provided in Appendix A.)
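Sensitivity, specificity, and the overall (Type 1 + Type 2) error rate used in the Results can all be computed from paired calls, as in this sketch. It treats "smoker" as the positive class, so Type 1 errors are false positives and Type 2 errors are false negatives; the function name and the toy data are hypothetical.

```python
from typing import Dict, Sequence


def diagnostic_summary(predicted: Sequence[bool], truth: Sequence[bool]) -> Dict[str, float]:
    """Sensitivity, specificity, and overall error rate of binary calls.

    `truth` is the reference status (smoker as defined by plasma cotinine) and
    `predicted` the expression-based call; both classes must be present.
    """
    pairs = list(zip(predicted, truth))
    tp = sum(1 for p, t in pairs if p and t)
    tn = sum(1 for p, t in pairs if not p and not t)
    fp = sum(1 for p, t in pairs if p and not t)  # Type 1 errors (false positives)
    fn = sum(1 for p, t in pairs if not p and t)  # Type 2 errors (false negatives)
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "overall_error": (fp + fn) / len(pairs),
    }


# Hypothetical test set of 10 smokers and 10 nonsmokers in which one smoker is missed
truth = [True] * 10 + [False] * 10
predicted = [True] * 9 + [False] + [False] * 10
print(diagnostic_summary(predicted, truth))
# {'sensitivity': 0.9, 'specificity': 1.0, 'overall_error': 0.05}
```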
Results
20%) within strata based on sex, exercise, aspirin use, and alcohol intake. Second, we tested the power of the 36 reporter genes to predict exposures or behaviors plausibly associated (either directly or inversely) with smoking, such as exercise, aspirin use, vitamin-supplement use, alcohol intake, and vegetable intake. Whereas the 36 reporter genes differentiated smokers from nonsmokers with overall (Type 1 + Type 2) error rates of 5% and 6% in TEST1 and TEST2, respectively, using the same genes to predict each of the other exposures resulted in error rates of >30%; the exception was exercise, for which overall error rates of 30% and 11% were observed for TEST1 and TEST2, respectively. Although these attempts to understand the relationships between smoking-associated exposures and the identified smoking-related genes are rudimentary, these results show that plasma cotinine is a better predictor of the expression of the 36 reporter genes than any of the other plausible variables we examined.
Discussion
In this study, we examined gene expression in total leukocytes rather than in specific cellular subsets. Total peripheral leukocyte counts have been shown to differ by smoking status (8, 9), whereas differential counts have been reported either to differ by the number of cigarettes smoked (8) or to remain unchanged (10). More recently, interindividual variation in expression patterns has also been shown to be influenced by peripheral leukocyte composition (11). In principle, therefore, observed differences in gene expression between smokers and nonsmokers could reflect differences in the proportions of each cell type. In our sample, there were no significant differences in the distributions of the major cell types by smoking status (Table 1); thus, this factor is unlikely to be a major explanation for the differences in gene expression that we observed.
Many of the reporter genes that were associated with plasma cotinine concentrations in our study have not been described in relation to cigarette smoking. However, several reporter genes, such as IL-1β and CYP1B1, are associated with the pathophysiology of smoking-induced injury. Exposure to cigarette smoke imposes an oxidant burden not only at the initial site of contact (i.e., respiratory epithelium) but also on peripheral leukocytes, in which oxidative damage and polycyclic aromatic hydrocarbon adducts are readily detectable in smokers (12, 13). Few studies have examined the effects of cigarette smoking on gene expression in circulating leukocytes; however, smoking has well-established immune and inflammatory effects in bronchial tissue and bronchoalveolar macrophages. Differences in cytokine profiles in bronchoalveolar lavage fluid, including IL-1β, have been reported between smokers and nonsmokers (14, 15). Cigarette smoke up-regulates expression of proinflammatory cytokines such as IL-1β, which in turn induce expression of a wide variety of genes. Similarly, expression of the biotransformation enzyme CYP1B1 in bronchoalveolar macrophages is increased with active cigarette smoking and polycyclic aromatic hydrocarbon exposure (16). Although one study using quantitative reverse transcription-PCR was unable to detect an effect of smoking on CYP1B1 mRNA levels in blood mononuclear cells (17), our results suggest that CYP1B1 expression in total leukocytes is responsive to cigarette smoke exposure, at least at the higher numbers of cigarettes smoked by participants in our study.
The physiological and biochemical effects associated with cigarette smoking and the constituents of tobacco smoke support the relevance of several other reporter genes. For example, 6-phosphofructo-2-kinase/fructose-2,6-bisphosphatase-3 (PFKFB3), which was positively associated with plasma cotinine concentrations, is a key enzyme regulating glycolysis in mammalian cells. It is induced by hypoxia and IL-1β through stabilization of hypoxia-inducible factor-1α (18). Iduronate 2-sulfatase, an enzyme responsible for the degradation of dermatan sulfate and heparan sulfate (constituents of mucopolysaccharides), was inversely associated with plasma cotinine. Reduced expression of iduronate 2-sulfatase could extend the half-life of mucins in smokers.
From a biological standpoint, many of the genes associated with smoking in this study, especially those related to immune function and inflammation, can readily be attributed to cigarette smoke exposure. The absence of prior data on the association between expression of particular genes and cigarette smoking may reflect a lack of information on how some of these pathways respond to smoke exposure; it may also reflect the complexity of studying the effects of tobacco smoke on gene expression in intact humans. First, the pathways affected by tobacco smoke are numerous and interconnected, and some genes may be influenced by the expression of other genes. Second, tobacco smoke is a complex mixture of compounds, each of which is likely to have multiple targets (2). Third, tobacco use clusters with other human behaviors; if not acknowledged and accounted for, this clustering leads to confounding and biased estimates of association. Thus, the expression profile associated with smoking may also be associated with a pattern of other behaviors (e.g., higher alcohol intake, lower exercise) that may themselves influence gene expression. Nonetheless, these findings show that it is reasonable to explore whether these other exposures and behaviors also have readable signatures. Proper study of the relationships between exposures and gene expression will include both observational and experimental studies in humans. Ideally, such studies will have sufficient power to detect the whole spectrum of differences and sufficient data on other exposures to allow proper control of confounding (5). Finally, our findings raise interesting, but as yet unresolved, questions about the use of readily obtained surrogate tissues in studying the relationship between exposures and biology relevant to the progression of human cancer.
Appendix A
For each gene, let x_i and σ_i denote its expression log-ratio for person i = 1, ..., N (N = 65) and its error, respectively. For the expression of each gene, one can compute the mean µ and its error across the 65 individuals and, from these, the expression and the error of the gene for each person i with respect to its average expression µ. The P value for the significance of expression is computed from these deviations and their errors.
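A minimal sketch of one common error-weighted formulation of these quantities, offered as an assumption rather than the authors' exact definitions: µ is the inverse-variance-weighted mean of the x_i, its error is σ_µ = (Σ 1/σ_i²)^(-1/2), xdev_i = (x_i − µ)/√(σ_i² − σ_µ²), and the two-sided normal P value is erfc(|xdev|/√2). The minus sign in the denominator is the exact variance of x_i − µ for independent errors, because µ itself contains x_i. Under this formulation the Methods threshold |xdev| > 2.5 corresponds to a two-sided P of about 0.012. The function names are hypothetical.

```python
import math
from typing import List, Tuple


def weighted_mean(x: List[float], sigma: List[float]) -> Tuple[float, float]:
    """Inverse-variance-weighted mean of log ratios and its standard error."""
    w = [1.0 / s ** 2 for s in sigma]
    mu = sum(wi * xi for wi, xi in zip(w, x)) / sum(w)
    return mu, math.sqrt(1.0 / sum(w))


def xdev(x: List[float], sigma: List[float]) -> List[float]:
    """Deviation of each person's log ratio from the weighted mean, in error units.

    Var(x_i - mu) = sigma_i**2 - sigma_mu**2 when errors are independent,
    since mu includes x_i itself (requires at least two individuals).
    """
    if len(x) < 2:
        raise ValueError("need at least two individuals")
    mu, sigma_mu = weighted_mean(x, sigma)
    return [(xi - mu) / math.sqrt(si ** 2 - sigma_mu ** 2) for xi, si in zip(x, sigma)]


def two_sided_p(z: float) -> float:
    """Two-sided normal P value for a standardized deviation."""
    return math.erfc(abs(z) / math.sqrt(2.0))


print(round(two_sided_p(2.5), 4))  # 0.0124
```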
0.34. The originally selected 861 signature genes were correlated with the randomly permuted cotinine profile in each of 10,000 Monte Carlo runs. For each run, we counted how many of the 861 genes had correlation coefficients greater than or equal to 0.34. We then counted the number of runs in which this number was greater than or equal to 36, the number of genes that had positive predictive value. One hundred ten of the 10,000 Monte Carlo runs resulted in ≥36 genes with a correlation coefficient ≥0.34, yielding a P value of 0.01 (figure).
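The Monte Carlo permutation test described above can be sketched as follows: the cotinine values are shuffled, every gene is re-correlated with the shuffled profile, and the empirical P value is the fraction of runs in which at least as many genes as observed reach the correlation threshold. The threshold 0.34, the count 36, and the 10,000 runs come from the text; the use of signed Pearson correlations, the fixed random seed, and the function names are assumptions.

```python
import math
import random
from typing import List, Sequence


def pearson(a: Sequence[float], b: Sequence[float]) -> float:
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb) if va > 0 and vb > 0 else 0.0


def permutation_p_value(genes: List[List[float]], cotinine: List[float],
                        r_threshold: float = 0.34, observed_count: int = 36,
                        n_runs: int = 10_000, seed: int = 0) -> float:
    """Fraction of runs in which at least `observed_count` genes reach a
    correlation >= `r_threshold` with randomly shuffled cotinine values.

    `genes` holds one expression vector per gene, ordered by individual.
    """
    rng = random.Random(seed)
    shuffled = list(cotinine)
    hits = 0
    for _ in range(n_runs):
        rng.shuffle(shuffled)  # break any real link between cotinine and expression
        count = sum(1 for g in genes if pearson(g, shuffled) >= r_threshold)
        hits += int(count >= observed_count)
    return hits / n_runs
```

With the counts reported above, 110 qualifying runs out of 10,000 give 110/10,000 = 0.011, reported as 0.01. For 861 genes and 10,000 runs, a vectorized implementation would be much faster than this pure-Python loop.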
SM and NSM to smoker and nonsmoker templates, respectively. We can define the classification metric as
Acknowledgments
Footnotes
The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked advertisement in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.
Note: J.W. Lampe, S.B. Stepaniants, and M. Mao contributed equally to this work.
Received 7/2/03; revised 10/16/03; accepted 11/12/03.
References