
1 Center for Human Genomics and 2 Department of Public Health Sciences, Wake Forest University School of Medicine, Winston-Salem, North Carolina; 3 Translational Genomics Research Institute, Phoenix, Arizona; 4 Computational Genetics Laboratory, Dartmouth Medical School, Lebanon, New Hampshire; 5 Department of Urology, Johns Hopkins Medical Institutions, Baltimore, Maryland; 6 Oncology, Department of Radiation Sciences, University of Umeå, Umeå; and 7 Department of Medical Epidemiology and Biostatistics, Karolinska Institutet, Stockholm, Sweden
Requests for reprints: Jianfeng Xu, Center for Human Genomics, Wake Forest University School of Medicine, Medical Center Boulevard, Winston-Salem, NC 27157. Phone: 336-713-7500; Fax: 336-713-7566. E-mail: jxu{at}wfubmc.edu
| Abstract |
|---|
|
|
|---|
| Introduction |
|---|
|
|
|---|
In an attempt to fill this gap and to explore the joint effect of multiple sequence variants on prostate cancer risk, we designed a study to systematically evaluate a large number of sequence variants among multiple genes in the inflammation pathway in a large prostate cancer case-control study population. In addition to assessing a main effect on prostate cancer risk for each sequence variant, we explored the joint effects of multiple sequence variants using a data mining method, multifactor dimensionality reduction (MDR). We systematically evaluated the ability of this approach to classify and predict which individuals were affected with prostate cancer based on any combination of two, three, or four variants from all the genotyped variants. We found that the interaction of four inflammation pathway genes significantly predicts prostate cancer risk.
| Materials and Methods |
|---|
|
|
|---|
|
Genotyping Methods
Genotyping was done using the MassARRAY system (Sequenom, Inc., San Diego, CA). For the MassARRAY assay, PCR and extension primers for sequence variants were designed using SpectroDesigner software (Sequenom). The primer information is available at the corresponding author's web site (http://www.wfubmc.edu/genomics). PCR and extension reactions were done according to the manufacturer's instructions, and extension product sizes were determined by mass spectrometry.
Statistical Analysis
A Hardy-Weinberg equilibrium test was done for each SNP using the Fisher probability test statistic (4), as implemented in the software package Genetic Data Analysis. Empirical P values for the Hardy-Weinberg equilibrium test were based on 10,000 permutations.
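The idea behind a probability-based Hardy-Weinberg test can be illustrated with a minimal sketch for one biallelic SNP. This is not the Genetic Data Analysis implementation: it sums the exact probabilities over every possible heterozygote count instead of drawing 10,000 permutations, and the function names (`hwe_log_prob`, `hwe_exact_p`) are hypothetical.

```python
from math import exp, lgamma, log


def _log_fact(k):
    """Natural log of k!, computed with the log-gamma function."""
    return lgamma(k + 1)


def hwe_log_prob(n_het, n_a, n_b):
    """Log probability of n_het heterozygotes given allele counts n_a and n_b,
    under Hardy-Weinberg equilibrium."""
    n = (n_a + n_b) // 2          # number of subjects
    n_aa = (n_a - n_het) // 2     # homozygotes for allele A
    n_bb = (n_b - n_het) // 2     # homozygotes for allele B
    return (_log_fact(n) + _log_fact(n_a) + _log_fact(n_b) + n_het * log(2)
            - _log_fact(2 * n) - _log_fact(n_aa) - _log_fact(n_het)
            - _log_fact(n_bb))


def hwe_exact_p(n_aa, n_ab, n_bb):
    """Exact P value: total probability of all genotype tables that are
    no more probable than the observed one."""
    n_a = 2 * n_aa + n_ab
    n_b = 2 * n_bb + n_ab
    observed = hwe_log_prob(n_ab, n_a, n_b)
    p = 0.0
    # The heterozygote count always has the same parity as the allele count.
    for het in range(n_a % 2, min(n_a, n_b) + 1, 2):
        lp = hwe_log_prob(het, n_a, n_b)
        if lp <= observed + 1e-9:   # small tolerance for floating-point ties
            p += exp(lp)
    return min(p, 1.0)
```

A permutation version would replace the full enumeration with repeated random pairing of the observed alleles, counting how often the permuted table is no more probable than the observed one.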
The MDR method was first described by Moore and colleagues (5-8). Briefly, this method is designed to improve the identification of factors associated with disease risk by reducing the dimensionality of multifactor information. The method involves several steps. In the first step, the data were divided into a training set (consisting of 9/10 of the data) and an independent testing set (consisting of the remaining 1/10 of the data) as part of cross-validation. In the second step, a set of n factors (in this case, SNPs) was selected, where n = 1, 2, 3, and 4. In steps 3 and 4, the n SNPs and their possible multifactor classes were represented in n-dimensional space; e.g., for two SNPs with three genotypes each, there are nine possible two-locus genotype combinations. The ratio of the number of cases to the number of controls was calculated within each multifactor class. Each multifactor class in n-dimensional space was then labeled as "high risk" if the case to control ratio met or exceeded a threshold (for example, 1.0), or as "low risk" if that threshold was not exceeded, thus reducing the n-dimensional space to one dimension with two levels (low risk and high risk). In the fifth step, the model that gave the lowest misclassification error (error in classifying cases and controls based on high risk or low risk in the training set) was selected for each set of n SNPs. In step six, a prediction error (error in classifying disease status in the testing set) was estimated for each model selected in step 5, as a cross-validation procedure. Steps 1 to 6 were repeated 10 times using a random seed number. We did this entire 10-fold cross-validation procedure 10 times, using different random seed numbers, to reduce the chance of observing spurious results due to chance divisions of the data.
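The steps above can be sketched in a few lines of Python. This is a minimal illustration, not the MDR software used in the study: genotypes are assumed to be coded 0, 1, or 2 with no missing values, status is 1 for cases and 0 for controls, and all function names are hypothetical.

```python
import random
from collections import Counter
from itertools import combinations


def fit_cells(genotypes, status, snps, threshold=1.0):
    """Steps 3-4: label each multilocus genotype cell high risk when the
    case:control ratio meets or exceeds the threshold."""
    cases, controls = Counter(), Counter()
    for row, y in zip(genotypes, status):
        cell = tuple(row[s] for s in snps)
        if y == 1:
            cases[cell] += 1
        else:
            controls[cell] += 1
    # cases >= threshold * controls avoids dividing by zero controls
    return {cell for cell in set(cases) | set(controls)
            if cases[cell] >= threshold * controls[cell]}


def error_rate(genotypes, status, snps, high):
    """Fraction of subjects whose high/low-risk label disagrees with status."""
    wrong = 0
    for row, y in zip(genotypes, status):
        predicted = 1 if tuple(row[s] for s in snps) in high else 0
        wrong += predicted != y
    return wrong / len(status)


def best_model(genotypes, status, n, threshold=1.0):
    """Step 5: the n-SNP combination with the lowest misclassification error."""
    best = None
    for snps in combinations(range(len(genotypes[0])), n):
        high = fit_cells(genotypes, status, snps, threshold)
        err = error_rate(genotypes, status, snps, high)
        if best is None or err < best[0]:
            best = (err, snps, high)
    return best


def cross_validate(genotypes, status, n, folds=10, seed=0):
    """Steps 1-6: select a model on 9/10 of the data, then estimate its
    prediction error on the held-out 1/10, once per fold."""
    idx = list(range(len(status)))
    random.Random(seed).shuffle(idx)
    results = []
    for k in range(folds):
        test = idx[k::folds]
        held_out = set(test)
        train = [i for i in idx if i not in held_out]
        _, snps, high = best_model([genotypes[i] for i in train],
                                   [status[i] for i in train], n)
        pred_err = error_rate([genotypes[i] for i in test],
                              [status[i] for i in test], snps, high)
        results.append((snps, pred_err))
    return results
```

Calling `cross_validate` with several different `seed` values corresponds to repeating the whole 10-fold procedure with different random divisions of the data.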
In addition to the misclassification error and prediction error, we also estimated a cross-validation consistency, defined as a percentage of the same combination of SNPs selected as the best model among different cross-validation data sets, for each set of n SNPs.
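As a small illustration of this definition, the sketch below computes a cross-validation consistency from a list of the SNP combinations selected in each cross-validation data set. The function name `cv_consistency` and the example SNP labels are hypothetical.

```python
from collections import Counter


def cv_consistency(selected_models):
    """Return the most frequently selected SNP combination and the percentage
    of cross-validation data sets in which it was chosen as the best model."""
    # Sort each combination so that (a, b) and (b, a) count as the same model.
    counts = Counter(tuple(sorted(model)) for model in selected_models)
    model, hits = counts.most_common(1)[0]
    return model, 100.0 * hits / len(selected_models)


# Hypothetical example: one pair wins in 6 of 10 data sets.
picks = [("snpA", "snpB")] * 6 + [("snpA", "snpC")] * 3 + [("snpB", "snpC")]
print(cv_consistency(picks))
```

With ten repeats of a 10-fold procedure there would be 100 selected models, so the percentage equals the raw count of data sets that agree on the best combination.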
We determined the statistical significance of the observed prediction error of the best model for each set of n SNPs by empirical simulations as described below. We first generated a data set with no association between prostate cancer and SNPs by randomly permuting case and control status among the CAPS subjects. We then did the above 10-fold cross-validation MDR analysis for each generated data. We repeated these two steps 1,000 times for each set of n SNPs. Empirical P values were based on the number of prediction errors estimated among the 1,000 simulations that were as small as or smaller than the observed prediction errors. The simulations were done using a 1,024-CPU IBM supercomputer cluster.
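The permutation procedure can be sketched as follows. The function `empirical_p` and its `prediction_error` argument are hypothetical names: `prediction_error` stands for any callable that reruns the full cross-validated MDR analysis on a permuted status vector and returns its prediction error.

```python
import random


def empirical_p(observed_error, status, prediction_error, n_perm=1000, seed=0):
    """Fraction of permuted data sets whose prediction error is as small as
    or smaller than the observed prediction error."""
    rng = random.Random(seed)
    permuted = list(status)
    hits = 0
    for _ in range(n_perm):
        # Shuffling case/control labels removes any genotype-disease association.
        rng.shuffle(permuted)
        if prediction_error(permuted) <= observed_error:
            hits += 1
    return hits / n_perm
```

Because each permuted data set requires a complete model search and cross-validation, the cost grows as the number of permutations times the cost of one MDR run, which is why the simulations are a natural fit for a computing cluster.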
To decrease the effect of missing data on the results, we removed SNPs with ≥5% missing data. We also removed the subjects with missing data on 10 or more SNPs (28 cases and 10 controls). Furthermore, to decrease the effect of strong linkage disequilibrium (LD) between SNPs in the same gene on the MDR analysis, when SNPs were in strong pair-wise LD, defined as D' > 0.8, one of the pair was randomly dropped. Inclusion of highly correlated SNPs may lead to unstable results because MDR analyses report only the best predictor (SNP), and highly correlated SNPs may compete to be the best predictor. However, removing highly correlated SNPs may decrease the chance of detecting a possible haplotype effect, or a cis effect of two or more functional SNPs in a single gene. Among the 104 SNPs of 20 genes genotyped in this CAPS population, 57 SNPs were included in the MDR analysis. Finally, we are aware that an unbalanced number of cases and controls may affect the results of MDR analyses. Therefore, to decrease this effect, we randomly selected 585 men from the control pool of 770 subjects, with replacement, to obtain a balanced number of cases and controls for the MDR analysis. This approach, although fully utilizing the genotype information of cases, may introduce extra correlation among the controls. However, the use of a cross-validation procedure to estimate prediction error and the use of a permutation procedure to determine significance levels in our analyses may relieve this concern to some degree.
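The data-cleaning and balancing steps can be sketched as below. Several details are assumptions rather than statements from the text: missing genotypes are coded as `None`, both missing-data filters are computed on the unfiltered matrix, the SNP cutoff is applied as "at or above 5%", and the resampled controls are appended to the original control pool. All function names are hypothetical.

```python
import random


def snp_missing_rates(genotypes):
    """Fraction of subjects with a missing genotype (None) at each SNP."""
    n = len(genotypes)
    return [sum(row[j] is None for row in genotypes) / n
            for j in range(len(genotypes[0]))]


def filter_data(genotypes, snp_cutoff=0.05, subject_cutoff=10):
    """Return indices of subjects and SNPs that pass the missing-data filters.
    SNPs at or above snp_cutoff and subjects with subject_cutoff or more
    missing SNPs are dropped (the direction of the SNP cutoff is an assumption)."""
    rates = snp_missing_rates(genotypes)
    keep_snps = [j for j, rate in enumerate(rates) if rate < snp_cutoff]
    keep_subjects = [i for i, row in enumerate(genotypes)
                     if sum(g is None for g in row) < subject_cutoff]
    return keep_subjects, keep_snps


def augment_controls(control_ids, n_extra, seed=0):
    """Draw n_extra controls with replacement and append them to the pool
    (appending to the original pool is an assumption)."""
    rng = random.Random(seed)
    extra = [rng.choice(control_ids) for _ in range(n_extra)]
    return list(control_ids) + extra


# With the numbers in the text: 770 controls plus 585 resampled draws.
balanced = augment_controls(list(range(770)), 585)
print(len(balanced))
```

Sampling with replacement means some controls appear more than once, which is the source of the extra correlation among controls noted above.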
| Results and Discussion |
|---|
|
|
|---|
|
χ² test for allele frequency difference between cases and controls. The allele frequency for the minor allele "C" of this SNP was significantly higher in cases (0.24) than in controls (0.20), P = 0.002, and was the most significant among these 57 SNPs. When SNPs were considered two at a time, the SNPs from MIC1 (rs1058587) and TLR5 (IIPGA-5187) had the highest cross-validation consistency (51%) and the lowest classification error (44.16%) and prediction error (46.21%) among all the possible combinations of two SNPs. As presented in Table 3, subjects with five combinations of genotypes had a high risk of prostate cancer. These five risk genotypes do not follow simple dominant, recessive, or additive models for any alleles of the two SNPs. The prediction error was not statistically significant, with an empirical P = 0.12 based on 1,000 permutations. Evidently, the ability to predict prostate cancer status using this two-SNP model was improved over the one-SNP model described above.
|
When SNPs were considered four at a time, the SNPs from IL-10 (rs1800896), IL-1RN (rs878972), TIRAP (14115), and TLR5 (IIPGA-5187) had the highest cross-validation consistency (56%) and the lowest prediction error (43.28%) among all the possible combinations of four SNPs. The prediction error was statistically significant, with an empirical P = 0.019 based on 1,000 permutations. Although this prediction error is far from a perfect 0%, it is an important improvement from the a priori 50% chance in predicting prostate cancer status. Forty-three combinations of genotypes of these four SNPs had a high risk for prostate cancer (data not shown). These 43 combinations again did not follow simple dominant, recessive, or additive models for any alleles of the four SNPs. When these four SNPs were examined one at a time using a χ² test for allele frequency difference, only the SNP in TIRAP (14115) had a significantly different allele frequency between cases and controls (P = 0.04), whereas no significant differences in the allele frequencies between cases and controls were observed for the SNPs in IL-10 (P = 0.28), IL-1RN (P = 0.35), and TLR5 (P = 0.28).
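For reference, a single-SNP test of allele frequency difference can be sketched as a 2 x 2 chi-square test on allele counts with 1 degree of freedom. The function name `allele_chi2` and the counts in the example are hypothetical and are not taken from the study data.

```python
from math import erfc, sqrt


def allele_chi2(case_minor, case_total, control_minor, control_total):
    """Pearson chi-square test (1 df, no continuity correction) comparing
    minor-allele counts between cases and controls.
    Totals are allele counts, i.e. twice the number of subjects."""
    a, b = case_minor, case_total - case_minor
    c, d = control_minor, control_total - control_minor
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # Upper-tail probability of a chi-square variable with 1 degree of freedom.
    p_value = erfc(sqrt(chi2 / 2))
    return chi2, p_value


# Hypothetical allele counts: 30 of 100 case alleles vs 20 of 100 control alleles.
print(allele_chi2(30, 100, 20, 100))
```

A test like this looks at one SNP in isolation, so it cannot detect a variant whose effect appears only in combination with other variants.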
The hypothesis that multiple genes are involved in the predisposition to prostate cancer is well supported by our understanding of the biology of prostate cancer development and by observational data from epidemiologic and genetic epidemiologic studies. With our ability to systematically genotype haplotype-tagging SNPs in 20 genes among thousands of subjects, and the availability of the MDR method and the computing power required to model high-order interactions, our study represents the first major attempt to explore the effects of gene-gene interaction on prostate cancer risk. The large, homogeneous, and epidemiologically sound study population increases the likelihood that our findings represent a true interaction effect between these four genes on prostate cancer risk.
Several bodies of evidence suggest that a gene-gene interaction plays a role in susceptibility to common human diseases (9). First, the idea of gene-gene interactions has been around for nearly 100 years. The observed deviations from Mendelian ratios suggested interactions between genes. Second, the ubiquity of biomolecular interactions in gene regulation and biochemical and metabolic systems suggests that relationships between DNA sequence variants and clinical end points are likely to involve gene-gene interactions. Third, positive results from studies of single polymorphisms typically do not replicate across independent samples. This is true for both linkage and association studies. Fourth, gene-gene interactions are commonly found when properly investigated. For example, Nelson and colleagues (10) simultaneously considered multiple polymorphic loci to identify combinations of genotypes that are most strongly associated with variation in triglycerides using a combinatorial partitioning method. They identified nonadditive epistatic interactions between multiple loci in the absence of independent main effects. If gene-gene interactions play roles in the risk for common diseases, it suggests that we need a research strategy for identifying common disease susceptibility genes that embraces, rather than ignores, the complexity of the genotype to phenotype relationship (11).
Moore and colleagues introduced the MDR method as a way to reduce the dimensionality of multilocus information, in order to improve the identification of polymorphism combinations associated with disease risk. The MDR method is model-free (i.e., it assumes no particular inheritance model), and is directly applicable to case-control and discordant-sib-pair studies. Using simulated case-control data, they showed that MDR has reasonable power to identify interactions among two or more loci in relatively small samples. When this was applied to a sporadic breast cancer case-control data set, in the absence of any statistically significant independent main effects, MDR identified a statistically significant high-order interaction among four polymorphisms from three different estrogen-metabolism genes, COMT, CYP1B1, and CYP1A1 (5). Similar results have been observed for other common diseases such as atrial fibrillation (12), type II diabetes (13), and essential hypertension (14). The MDR method is an example of the type of analytic retooling that is needed for common disease research (11).
Exploring the effect of gene-gene interaction on prostate cancer risk among genes in the inflammation pathway is relevant. Chronic or recurrent inflammation has been implicated in the initiation and development of multiple human cancers, including those affecting the stomach, liver, colon, and urinary bladder (15, 16), and a role for chronic inflammation in the etiology of prostate cancer has been proposed (17-20). The fact that two of the three prostate cancer susceptibility genes (MSR1 and RNASEL) identified through positional cloning approaches are involved in innate immunity and inflammation has suggested a further link between inflammation and prostate cancer (21, 22). Sequence variants of genes in the inflammation pathway may affect the host's ability to regulate inflammation responses and may ultimately modify prostate cancer risk. If a sequence variant itself is sufficient to confer an increased risk of prostate cancer, it can be detected by comparing the frequency of the variant in cases and controls, assuming that there is a sufficient number of subjects. This may be one of the explanations for our observations of prostate cancer association with sequence variants in the TLR4 gene (2), the TLR6-TLR1-TLR10 gene cluster (23), and MIC1 (24) in the CAPS population. On the other hand, if a sequence variant confers an increased risk of prostate cancer only in the presence of other risk variants, it can only be detected when these variants are studied simultaneously by modeling gene-gene interactions. The four-gene interaction identified from this study is consistent with this scenario. Among four implicated SNPs of four genes, only the SNP in TIRAP (14115) had a significantly different allele frequency between cases and controls, whereas no significant differences in the allele frequencies between cases and controls were observed for the other three SNPs.
An interaction between TLR5, IL-1RN, TIRAP, and IL-10 is biologically plausible. TLR5 and IL-1 receptors recognize and bind bacteria, viruses, and other ligands. IL-1RN is a protein that binds to IL-1 receptors and inhibits the binding of IL-1α and IL-1β. The engagement of ligands on these receptors initiates a series of downstream signaling cascades, including adaptor proteins such as TIRAP. The union of adaptor molecules with receptors leads to the activation of IL-1R-associated kinase (IRAK), and results in the production of various pro- or anti-inflammatory cytokines such as IL-10. Therefore, sequence variants in these genes may interact, in a complex fashion, to regulate physiologic and pathophysiologic immune and inflammatory responses and modify prostate cancer risk.
An examination of the 43 SNP combinations of the four genes that increased prostate cancer risk revealed no simple pattern of dominant, recessive, or additive effects of any alleles. The complexity of the interactions between these genes makes it difficult to detect these interactions through modeling interaction terms in conventional logistic regression analyses, for several reasons. First, any SNPs that do not impart a main effect are likely to be missed in the logistic regression. For example, the SNPs in IL-10, IL-1RN, and TLR5 would not typically be included in most logistic regression analyses because they did not show a main effect. Second, with four SNPs, there will be many contingency table cells that have few or no data points. This will lead to variable estimates that have very large SEs, resulting in an increased type I error (25). Third, the lack of a simple pattern of dominant, recessive, or additive action of alleles makes it nearly impossible to model the interaction terms. With these caveats, we retrospectively modeled the main and interaction effects using a logistic regression for these four SNPs. Four main effects (additive model) and six pair-wise interactions between the four SNPs were modeled. An interaction between IL-10 (rs1800896) and TIRAP (14115) was statistically significant (P = 0.016). No main effect or other interaction was statistically significant. The advantage of the MDR approach is that we can effectively classify individuals into high- or low-risk groups based on the genotype at each SNP without knowing the mechanisms of the interaction. The power and advantages of the MDR approach in identifying risk genes and in predicting prostate cancer risk will be more prominent as the number of genes increases.
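The counts behind this comparison can be checked with a short sketch. The helper names are hypothetical; the inputs (four SNPs, three genotypes per SNP, 57 SNPs in the MDR analysis, models of up to four SNPs) come from the text.

```python
from itertools import combinations
from math import comb


def logistic_terms(snps):
    """Main effects and pair-wise interaction terms for a set of SNPs."""
    return list(snps), list(combinations(snps, 2))


def genotype_cells(n_snps, genotypes_per_snp=3):
    """Number of multilocus genotype cells in the contingency table."""
    return genotypes_per_snp ** n_snps


def search_space(n_snps, max_order=4):
    """Number of candidate SNP combinations MDR evaluates at each order."""
    return {n: comb(n_snps, n) for n in range(1, max_order + 1)}


four = ["IL-10", "IL-1RN", "TIRAP", "TLR5"]
main, pairs = logistic_terms(four)
print(len(main), len(pairs))   # 4 main effects, 6 pair-wise interactions
print(genotype_cells(4))       # 81 four-locus genotype cells
print(search_space(57))        # {1: 57, 2: 1596, 3: 29260, 4: 395010}
```

The 81 cells for four SNPs illustrate why many contingency table cells are sparse, and the growth of the search space with model order indicates why exhaustive MDR searches are computationally demanding.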
It is worth noting that the best four-SNP interaction model did not include the genes that were previously identified to be associated with prostate cancer risk using single SNP analysis (2, 23, 24). This is not surprising because the four-SNP model considerably improved the ability to predict the disease risk (43.28% prediction error) compared with a single SNP (47.41% prediction error). The results from the MDR analysis were not in conflict with our previous single-gene analysis. When SNPs were considered one at a time, we found a SNP in TLR1 to be the best predictor for prostate cancer, similar to the results of our single-gene analysis (23). A comparison of the results from our current study and our previous single-gene studies shows the advantage of considering multiple SNPs in genetic association studies.
Because MDR is a data mining approach, it is important to note that the results are suggestive and should be subjected to confirmation. The 1,000-permutation test suggested that the identified four-gene interaction is unlikely to be due to chance; however, confidence in this result will be increased if it can be confirmed. A confirmatory test is planned for the second phase of the CAPS study, in which an additional set of >1,000 cases and 1,000 controls is being recruited using the same protocol and criteria as in the first phase of CAPS.
In summary, using the MDR method to explore the effect of gene-gene interactions among many genes in the inflammation pathway on prostate cancer risk in a large case-control population, we have identified a four-gene interaction that significantly predicts prostate cancer risk. Whereas the ability to predict prostate cancer status is limited with a 43.28% prediction error, our ability to analyze a large number of SNPs in a large sample size is one of the first efforts in exploring the effect of high-order gene-gene interactions on prostate cancer risk, and this is an important contribution to this new and quickly evolving field. Future studies that include additional genes and environmental factors in a systematic assessment using methods such as MDR will likely improve upon this prediction error.
| Acknowledgments |
|---|
| Footnotes |
|---|
The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked advertisement in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.
Received 5/18/05; revised 8/3/05; accepted 8/26/05.
| References |
|---|
|
|
|---|