Abstract
Background: Genetic association studies are generating much information, usually in the form of single nucleotide polymorphisms in candidate genes. Analyzing such data is challenging, and raises issues of multiple comparisons and potential falsepositive associations. Using data from a casecontrol study of bladder cancer, we showed how to use hierarchical modeling in genetic epidemiologic studies with multiple markers to control overestimation of effects and potential falsepositive associations. Methods: The data were first analyzed with the conventional approach of estimating each main effect individually. We subsequently employed hierarchical modeling by adding a second stage (prior) model that incorporated information on the potential function of the genes. We used an empiricalBayes approach, estimating the residual effects of the genes from the data. When the residual effect was set to zero, we instead used a semiBayes approach, in which they were prespecified. We also explored the impact of using different secondstage design matrices. Finally, we used two approaches for assessing geneenvironment interactions. The first approach added product terms into the firststage model. The second approach used three indicators for subjects exposed to geneonly, environmentonly, and both genetic and environmental factors. Results: By prespecifying the prior secondstage covariates, the estimates were shrunk to the mean of each pathway. The conventional model detected a number of positive associations, which were reduced with the hierarchical model. For example, the odds ratio for myeloperoxidase (G/G, G/A) genotype changed from 3.17 [95% confidence interval (CI), 1.327.59] to 1.64 (95% CI, 0.813.34). A similar phenomenon was observed for the geneenvironment interactions. The odds ratio for the geneenvironment interaction between tobacco smoking and Nacetyltransferase 1 fast genotype was 2.74 (95% CI, 0.6811.0) from the conventional analysis and 1.24 (95% CI, 0.801.93) from the hierarchical model. Conclusion: Adding a secondstage hierarchical modeling can reduce the likelihood of false positive via shrinkage toward the prior mean, improve the risk estimation by increasing the precision, and, therefore, represents an alternative to conventional methods for genetic association studies.
 hierarchical model
 genetic polymorphism
 geneenvironment interaction
Background
Cancer is a complex disease involving multiple genetic and environmental risk factors. To clarify the contribution of genetic factors and decipher the relationship between genes, environment, and cancer, association studies are generating much genetic information. This has often taken the form of studying single nucleotide polymorphisms in different genes. Analyzing such data is challenging, and raises the issues of multiple comparisons and potential falsepositive associations (1).
The common conventional analytic approaches currently in practice, such as (1) treating all exposures independently (i.e., estimating the parameters of interest one at a time), (2) including all exposures in a single model, and (3) using algebraic selection procedures, such as stepwise regression, each have limitations (2). The first approach ignores the biological relevance among the exposures. The second approach often results in overparameterized models, and the third approach selects the parameters arbitrarily based on “statistical significance” and can result in biased point and variance estimates (3). Furthermore, none of these approaches properly address the issues of multiple comparisons and potential falsepositive associations. In contrast, hierarchical modeling may provide an avenue for addressing these issues in genetic association studies. In particular, such studies generally collect much information on interrelated genetic variables, which can be incorporated into a hierarchical model.
A falsepositive association is suspected to occur when the point estimate is far away from other estimates, and unstable (i.e., with a large SE; ref. 4). This problem can be reduced by “correcting” the overestimation of the observed variance with the goal of estimating the “true” variance. The true variance (V_{T}) can be approximated by subtracting the mean of the variance of the individual estimates (V_{M}) from the observed sample variance of the log relative risk estimate (V_{O}; ref. 4). Because the variance should be nonnegative, the estimate of V_{T} is the maximum of V_{O} − V_{M} and zero.Hierarchical methods can be used to adjust the observed relative risk estimates by the estimated V_{T}. Typically this adjustment pulls the outlying relative risk estimates toward the geometric mean of the ensemble and leads to a narrower confidence interval (CI; because V_{T} is expected to be smaller than V_{O}; ref. 4). If the mean of the estimates are reasonably close to the mean of the true values, one should expect that the shrinkage to the mean of the estimates would decrease the total estimation errors [i.e., expected squared error (ESE) = bias^{2} + SD^{2}].
There are three assumptions when using this adjustment. First, there is no systematic bias in the conventional estimates that threatens the validity of the “shrinkage to the mean” (i.e., if the systematic bias is present, the estimates would be shrunk toward the “wrong mean” after adjustment). Second, the true values and random errors in each ensemble are both roughly normally distributed. And third, the true values of the relative risk within each ensemble are exchangeable. To satisfy the last assumption, the parameters must be grouped in a manner that minimizes the ESE (4). For example, consider a study of smoking, genetic polymorphisms, and lung cancer. If smoking (a strong risk factor) is included in one ensemble with multiple low penetrance genes, large but unstable estimates for a few genetic polymorphisms (which are more likely to be produced by sampling error) would not be pulled down, as would happen without smoking in the same ensemble.
The adjustment by shrinkage estimation can be undertaken with hierarchical modeling. This approach unifies the seemly separate concepts of frequentist and Bayesian analysis, and represents a valid alternative to a conventional analysis (5). It adds in two or more levels in the model to specify the relations among study variables. General properties of hierarchical modeling have been reviewed elsewhere (6, 7).
While numerous authors have advocated the advantages of using hierarchical modeling, they remain little used in the health sciences (710). Few other authors used hierarchical modeling to analyze genegene or geneenvironment interactions (11, 12). Here, we further demonstrate how hierarchical modeling can be applied to genetic epidemiologic studies, including several low penetrance genes and addressing geneenvironment interactions. The present study is different from the previous studies in that we employed a data set with multiple markers in different genes, and also provided a possible construction of the prior multivariate secondstage model for such a data set. To evaluate the performance of our model, we contrasted the results based on hierarchical modeling to those from conventional analyses. We used the data from a casecontrol study of bladder cancer, which is representative of current genetic epidemiologic studies, containing a modest sample size and multiple markers.
Materials and Methods
The Bladder Cancer CaseControl Study
A hospitalbased case control study of bladder cancer was conducted in Brescia, Northern Italy. Two hundred one male incident cases and 214 male controls with benign urological conditions were recruited from 1997 to 2000, frequency matched on age, period of recruitment, and hospital. Lifestyle and occupational information was collected by interview with a questionnaire. Occupational exposures to polycyclic aromatic hydrocarbons and aromatic amines were blindly coded by occupational physicians. Genotyping of glutathione Stransferase M1 (GSTM1) null, GSTT1 null, GSTP1 I105V, Nacetyltransferase 1 (NAT1) fast, NAT2 slow, cytochrome P450 1B1 (CYP1B1) V432L, sulfotransferase 1A1 (SULT1A1) R213H, myeloperoxidase (MPO) G463A, catecholOmethyltransferase (COMT) V108M, manganese superoxide dismutase (MnSOD) A9V, NAD(P)H:quinone oxidoreductase (NQO1) P187S, Xray repair crosscomplementing group 1 (XRCC1) R399Q, XRCC3 T241M, and xeroderma pigmentosum complementation group (XPD) K751Q polymorphisms was done with PCRRFLP methods. Detailed results from this study have been reported elsewhere (1315).
Hierarchical Modeling Overview
Hierarchical modeling extends and complements conventional analyses. In a twostage model, one can use a conventional logistic regression as the first stage:The coefficients β_{j} represent the effects of X_{j}, the exposures of interest. In our example, each coefficient represents the effect of particular genetic marker on cancer risk on a log scale. W is the matrix of the covariates (e.g., age) and γ is the vector of covariate coefficients.
Hierarchical modeling adds a second stage to the conventional model to try and improve the estimation of the β_{j}. For example, one could use the following linear modelwhere z_{j} is a row vector of p prior covariates, and π is a column vector of p prior coefficients. Z is the nbyp matrix of secondstage covariates, with rows z_{j}. The Z matrix is a key component of the hierarchical approach: it is defined by the investigators to reflect the similarities between the firststage factors. For example, in a casecontrol study of multiple dietary exposures and breast cancer risk (8), the Z matrix was based on the nutritional contents of the food items. In an analysis of genetic factors, it was based on physical distance between polymorphisms (16). In a study of geneenvironment interactions in polyp formation, the Z matrix incorporated the conversion rates for heterocyclic amines (12).
In Eq. C, π is the column vector of linear effects of the secondstage covariates (Z) on β_{j}, and δ_{j} are the residual effects of item j (e.g., specific genetic factors). Residual effects can arise from interactions among secondstage covariates, or from the covariates not included in the Z matrix, and are assumed to have mean of zero and variance of τ^{2}. The τ can either be estimated from the data [empiricalBayes (EB) approach] using a variance model based on the concept outlined above (Eq. A; refs. 2, 4, 17), or prespecified by the investigators using background information [semiBayes (SB) approach] (7, 17). τ^{2} reflects the range of the potential residual effect that remain after accounting for all first and secondlevel covariates, and can be viewed as an approximation of V_{T}.
Substituting Eq. C into B gives a mixed model Eq. (E), which contains fixed coefficients (π and γ) and random coefficients δ.
Application of Hierarchical Modeling to the CaseControl Study
For comparison's sake, the data were initially analyzed with the conventional approaches using the following two logistic regression models: treating all the genetic factors independently and constructing oneatatime models; and including all genetic factors in one model.
Our first hierarchical model was constructed under the assumption that all genes are exchangeable, with the EB approach (allowing τ^{2} to be estimated from the data). Because the assumption of full exchangeability is unlikely to reflect the underlying biological relations among genes, we further estimated the main effects of genetic polymorphisms by using a Z matrix to specify the similarity between genetic factors in terms of their function.
Construction of Z Matrix
The genetic factors were broadly classified into four main pathways related to bladder carcinogenesis, based on the biological functions of the genes: (1) reduced carcinogen detoxification; (2) enhanced procarcinogen activation; (3) increased oxidative stress; and (4) decreased DNA repair capacity. The risk genotype of each gene is designated based on the functional significance of the polymorphism. For each pathway, a score was assigned to each genetic polymorphism according to biological function of the gene and the functional significance of its polymorphism. A score of 0 was assigned to the genetic polymorphism if the gene is not involved in such pathway (the coefficient of the factor was expected to be zero). For example, MPO was assigned a score of 0 in the pathway of reduced detoxification and DNA repair, because it is not expected to have roles in such pathways; whereas a score of 1 was assigned if the gene is involved in such pathway, and the functional significances of the risk genotypes correspond to the pathway that contribute to the risk of cancer (the factor was expected to have a positive coefficient). Take XRCC1, for example, it has an important role in repair of single strand break. 399Gln allele was associated with higher levels of aflatoxin B1DNA adduct and higher bleomycin sensitivity; therefore, the designated risk genotype of XRCC1 is Arg/Gln, Gln/Gln, and a score of 1 was assigned in the pathway of reduced DNA repair (18, 19).
The genes assigned to the same pathway were assumed to be exchangeable, that is, for their effects to arise from a common distribution. The four pathways were not mutually exclusive. For example, MPO activates procarcinogens in tobacco smoke, such as benzo[a]pyrene, through the release of reactive oxygen species (2123). −463A allele is associated with reduced mRNA expression and its transcription activity is about 25 times lower than G allele in vitro (24). A score of 1 was assigned to MPO G/G, G/A in the pathway of enhanced activation and increased oxidative stress due to MPO's role in both pathways. Another example, NQO1 Pro187Ser variant genotypes (Pro/Ser, Ser/Ser) was given a score of 1 in both pathways of reduced carcinogen detoxification and enhanced procarcinogen activation, because NQO1 converts the highly genotoxic benzoquinine to the less toxic hydroxy metabolites (24), whereas it is also involved in bioactivation of some procarcinogens, such as 4nitroquinoline1oxide (25).
The basic Z matrix is reported in Table 1. In summary, the biological implication of this Z matrix is that GSTs, NAT2, SULT1A1, and NQO1 play roles in carcinogen detoxification, and their designated risk genotypes reduce the detoxification; NAT1, SULT1A1, CYP1B1, MPO, and NQO1 are involved in procarcinogen activation, and their designated risk genotypes enhance the activation; CYP1B1, MPO, COMT, MnSOD, and NQO1 modulate the oxidative stress, and the risk genotypes reduce the oxidative stress; XRCC1, XRCC3, and XPD repair DNA via different mechanisms, and their risk genotypes decreased the DNA repair capacity.
We conducted sensitivity analyses by altering the Z matrix, based on the function of genes with possible mechanisms that are still open for discussion. Two variations of the Z matrices are also shown in Table 1. In Zmatrix variant 1, we added COMT Val108Met and MnSOD Ala9Val in the pathway of enhanced procarcinogen activation, and removed CYP1B1 from the pathway of increased oxidative stress. These changes are based on function of genes with possible mechanisms that are still open for discussion. For example, COMT and MnSOD are known to protect the cell from oxidative stress (26, 27). Their genetic variants are designated as risk genotypes because they are known to have lower enzyme activities or predicted to be less effective (2830), and, therefore, contribute to increased oxidative stress (as in basic Z matrix). Because reactive oxygen species are often involved in the activation of the procarcinogens, genetic polymorphisms that increase oxidative stress might indirectly attribute to the enhanced activation of procarcinogens. We, therefore, added these two genetic polymorphisms into the pathway of enhanced activation in Zmatrix variant 1.
In Zmatrix variant 2, scores were allowed to range from −1 to 1. Genetic polymorphisms were assigned a score of −1 when hypothesizing that the prior risk genotypes have opposite effect in the given pathway (e.g., increased detoxification, instead of reduced detoxification). In other words, the factor was expected to have a negative coefficient. For example, NAT1 have been shown to be involved in both the activation and detoxification of aromatic amines. In general, NAT1 has a major role in the Oacetylation of Nhydroxy aromatic amines, and lead to the activation of aromatic amines (32). NAT1 *10 and *11 are associated with faster enzyme activity and were, therefore, assigned a score of 1 in the pathway of enhanced activation. On the other hand, because NAT1 is also involved in detoxification process, it is possible that a higher enzyme activity conferred by *10 and *11 might be associated with an increased detoxification via Nacetylation, although the affinity for which is less. We, therefore, assigned a score of −1 in the pathway of reduced detoxification to indicate the opposite effect via same pathway.
We assumed that the basic Z matrix was the most comprehensive prior for our purposes, and the analyses of geneenvironment interaction were based on the basic Z matrix. The details of Z matrix applied to geneenvironment interaction is described in the following section.
We used an EB approach, which estimated τ^{2} from the data. When this approach estimated τ^{2} as being less than or equal to zero, we used a SB approach setting τ^{2} equal to 0.05, a value similar to that estimated for main effects from the data in the EB approach. This assumes that the true odds ratio (OR) of each parameter would fall within a 2.4fold range, that is, . To see the how sensitive the results were to τ^{2}, we increased the prespecified value to 0.35, which assumes with 95% probability that the true ORs fall within a 10fold range [i.e., (ln(10)/3.92)^{2} ≈ 0.35].
Assessment of GeneEnvironment Interactions
Three environmental agents were included in the analysis: tobacco smoking and occupational exposure to polycyclic aromatic hydrocarbons and to aromatic amines. These exposures are known to cause of bladder cancer (32). Two approaches were used for assessing the geneenvironment interactions. The first approach added product terms to the firststage model (see Eq. F).
Attempting to include one environmental factor, with all the genetic factors and their product terms in a single model (a total of 29 parameters), resulted in the overparameterization of the model, which could not be fitted. Therefore, we grouped genetic polymorphisms based on the biological functions of the genes (four groups: GSTs, oxidative stress, DNA repair, and others). The basic Z matrix was then divided into four corresponding groups. One column was added to the Z matrix to indicate the environmental exposure. All geneenvironment product terms were assigned a score of 1 in the relevant genetic pathways and environmental exposure. The Z matrices used in geneenvironment interaction analyses are available from the authors on request.
The second approach used three indicators for subjects exposed to highrisk genotype only, environmental factors only, and to both factors (see Eq. G below). Because there were three indicators for each pair of geneenvironment interactions, analyzing the interaction by the group also resulted in an overparameterized model. Therefore, geneenvironment interactions were analyzed one pair at the time while using this approach. The Z matrix applied in this approach was also simplified to a threebytwo matrix with two columns indicating genetic and environmental factors. Consistent with the notation in Eq. G: D_{1} was assigned a score of 1 in genetic column, D_{2} was assigned a score of 1 in environmental column, and D_{3} was assigned a score of 1 in both columns.
Consistent with the analyses for the genetic main effects, in geneenvironment interaction analyses, we first used an EB approach to estimate τ^{2} from the data. When this approach estimated τ^{2} as being less than or equal to zero, we set τ^{2} equal to 0.05, a value similar to that estimated from the data in the EB approach. To see how sensitive the results were to τ^{2}, we increased the prespecified value to 0.35. All statistical analyses were conducted with SAS IML code in conjunction with GLIMMIX macro. An example program is available at URL http://darwin.cwru.edu/~witte/hm.html (17).
Results
Table 2 shows the results of the analyses of genetic main effects, comparing the conventional and the hierarchical analyses. In the conventional analysis, 5 of the 14 highrisk alleles included in the analysis were associated with a significantly increased risk of bladder cancer (one risk estimate was no longer significant in the analysis including all parameters at once). When assuming that all genetic factors are exchangeable (hierarchical model 1), all risk estimates were shrunk to a single mean. By prespecifying the prior secondstage covariates (hierarchical model 2), the estimates were shrunk to the mean of each pathway. Most of the time, the CIs were narrower from hierarchical modeling compared with the conventional estimates, that is, the precision of the risk estimates was increased. This feature was observed across all the results. Extreme but unstable values experienced the greatest shrinkage. For example, the OR of MPO G/G, G/A genotypes was reduced from 3.17 (95% CI, 1.327.59) to 1.64 (95% CI, 0.813.34). Note that using a hierarchical model does not always shrink the estimates toward the null; this depends on the Z matrix. For example, the OR for NQO1 Pro187Ser polymorphism increased from 1.33 (95% CI, 0.891.97, conventional analysis) to 1.41 (95% CI, 0.982.03, hierarchical model 2).
Slight alterations of the Z matrix did not have major influence on the risk estimates (results based on hierarchical models 3 and 4 in Table 2). When τ^{2} was prespecified, the larger the prespecified value, the lesser the “shrinkage” (results based on hierarchical models 5 and 6 in Table 2, and hierarchical models 1 and 2 in Tables 3 and 4).
Table 3 shows the results of geneenvironment interaction analyses based on the productterm approach. For focus and brevity, we do not present the entire set of 29 × 3 = 87 estimates, but highlight only the most illustrative findings. The interaction OR for smoking and NAT1 fast genotype was 2.74 (95% CI, 0.6811.0) from the conventional analysis, and 1.24 (95% CI, 0.801.93) from the hierarchical modeling. The point estimate of the product term of between GSTM1 null genotype and smoking did not change much: however, the CI narrowed considerably (from 0.435.14 to 0.743.83). Similar results were observed in the interaction analyses between genetic polymorphisms and exposure to aromatic amines. For example, in multiplicative terms, the hierarchical CI of the NAT2aromatic amines product term was almost 10fold narrower than that of the conventional analysis (i.e., 2.88/0.90 = 3.20 versus 10.1/0.34 = 29.7), indicating a dramatic increase in precision.
Table 4 shows the noteworthy results of geneenvironment interaction based on the indicatorterm approach. Similar to the results in Table 3, the hierarchical estimates were more stable, with the amount of shrinkage depending on the value of τ^{2}. For example, when looking at the interaction between the GSTT1 null genotype and exposure to aromatic amines, the conventional analysis resulted in an unstable OR that was likely overestimated and suggested the presence of interaction. The hierarchical modeling, on the other hand, gave more stable estimates and suggested no interaction.
Discussion
Genetic association studies with multiple genetic markers are susceptible to falsepositive associations. We showed that hierarchical modeling may improve the analyses of genetic epidemiologic data by increasing the precision of the risk estimates (e.g., as evidenced by narrower CIs), and reducing the likelihood of false positive via shrinkage toward the prior mean. For example, when looking at GSTM1 and smoking, the width of the CI, in multiplicative terms, was less than half of the width of conventional method (3.83/0.74 = 5.18, versus 5.14/0.43 = 11.95). This reduction in CI width is akin to increasing the number of cases in the present study 4fold (33). Moreover, hierarchical modeling gives estimates that are closer to the prior expectation. On the basis of the literature and our biological understanding of carcinogenesis, one would expect that common genetic polymorphisms confer a modest effect on cancer risk. For example, one would expect a common polymorphism as MPO G870A to confer a relative risk of less than 2 in the present study. The results from conventional analyses exceeded the prior, and the results from the hierarchical modeling were effectively adjusted toward the prior. As a consequence, hierarchical modeling reduces the likelihood of falsepositive results. Finally, this approach allows for residual effects that are not accounted for in the fixed first and secondstage coefficients, and avoids numerous independent singleinference testing (7). In our study, these benefits of hierarchical modeling were apparent for both the main effects and interaction analyses.
Potential falsepositive results are characterized by increased imprecision in effect estimation, which results in less weight in the hierarchical models, and are, therefore, more liable to the “shrinkage” effect of hierarchical modeling than true positive results (4). Although hierarchical modeling may also increase the potential for falsenegative findings, it should be noted that even initial positive findings that are later replicated in subsequent studies are also usually overestimated (34). Thus, the shrinkage effect typically seen in hierarchical modeling is also relevant for true positive effects.
Other authors had presented the use of hierarchical models for looking at genegene or geneenvironment interactions. Aragaki et al. (12) studied the interaction between diet and a single candidate gene NAT2, using a different data structure than the one presented here. We had similar data as in De Roos et al. (11), though their secondstage model assumed that all effects (i.e., main genetic, environmental, and when modeled, their interactions) should be shrunk toward a single prior mean. However, this may not hold in practice. Therefore, in the present report, we extended the previous work by using a secondstage (Z) design matrix that reflects the similarity of different genetic factors in a more detailed manner. Furthermore, in our interaction analysis, we provided results from both indicatorterm and productterm approaches. In the productterm approach, we went beyond looking at one interaction at a time. In addition, we undertook a sensitivity analysis of the design of the Z matrix and the prespecified τ^{2}.
There are a number of assumptions underlying the use of hierarchical models; when these are violated, the resulting estimates can be worse than obtained with a conventional analysis (35). While hierarchical modeling attempts to improve estimation accuracy, it may sometimes do so at the expense of increasing the bias (6). Recall that estimation accuracy reflects both validity (systematic error) and precision (random error). In the presence of systematic bias, and when the mean of the estimates is not reasonably close to the true mean, the hierarchical estimates would be pulled to the “wrong” mean. Furthermore, in the case when the secondstage model does not provide a reasonable presentation of the biological phenomenon, the estimates from the hierarchical model would also be biased. For example, if one assumes the effect of GSTM1 null (risk allele) and MPO A/A genotype (protective allele) come from the same mean, the estimate of GSTM1 null may be biased downward and the estimate of MPO A/A genotype may be biased upward (5). However, under a reasonably specified prior, the hierarchical models are expected to provide more precise estimates of risks.
Slight alterations of the Z matrix did not have a major influence on the risk estimates. This suggests that the model is relatively robust, instead of implying that one particular Z matrix is “correct.” One should aim to construct a Z matrix that reasonably reflects the underlying biological phenomena; however, it would not be feasible to estimate how well the Z matrix reflects “biological truth” by a statistical method, including sensitivity analyses.
Note that we used values of 0 or 1 in our secondstage design (Z) matrix. While such a crude design matrix has been shown to improve over conventional estimates (9), it is unlikely to fully capture the true differences between genetic factors. This is more likely to be on a continuous scale, reflecting aspects such as enzyme kinetics, the rate or the level of the adduct formation, etc. A study of NAT2 genotypespecific dietary effects on adenomatous polyps provided a fine example, in which Z matrix is the calculated conversion rates for heterocyclic amines, derived from the estimated concentrations of heterocyclic amines in the selected foods, and MichaelisMenten constant estimates for NATs genotypes (12). Our knowledge on the effects of the products of the genes included in our study is far from complete. As this information becomes available, one can further refine the Z matrix. In the context of genetic polymorphisms, the main obstacle for researchers to apply hierarchical modeling may be the construction of prior (Z matrix). Our results suggest that the model is relatively robust, and that a Z matrix which reasonably represents the biological mechanisms can be helpful in improving the risk estimates.
One can estimate τ^{2} from the data (EB approach) when the information of the residual effects is unknown, because when using a SB approach, τ^{2} is prespecified and hence potentially subject to criticism. However, previous work has shown that SB generally often outperforms EB, which may estimate τ^{2} as equaling zero (2). Regardless, one should consider varying τ^{2} to see how sensitive results are to this value, as in our example. Simulation studies have shown that the sensitivity of SB estimates to the prespecified τ^{2} increases with increasing study size (2).
We used two different approaches to investigate geneenvironment interactions (i.e., product terms versus indicator variables). When considering more than one pair of interaction, the productterm approach requires fewer terms in one's model, and so allowed us to consider more markers simultaneously than the indicator variable approach. In particular, when looking at the interaction analysis based on the productterm approach, the number of models to test, including genetic main effects and geneenvironment interactions was reduced from 56 (considering polymorphisms of 14 genes and 3 environmental exposures) to 13 [1 for genetic main effect, and 12 (4 × 3) models for geneenvironment interactions]. However, the indicatorterm approach may more directly exhibit the joint effect of the genetic and environmental factors.
In summary, association studies are commonly using highthroughput genotyping techniques, which generate observations on several hundreds or thousands of genetic markers in large populations. One must be very cautious about multiple comparison and potential falsepositive associations when analyzing and interpreting such data. Hierarchical modeling can help address these issues, provide more plausible and stable estimates, and, therefore, represents an alternative to conventional methods for genetic association studies.
Acknowledgments
The authors thank Umberto Gelatti, Donatella Placidi, and Antonio Scotto di Carlo for their contribution to the bladder cancer casecontrol study in Brescia.
Footnotes

Grant support: This work was supported by the NCI R01 grant CA 09203901A2.

The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked advertisement in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.

Note: R. Hung worked on this study under the tenure of a Special Training Award from the International Agency for Research on Cancer.
 Accepted February 9, 2004.
 Received November 4, 2003.
 Revision received February 3, 2004.