Division of Cancer Epidemiology and Genetics, National Cancer Institute, NIH, Bethesda, Maryland 20892-7244
| Abstract |
|---|
| Introduction |
|---|
| Structure of diagnosis related selection bias |
|---|
This work is motivated by the planning of a case-control study of lung cancer. Our main interests lie in estimating the effects of genetic factors, such as alleles of a DNA repair gene, alone and jointly with smoking, on the risk of lung cancer; we are not particularly interested in estimating the effect of smoking itself. Controls for this study would be asked to respond to a questionnaire and provide blood and other biospecimens. Given the difficulties of choosing appropriate population-based controls with a high response rate, we considered using hospital controls instead, with the hope that confined patients would be more willing to participate. Before we could decide from which conditions to choose hospital controls, we needed to determine the impact of choosing controls hospitalized for a disease related to smoking, the main risk factor for lung cancer but not the subject of this study, and possibly to the same genes that might be investigated for the study of lung cancer. However, we were unable to determine from the literature what the impact would be of choosing CVD² controls, given that smoking affects the risk of CVD and that a DNA repair gene suspected of being related to lung cancer might be truly related to CVD.
We therefore decided to address this question ourselves. For simplicity, we categorize smoking and genotype into two levels: E+ or E- for smokers and nonsmokers, and G+ or G- for those who do or do not carry the allele of interest. For each main point, we first present a simple numerical example and then show the result abstractly to generalize. We refer to a hospital control group, such as those diagnosed with CVD, as "improper" if it does not meet the stringent requirement that E and G not be related to the risk of being hospitalized with the control condition.
| Results and Examples |
|---|
The odds ratios for lung cancer for those exposed to E only and to G only, relative to the unexposed (E-,G-), are denoted by α and β, respectively, and for the doubly exposed by αβγ. Formally, α and β are the smoking and gene effects at the baseline level of the gene and smoking, respectively, and γ is the multiplicative interaction parameter for lung cancer. The variables A, B, and C for CVD are analogous to α, β, and γ for lung cancer.
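The role of these six parameters can be sketched numerically. The snippet below is a minimal illustration, not the authors' code: it writes `alpha`, `beta`, `gamma` for the lung cancer odds ratios and interaction parameter and `A`, `B`, `C` for their control-disease analogues, and it assumes that the only distortion comes from the control condition being related to E and G, so that each estimated odds ratio is the case-disease odds ratio divided by the control-disease odds ratio for the same cell. The function name and the numerical values are hypothetical.

```python
from fractions import Fraction

def observed_odds_ratios(alpha, beta, gamma, A, B, C):
    """Odds ratios a study with improper controls would estimate.

    Assumption: each cell's estimate equals the case-disease odds ratio
    divided by the control-disease odds ratio for the same (E, G) cell.
    """
    alpha, beta, gamma = Fraction(alpha), Fraction(beta), Fraction(gamma)
    A, B, C = Fraction(A), Fraction(B), Fraction(C)
    return {
        "E_only": alpha / A,                           # (E+, G-)
        "G_only": beta / B,                            # (E-, G+)
        "both": (alpha * beta * gamma) / (A * B * C),  # (E+, G+)
        "interaction": gamma / C,                      # multiplicative term
    }

# Hypothetical values: both factors raise the risk of the control disease,
# but the control disease has no multiplicative interaction (C = 1).
result = observed_odds_ratios(alpha=10, beta=2, gamma=3, A=2, B=4, C=1)
print(result["E_only"], result["G_only"], result["both"], result["interaction"])
```

With these values the single-factor and joint estimates are all distorted, while the interaction estimate equals `gamma` because `C` is 1.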
|
|
1 are artifacts of the arbitrary definitions of + and -. Table 3
|
1) when E is unrelated to the control disease (A = 1) and there is no multiplicative interaction for the control disease (C = 1). It follows that under these conditions the G-adjusted estimate is also unbiased because it is a weighted average of the unbiased estimates at the two levels of G (7). That is, if G and E are associated (in the controls or in the study base), then G confounds the crude estimate of the exposure effect when we choose a control group related to G, but the estimate of the effect of E can be deconfounded completely by adjusting for G (3). Of course, if there is a multiplicative G-E interaction, the weighted average will depend on the interaction parameter γ. It is noteworthy that if E and G are associated in the study base, one needs to adjust for G if it is a risk factor for either the case or the control disease. For instance, one could consider using traffic-accident controls for a study of smoking and lung cancer under the assumption that alcohol, but not smoking, is an independent risk factor for accidents. Indeed, there will be more smokers among these controls than in a proper control group, but standard adjustment in the analysis for those risk factors for accidents that are correlated with smoking completely eliminates the bias. As ever, unmeasured risk factors can induce bias. In particular, an unmeasured factor X associated with exposure can confound the estimate of effect not only if X is a risk factor for the study disease but also if it is a risk factor for the control disease.
Table 4 illustrates the situation in which the adjusted estimates of the effect of exposure are unbiased even though the crude effect would be biased when using improper controls.
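A small calculation can make this contrast concrete. The sketch below does not reproduce the article's table; it uses hypothetical numbers and assumes that the cell frequencies of each disease series are proportional to the study-base frequency times that disease's odds ratio. Here G is associated with E in the study base, is unrelated to the case disease, and triples the risk of the control disease.

```python
from fractions import Fraction as F

# Hypothetical study-base distribution of (G, E); G and E are associated.
base = {("G-", "E-"): F(4, 10), ("G-", "E+"): F(1, 10),
        ("G+", "E-"): F(1, 10), ("G+", "E+"): F(4, 10)}

def series(or_e, or_g, or_int):
    """Relative cell frequencies of a disease series with the given odds ratios."""
    ors = {("G-", "E-"): F(1), ("G-", "E+"): F(or_e),
           ("G+", "E-"): F(or_g), ("G+", "E+"): F(or_e) * or_g * or_int}
    return {cell: base[cell] * ors[cell] for cell in base}

def e_odds(s, strata):
    """Odds of E+ versus E- within the listed G strata."""
    exposed = sum(s[(g, "E+")] for g in strata)
    unexposed = sum(s[(g, "E-")] for g in strata)
    return exposed / unexposed

def e_odds_ratio(cases, controls, strata=("G-", "G+")):
    return e_odds(cases, strata) / e_odds(controls, strata)

cases = series(4, 1, 1)      # case disease: E effect 4, no G effect, no interaction
controls = series(1, 3, 1)   # improper controls: A = 1, B = 3, C = 1

crude = e_odds_ratio(cases, controls)                 # collapsed over G
in_g_minus = e_odds_ratio(cases, controls, ("G-",))   # stratum G-
in_g_plus = e_odds_ratio(cases, controls, ("G+",))    # stratum G+
print(crude, in_g_minus, in_g_plus)
```

In this example the crude estimate is 28/13, well below the true value of 4, whereas both stratum-specific estimates equal 4.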
|
Table 5 shows the situation in which the overall and the stratum-specific G and E effects are both distorted, and yet the multiplicative interaction term is estimated correctly. In Table 6, the overall and stratum-specific G and E effects are both distorted, as is the multiplicative interaction. The additive interaction, which we define as the difference between the differences of the odds ratios for E in G+ and in G-, is also biased.
|
|
One unexpected consequence of the algebra of bias induction in the G-E interaction context is the effect of pooling patients with a mixture of diseases into a single control group. Even if there is no multiplicative interaction for each of two control diseases, bias is likely when estimating multiplicative interaction if the two are pooled in a single control group. Let us assume that the odds ratios for disease 1 in (G+,E-), (G-,E+), and (G+,E+) relative to (G-,E-) are 2, 4, and 8, and the odds ratios for disease 2 are 4, 2, and 8, respectively. With a combined control group consisting of equal numbers from disease 1 and disease 2, the odds ratios for the pooled control condition will be 3, 3, and 8, respectively. These no longer follow a multiplicative pattern (3 × 3 = 9, not 8), which violates the requirement of no multiplicative interaction for the control series because C does not equal 1. Most often the magnitude of the bias will be minor except when the magnitudes of the effects are large. Any average of the interaction estimates using each control series separately is unbiased if the estimate from each individual series is unbiased; polytomous logistic regression (8) uses a weighting method that produces an efficient and unbiased estimator.
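The pooling arithmetic in this example can be checked in a few lines. This is a minimal sketch under one stated assumption: the two control diseases contribute equally sized reference cells (G-,E-), so the pooled odds ratios are simple averages of the disease-specific ones. The helper names are hypothetical.

```python
from fractions import Fraction

# Odds ratios relative to (G-,E-), in the order (G+,E-), (G-,E+), (G+,E+).
disease_1 = (2, 4, 8)
disease_2 = (4, 2, 8)

def interaction(or_g, or_e, or_joint):
    """Multiplicative interaction: joint odds ratio over the product of the two."""
    return Fraction(or_joint) / (Fraction(or_g) * or_e)

def pool(*series):
    """Average odds ratios across series that share equally sized reference cells."""
    n = len(series)
    return tuple(Fraction(sum(cell), n) for cell in zip(*series))

pooled = pool(disease_1, disease_2)
print(pooled)                    # odds ratios 3, 3, and 8
print(interaction(*disease_1))   # 1
print(interaction(*disease_2))   # 1
print(interaction(*pooled))      # 8/9
```

Each control disease has an interaction parameter of 1, but the pooled series has C = 8/9. Under the division rule sketched earlier, an interaction estimate made with the pooled controls would be inflated by a factor of 9/8 in this example.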
The Effects of Selection Bias and Confounding on the Estimates of Joint Effect of Two Factors in Case-Control Studies.
Hospital controls are just one example of controls selected with potential bias. Our results extend not only to studies with hospital controls but to any situation in which cases or controls are selected in a biased manner with respect to a factor of interest. That is, Table 2 applies when the ratios in controls:cases of the odds of selection of a subject with (E+,G-), (E-,G+), or (E+,G+) relative to (E-,G-) are A, B, and ABC, respectively, just as for the improper controls in Table 1. In fact, these results apply to the possible distortion of a two-way interaction estimate attributable to the confounding effects of an unmeasured third factor that is differential in cases and controls, possibly caused by a three-way interaction.
| Discussion |
|---|
In considering the effects of selection bias, we have assumed the fulfillment of each of the standard case-control requirements, including those related to case and control ascertainment and selection, common catchment area, and equivalent exposure assessment. In addition, we assume, when appropriate, that special problems peculiar to hospital controls, such as Berkson's bias, do not have an important impact (3).
For concreteness, we discuss several important lessons from this work in the context of the use of hospital controls in studying the joint effects of G and E. First, there is no bias in the estimate of the multiplicative G-E interaction when there is no G-E interaction for the disease used for controls, even when the control condition is caused by either G or E. Similarly, the effect of E, stratified on G, can be estimated without bias even when G causes the control disease. The analogous statement holds when E and G are switched.
Second, bias can arise when assessing multiplicative interaction using two or more control diseases, even if there is no multiplicative interaction in each control disease. Even if each of two control conditions, perhaps CVD and accidents, individually leads to no bias, a pooled control group can produce bias. Thus, the protection provided by the use of multiple diseases instead of only one for estimating the effects of one factor does not extend completely to studying multiplicative interaction in the situation when both factors are related to disease; the use of polytomous logistic regression (8) might alleviate this problem.
Third, the additive interaction effect is less robust against bias introduced when E or G is related to the control condition. To measure additive interaction of G and E, when E is smoking, in a study of lung cancer, one must use only those diseases that G neither causes nor prevents among either smokers or nonsmokers. For example, using bladder cancer controls can produce bias in assessing a gene-smoking additive interaction when studying lung cancer, even if G is unrelated to the control disease. On the other hand, there would be no bias in assessing multiplicative interaction if there were a G effect on the risk of bladder cancer but no multiplicative interaction with smoking.
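The different robustness of the two interaction scales can be illustrated with hypothetical numbers. The sketch below assumes that each estimated odds ratio is the case-disease odds ratio divided by the control-disease odds ratio for the same cell, and it takes the additive interaction to be the difference between the E differences in G+ and in G-. The control disease here is related to smoking only (A = 3, B = 1, C = 1), loosely in the spirit of the bladder cancer example; none of the values come from the article.

```python
from fractions import Fraction

def interactions(or_e, or_g, or_joint):
    """Return (multiplicative, additive) interaction from three odds ratios."""
    or_e, or_g, or_joint = Fraction(or_e), Fraction(or_g), Fraction(or_joint)
    multiplicative = or_joint / (or_e * or_g)
    # E difference in G+ minus E difference in G- (reference odds ratio is 1).
    additive = (or_joint - or_g) - (or_e - 1)
    return multiplicative, additive

# Hypothetical case-disease odds ratios: E only 10, G only 2, both 20.
true_mult, true_add = interactions(10, 2, 20)

# Control disease related to E only: A = 3, B = 1, C = 1.
A = 3
obs_mult, obs_add = interactions(Fraction(10, A), 2, Fraction(20, A))

print(true_mult, obs_mult)   # multiplicative interaction: 1 and 1
print(true_add, obs_add)     # additive interaction: 9 and 7/3
```

The multiplicative interaction is recovered exactly, while the additive interaction shrinks from 9 to 7/3 even though G has no relation to the control disease.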
First, in hospital-based case-control studies, a strategy of using several control conditions, each of which is thought to be causally unrelated to any factor to be studied, seems advisable. If new evidence suggests that one of the control conditions is related to one of the factors of interest, it is an easy exercise to exclude those controls from analyses aimed at estimating the effect of an individual factor or interaction involving that condition. We cannot rely on our knowledge completely; e.g., it might seem reasonable to use CVD controls for a study of Alzheimer's disease until one realizes that variants in the APO-E gene cause both (9, 10). Second, a sensitivity analysis, examining the effect of excluding each control set, also seems appropriate. An objective method could be devised to determine which disease groups to include for each hypothesis, with decisions to exclude based on how much of an outlier the exposure distribution is for the one group among all of the others. Third, sometimes the cases can be used in a case-only analysis to provide additional, although not fully independent, evidence.
How do hospital-based studies compare with available alternatives? Family-based designs are particularly useful for identifying and characterizing genetic effects but seldom are ideal for studying joint effects. Studies using siblings as controls are often infeasible for studies of diseases of old age and often suffer from overmatching on genetic and environmental effects that aggregate within families, but they are very efficient for studies of interactions between rare alleles and environmental factors (11). Population-based case-control studies can be ideal in theory but can suffer from poor response rates and attendant biases. Case-only designs prohibit the estimation of the individual effects of E or G (12) or of additive interactions (1) without outside information. They permit a powerful test of multiplicative interaction between an environmental and a genetic factor but only under the assumption, impossible to verify directly, that the factors are independent in the study base (1, 13).
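The dependence of the case-only design on the independence assumption can be seen in a short calculation. This is a sketch with hypothetical numbers, assuming that the distribution of (G, E) among cases is proportional to the study-base frequency times the case-disease odds ratio for each cell; the function and variable names are illustrative only.

```python
from fractions import Fraction as F

def case_only_odds_ratio(base, alpha, beta, gamma):
    """G-E odds ratio among cases, given the study-base distribution of (G, E)."""
    cases = {
        ("G-", "E-"): base[("G-", "E-")],
        ("G-", "E+"): base[("G-", "E+")] * alpha,
        ("G+", "E-"): base[("G+", "E-")] * beta,
        ("G+", "E+"): base[("G+", "E+")] * alpha * beta * gamma,
    }
    return (cases[("G+", "E+")] * cases[("G-", "E-")]) / (
        cases[("G+", "E-")] * cases[("G-", "E+")])

# G and E independent in the study base: P(G+) = 1/5, P(E+) = 1/2.
independent = {("G-", "E-"): F(4, 10), ("G-", "E+"): F(4, 10),
               ("G+", "E-"): F(1, 10), ("G+", "E+"): F(1, 10)}
# G and E associated in the study base (G-E odds ratio of 16).
associated = {("G-", "E-"): F(4, 10), ("G-", "E+"): F(1, 10),
              ("G+", "E-"): F(1, 10), ("G+", "E+"): F(4, 10)}

print(case_only_odds_ratio(independent, 10, 2, 3))  # 3
print(case_only_odds_ratio(associated, 10, 2, 3))   # 48
```

With independence, the case-only odds ratio equals the multiplicative interaction parameter of 3. With the associated study base, it is multiplied by the G-E odds ratio in the base and is badly misleading.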
As explorations of gene-environment effects continue, investigators will develop new designs and modify old ones to increase efficiency. Subtle opportunities for bias can arise, as illustrated by the potentially biased estimation of the effects of genes among those exposed to the environmental variable, depending on the precise choice of eligible diagnoses. Nevertheless, the fundamental logic of hospital-based case-control design and the attendant control selection requirements hold in the setting of research on gene-environment interactions. The serious problems with hospital controls, particularly for additive interactions, must be considered against a background of other problematic control selection strategies. Alternatives, including sibling controls, population controls, and case-only designs, face their own challenges to validity or efficiency, including poor response rates, overmatching, and important assumptions of independence that are difficult to verify.
| Footnotes |
|---|
1 To whom requests for reprints should be addressed, at Division of Cancer Epidemiology and Genetics, National Cancer Institute, NIH, EPS 8046, 6120 Executive Boulevard, Bethesda, MD 20892-7244.
2 The abbreviation used is: CVD, cardiovascular disease.
Received 7/27/01; revised 5/22/02; accepted 5/31/02.
| References |
|---|