
1 Department of Biostatistics and Epidemiology, Center for Clinical Epidemiology and Biostatistics and 2 Abramson Cancer Center, University of Pennsylvania School of Medicine, Philadelphia, Pennsylvania
Requests for reprints: Timothy R. Rebbeck, Department of Biostatistics and Epidemiology, University of Pennsylvania School of Medicine, 904 Blockley Hall, 423 Guardian Drive, Philadelphia, PA 19104-6021. Phone: 215-898-1793; Fax: 215-573-2265. E-mail: trebbeck{at}cceb.med.upenn.edu
| Abstract |
|---|
| Introduction |
|---|
The effect of population stratification in epidemiologic studies of disease association with a single candidate gene has been extensively studied (2-9). Although various approaches using unlinked markers have been proposed to deal with the effects of population stratification on association tests of candidate genes (10-15), epidemiologic studies are increasingly concerned with interactions between genetic and/or environmental factors. None of the prior literature explicitly addresses the pattern or extent of bias due to population stratification in association studies involving interactions between genes and environmental factors.
To evaluate bias due to population stratification, we employed two dichotomous variables to represent the genetic and/or environmental factors and used logistic regression models to fit multiplicative interactions between the two variables for a binary disease outcome in a hypothetical cohort of multiple ethnicities. We derived algebraic solutions for asymptotic biases in the maximum likelihood estimates of the interaction variables under corresponding models that ignored ethnicities and identified conditions for the biases to reach their maximum or minimum expected values. We also provided numerical examples of biases due to population stratification under a wide range of conditions that may be observed in epidemiologic studies.
| Materials and Methods |
|---|
Consider a cohort comprising k ethnicities $E_j$ ($j = 1, 2, \dots, k$). Assume associations between disease and V1 and V2 are modeled with logistic regression as
$$\operatorname{logit}(\pi) = \alpha_1 + \beta_1 V_1 + \beta_2 V_2 + \beta_3 V_1 V_2 + \alpha_2 E_2 + \dots + \alpha_k E_k \tag{A}$$
where $\pi$ is the conditional probability of the disease (Y = 1) given V1, V2, and ethnicity, $E_2, \dots, E_k$ are ethnicity indicator variables, and binomial random error is assumed. Without loss of generality, let $\alpha_1$ specify the log odds of disease (i.e., logit function of the baseline disease risk) in the lowest-risk ethnicity E1, and let $0 < \alpha_2 < \dots < \alpha_k$, where $\alpha_2$ specifies the log odds ratio (OR) of disease risks comparing ethnicity E2 versus E1. Similarly, $\alpha_k$ specifies the log OR of disease risks comparing ethnicity Ek versus E1; $\beta_1$ and $\beta_2$ specify the main effects associated with the comparison of V1 and V2 relative to their reference categories, respectively; and $\beta_3$ specifies the multiplicative interaction between V1 and V2.
Population stratification may be present when the joint distributions of V1 and V2 are different across the ethnicities. Biases due to population stratification can be evaluated by omitting all ethnicity indicator variables from model Eq. A to fit the mis-specified model
$$\operatorname{logit}(\pi^*) = \alpha^* + \beta_1^* V_1 + \beta_2^* V_2 + \beta_3^* V_1 V_2 \tag{B}$$
We define the asymptotic biases due to population stratification to be $\beta_1^* - \beta_1$ and $\beta_2^* - \beta_2$ for the main effects of V1 and V2, respectively, and define $\beta_3^* - \beta_3$ as the asymptotic bias for the interaction effect between V1 and V2. Here, we do not deal with issues of variance or precision of estimates, assuming that model coefficient estimates obtained from sufficiently large samples satisfy $E(\hat{\beta}_i^*) = \beta_i^*$, where i = 1, 2, or 3.
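The bias definitions above can be sketched numerically for a two-ethnicity cohort. The sketch below is an illustration, not the authors' Stata code: it assumes full-cohort (not case-control) sampling, relies on the fact that the mis-specified model is saturated in V1 and V2 so that its large-sample coefficients are contrasts of pooled cell logits, and uses hypothetical names (`asymptotic_bias`, `independent_cells`, `share_high`) and illustrative frequencies.

```python
import math


def logit(p):
    """Log odds of a probability p in (0, 1)."""
    return math.log(p / (1.0 - p))


def expit(x):
    """Logistic function P(x) = exp(x) / (1 + exp(x))."""
    return 1.0 / (1.0 + math.exp(-x))


def asymptotic_bias(alpha1, alpha2, betas, f_low, f_high, share_high):
    """Large-sample bias (b1* - b1, b2* - b2, b3* - b3) when ethnicity is omitted.

    alpha1: baseline log odds in the low-risk ethnicity.
    alpha2: log OR of disease, high-risk versus low-risk ethnicity.
    f_low, f_high: dicts mapping (v1, v2) to within-ethnicity cell fractions.
    share_high: proportion of the cohort in the high-risk ethnicity.
    Every (v1, v2) cell must be non-empty in at least one ethnicity.
    """
    b1, b2, b3 = betas
    pooled_logit = {}
    for v1 in (0, 1):
        for v2 in (0, 1):
            lin = alpha1 + b1 * v1 + b2 * v2 + b3 * v1 * v2
            w_low = (1.0 - share_high) * f_low[(v1, v2)]
            w_high = share_high * f_high[(v1, v2)]
            # Pooled disease risk in this cell: weighted average over ethnicities.
            risk = (w_low * expit(lin) + w_high * expit(lin + alpha2)) / (w_low + w_high)
            pooled_logit[(v1, v2)] = logit(risk)
    b1_star = pooled_logit[(1, 0)] - pooled_logit[(0, 0)]
    b2_star = pooled_logit[(0, 1)] - pooled_logit[(0, 0)]
    b3_star = (pooled_logit[(1, 1)] - pooled_logit[(1, 0)]
               - pooled_logit[(0, 1)] + pooled_logit[(0, 0)])
    return (b1_star - b1, b2_star - b2, b3_star - b3)


def independent_cells(g1, g2):
    """Joint cell fractions when the two factors are independent within an ethnicity."""
    return {(v1, v2): (g1 if v1 else 1.0 - g1) * (g2 if v2 else 1.0 - g2)
            for v1 in (0, 1) for v2 in (0, 1)}


# Illustrative inputs: baseline risks 1% versus 10%, no true effects,
# factor 1 frequency equal across ethnicities, factor 2 frequency 10% versus 50%.
alpha1 = logit(0.01)
alpha2 = logit(0.10) - alpha1
bias = asymptotic_bias(alpha1, alpha2, (0.0, 0.0, 0.0),
                       independent_cells(0.3, 0.1), independent_cells(0.3, 0.5), 0.5)
print(bias)
```

In this illustrative setting only the second factor differs in frequency across ethnicities, so only its main effect is biased; the first main effect and the interaction remain unbiased up to rounding.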
Two Ethnicities
We obtained numerical estimates of large sample biases due to population stratification by fitting logistic regression models to simulated data generated under a wide range of conditions using Stata 7.0 (College Station, TX). Baseline disease risk in the low-risk ethnicity was specified to be 1%, whereas baseline risk in the high-risk ethnicity was specified to be 2% or 10%. To specifically assess genotype-genotype interactions, subsequent text assumes V1 = G1 and V2 = G2 to denote the study of interactions involving two candidate genes G1 and G2. For simplicity, we assumed that candidate gene Gi (i = 1, 2) had two alleles Ai and ai with frequencies $p_i$ and $1 - p_i$, respectively. Genotype aiai was coded as the reference category (i.e., Gi = 0), whereas genotypes AiAi and Aiai were coded as the comparison ("at-risk" genotype) category (i.e., Gi = 1). The at-risk genotype frequencies of Gi were specified in the range of 5% to 95%, assuming single-locus Hardy-Weinberg equilibrium, corresponding to $p_i$ ranging from 3% to 78%.
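The correspondence between at-risk genotype frequency and allele frequency follows from Hardy-Weinberg equilibrium: carriers of at least one A allele have frequency 1 - (1 - p)^2. A minimal sketch of that conversion (function names are hypothetical):

```python
import math


def at_risk_from_allele_freq(p):
    """Frequency of genotypes AA or Aa under Hardy-Weinberg equilibrium."""
    return 1.0 - (1.0 - p) ** 2


def allele_freq_from_at_risk(g):
    """Allele frequency p that yields at-risk genotype frequency g."""
    return 1.0 - math.sqrt(1.0 - g)


for g in (0.05, 0.95):
    p = allele_freq_from_at_risk(g)
    print(f"at-risk genotype frequency {g:.0%} -> allele frequency {p:.1%}")
```

The endpoints 5% and 95% map to allele frequencies of about 2.5% and 77.6%, consistent with the rounded 3% to 78% range quoted in the text.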
In some cases, G1 and G2 may not be independently distributed due to linkage disequilibrium between the two loci, denoted here as D. Because D between G1 and G2 is constrained by Dmax and Dmin, where $D_{\max} = \min[p_1(1 - p_2), (1 - p_1)p_2]$ and $D_{\min} = \max[-p_1 p_2, -(1 - p_1)(1 - p_2)]$ (16), we specified the degree of linkage disequilibrium between the two genes by $D = D' \times D_{\max}$ or $D = D' \times D_{\min}$, where D' took on the values of 0 (no linkage disequilibrium), 0.2 (small linkage disequilibrium), 0.5 (moderate linkage disequilibrium), and 0.9 (strong linkage disequilibrium). Main effects and interactions of genes were specified by assigning $\beta_j = 0$ (no effect, OR = 1), $\beta_j = \pm 0.4$ (small effect, OR = 1.49 or 0.67), $\beta_j = \pm 0.8$ (moderate effect, OR = 2.23 or 0.45), or $\beta_j = \pm 1.6$ (large effect, OR = 4.95 or 0.20). Using the expected frequency of each genotype-disease category under the specified conditions, a hypothetical cohort of 100,000 observations was generated. Disease status for each observation was determined by comparing its disease risk with a standard uniform random variable. A total of 5,000 case-control samples were randomly drawn from the hypothetical cohort. Each sample consisted of 95% of the diseased individuals as cases and an equal number of nondiseased individuals as controls. Both correctly specified models and their corresponding mis-specified models were fitted to each sample to obtain point estimates of all relevant regression coefficients. The corresponding point estimates from the 5,000 samples were averaged to obtain large sample point estimates for biases due to population stratification on both main effects and interactions. Results are presented in Figs. 1, 2, and 3.
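The constraints on D and the quoted odds ratios can be checked with a short sketch. It uses the standard two-locus haplotype parameterization (an assumption here, since the paper does not spell it out), and the allele frequencies 0.3 and 0.2 are purely illustrative.

```python
import math


def d_bounds(p1, p2):
    """Smallest and largest admissible D for allele frequencies p1 and p2."""
    d_max = min(p1 * (1.0 - p2), (1.0 - p1) * p2)
    d_min = max(-p1 * p2, -(1.0 - p1) * (1.0 - p2))
    return d_min, d_max


def haplotype_freqs(p1, p2, d):
    """Two-locus haplotype frequencies implied by p1, p2, and disequilibrium d."""
    return {
        "A1A2": p1 * p2 + d,
        "A1a2": p1 * (1.0 - p2) - d,
        "a1A2": (1.0 - p1) * p2 - d,
        "a1a2": (1.0 - p1) * (1.0 - p2) + d,
    }


d_min, d_max = d_bounds(0.3, 0.2)
for d_prime in (0.0, 0.2, 0.5, 0.9):
    print(d_prime, haplotype_freqs(0.3, 0.2, d_prime * d_max))

# Odds ratios corresponding to the specified log-odds effects.
odds_ratios = {beta: math.exp(beta) for beta in (0.4, -0.4, 0.8, -0.8, 1.6, -1.6)}
```

At either bound one haplotype frequency reaches zero, which is why D cannot be pushed beyond Dmin or Dmax.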
We assumed that the baseline disease risk $\pi_i$ of ethnicity i was uniformly distributed, with $\pi_i \sim \text{Uniform}[0.01, 0.02]$ or $\pi_i \sim \text{Uniform}[0.01, 0.1]$ for i = 1,..., k. Similarly, we assumed that genotype frequencies were uniformly distributed and were consistent with single-locus Hardy-Weinberg equilibrium. We considered the at-risk genotype frequency within ranges of 5% to 10%, 5% to 20%,..., up to 5% to 95%. Linkage disequilibrium ranged from 90% of the minimum possible value (i.e., Dmin) to 90% of the maximum possible value (i.e., Dmax). Under these assumptions, we generated 5,000 sets of variables for a hypothetical cohort. Each set of disease risks and genotype frequencies of the k ethnicities was randomly assigned from the distributions specified above, assuming k = 2, 5, or 10, respectively, and $\beta_1 = \beta_2 = \beta_3 = 0$ (i.e., OR = 1) or $\beta_1 = \beta_2 = \beta_3 = 0.693$ (i.e., OR = 2). Next, case-control samples were randomly drawn from the hypothetical cohort under each set of variables to obtain bias estimates as described in the previous paragraph. The range and average of the 5,000 sets of bias estimates are presented in Fig. 4.
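The parameter-generation step can be sketched as follows. This is a hedged illustration only: the function name, dictionary keys, and seed are hypothetical, and drawing the linkage disequilibrium fraction uniformly between -0.9 and 0.9 (negative values read as fractions of |Dmin|, positive values as fractions of Dmax) is an assumption, since the text states only the range.

```python
import random


def draw_ethnicity_parameters(k, max_risk, max_at_risk_freq, rng):
    """Draw baseline risk, at-risk genotype frequencies, and an LD fraction for k ethnicities."""
    params = []
    for _ in range(k):
        params.append({
            "baseline_risk": rng.uniform(0.01, max_risk),
            "at_risk_freq_g1": rng.uniform(0.05, max_at_risk_freq),
            "at_risk_freq_g2": rng.uniform(0.05, max_at_risk_freq),
            # Assumed uniform; sign selects Dmin (negative) or Dmax (positive).
            "ld_fraction": rng.uniform(-0.9, 0.9),
        })
    return params


rng = random.Random(2005)  # arbitrary seed for reproducibility
# 5,000 parameter sets for k = 5, risks in 1% to 2%, at-risk frequencies in 5% to 20%.
parameter_sets = [draw_ethnicity_parameters(5, 0.02, 0.20, rng) for _ in range(5000)]
```

Each parameter set would then feed the cohort generation and case-control sampling described in the text.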
| Results |
|---|
When the cohort comprises two ethnicities, asymptotic biases to estimates of the main effects of V1 and V2 are bounded by $-\alpha_2$ and $\alpha_2$, where $\alpha_2$ is the difference in log odds of the baseline risk of disease in the higher risk population compared with the lower risk population (see Eqs. A and B and Appendix 1). This maximum bias is reached only under extreme conditions, such as when the joint occurrence of V1 = 1 and V2 = 0 is never observed in the low-risk ethnicity, and the joint occurrence of V1 = 0 and V2 = 0 is never observed in the high-risk ethnicity.
Asymptotic biases to estimates of interactions between V1 and V2 are bounded by $-2\alpha_2$ and $2\alpha_2$. That is, the maximal bias due to population stratification that can be attained for an interaction between two factors is bounded by twice the log OR of the disease risks in the two populations being compared. However, these maximal biases can be reached only when all of the following four conditions hold: (a) the joint occurrence of V1 = 1 and V2 = 0 is never observed in the high-risk ethnicity; (b) the joint occurrence of V1 = 0 and V2 = 1 is never observed in the high-risk ethnicity; (c) the joint occurrence of V1 = 1 and V2 = 1 is never observed in the low-risk ethnicity; (d) the joint occurrence of V1 = 0 and V2 = 0 is never observed in the low-risk ethnicity. Similarly, the maximal bias in the other direction is $-2\alpha_k$ only when the reverse of the four conditions described above holds (see Appendix 1).
Similarly, when there are k > 2 ethnicities in the cohort, the most extreme biases that could result from population stratification are $\pm\alpha_k$ and $\pm 2\alpha_k$ to main effects and interaction estimates, respectively, where $\alpha_k$ represents the maximum log OR among baseline risks of the k ethnicities (see Eqs. A and B and Appendix 1). To our knowledge, this is the first demonstration of the bounds on the magnitude of biases to interaction estimates for genotype-genotype or genotype-environment interaction studies. However, the boundary conditions that result in the theoretical extremes are unlikely to represent the conditions observed in most studies. Therefore, we next undertook numerical evaluations to consider situations that are more likely to be encountered in actual studies.
Two Ethnicities
We have summarized results of large sample biases to main effects and genotype-genotype or genotype-environment interactions using six sets of conditions that may be encountered in association studies (Table 1). Note that bias can only arise when differences in baseline disease risk among ethnicities exist (1), which is a necessary condition for confounding to occur (17). Condition a represented the situation where no population stratification existed for G1 or G2: G1 and G2 had the same marginal distributions, and linkage disequilibrium between G1 and G2 was the same, in each ethnicity (i.e., joint distributions of G1 and G2 were the same across the ethnicities). Under these conditions, ignoring ethnicity did not result in large sample bias due to population stratification in any of these estimates when $\beta_1 = \beta_2 = \beta_3 = 0$. When $\beta_1 = \beta_2 = 0$ but $\beta_3 \neq 0$, ignoring ethnicity did not result in large sample bias to $\beta_1$ or $\beta_2$ but resulted in negligible biases toward the null hypothesis to the interaction term $\beta_3$. When $\beta_1 \neq 0$ or $\beta_2 \neq 0$, a slight bias toward the null hypothesis was observed, whereas biases to $\beta_3$ were no longer always toward the null. Instead, the bias to $\beta_3$ depended on both the main and interaction effects between the two genes. In all cases, the magnitude of these large sample biases was negligible and reflected the nonlinearity of the logistic regression model (18-20) rather than biases due to population stratification.
Under condition c, the marginal genotype distribution of G1 was constant across ethnicities, and G1 and G2 were in linkage equilibrium in both ethnicities (D = 0). When $\beta_1 = 0$ and $\beta_3 = 0$, there were no large sample biases to their estimates. As shown in Fig. 2A and B for "No Linkage Disequilibrium in Either Ethnicity," biases in G2 main effect estimates followed the same patterns as biases to a single candidate gene under conditions of population stratification (1, 6). Bias was positive if the at-risk genotype frequency of G2 was greater in the high-risk ethnicity (Fig. 2A). Bias was negative if the at-risk genotype frequency of G2 was greater in the low-risk ethnicity (Fig. 2B). Condition d differed from condition c only in that linkage disequilibrium was specified to be different across ethnicities. Under condition d, biases to main effects and interactions depended on the ethnicity-specific genotype frequencies of both genes and baseline disease risks as well as the main effects of the genes and their interactions. For example, we considered the situation in which baseline disease risks by ethnicity were 10% versus 1%, main effects and interaction were 0, and linkage disequilibrium was $D' \times D_{\max}$ in the high-risk ethnicity and $D' \times D_{\min}$ in the low-risk ethnicity. Figure 2A presents the results where the frequency of the at-risk genotype of G2 varied within the high-risk ethnicity, whereas Fig. 2B presents the results where the frequency of the at-risk genotype of G2 varied within the low-risk ethnicity. As shown in these figures for D' = 0.2 and D' = 0.5, large sample biases occurred to G1 main effects, G2 main effects, and interaction effects. These biases became more pronounced with increasing degrees of linkage disequilibrium. However, the biases did not follow a simple monotonic pattern with changing genotype frequencies.
Conditions e and f (Fig. 3) occurred when the marginal genotype distributions of both genes differed across ethnicities. If G1 and G2 were in linkage equilibrium (condition e), the patterns for the direction and magnitude of biases to main effects were similar to those reported for condition c, but biases to interaction effect estimates did not follow simple patterns corresponding to the marginal genotype frequencies of either gene. If G1 and G2 were in linkage disequilibrium (condition f), biases to main effects no longer followed simple patterns. For example, in Fig. 3A, baseline disease risks by ethnicity were 10% versus 1%, and both main effects and interactions were assumed to be 0. Bias depended on at-risk genotype frequencies of both genes as well as on the degree of linkage disequilibrium between the two genes. Large biases were observed even when genotype frequencies were not very different across ethnicities. Again, no simple relationship of bias was observed with respect to genotype frequency. Therefore, biases due to population stratification can be large in relatively unpredictable ways when marginal genotype frequencies of both genes differ by ethnicity and the two genes are in linkage disequilibrium.
More than Two Ethnicities
When we expanded our analyses to consider cohorts consisting of k = 5 or 10 ethnicities, we observed that (on average) large sample biases to main and interaction effects were either nonexistent or negligible (Fig. 4A-D), even if the conditions for population stratification (Table 1) were met. This result follows because we assumed that baseline disease risks and the joint genotype distributions of both genes were uncorrelated. Biases to both main effects and interaction were greatest when k = 2 (i.e., when there were only two ethnicities in the cohort) but decreased with increasing number of component ethnicities. Biases to both main effects and interaction were smaller when baseline disease risks of the k ethnicities were all within the range of 1% to 2% and larger when baseline risks of the k ethnicities were all within the range of 1% to 10%. Biases to main effects tended to increase as the differences in genotype frequencies across ethnicities increased. For example, in G1 main effects and G2 main effects in Fig. 4A-D, the range of biases for main effects increased as the range of at-risk genotypes of both genes across the k ethnicities increased from 5% to 10% up to 5% to 95%. Under the latter more extreme conditions, biases to main effects approached their theoretical bounds.
For example, based on our algebraic derivation of bounds to the biases of the estimates, when baseline risks of k ethnicities were within 1% to 2% (Fig. 4A and B), biases to main effects approached their bounds of -0.7 and 0.7 (on the natural log scale). Although interaction estimates were bounded by -1.4 and 1.4, the actual biases observed were far from reaching these bounds. On the other hand, biases to interaction did not show monotonic relationships with increasing or decreasing ranges of marginal genotype frequencies for either gene. Even when the range of marginal genotype frequencies of both genes was small across ethnicities, biases to interaction estimates could still be very large. The patterns were similar when baseline risks of k ethnicities were within 1% to 10% (Fig. 4C and D), where biases to main effects were bounded by -2.4 to 2.4 and biases to interaction by -4.8 to 4.8. Similar patterns were observed when OR = 1 (Fig. 4A and C) and when OR = 2 (Fig. 4B and D) for both main effects and interactions.
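The numerical bounds quoted above follow directly from the log OR between the highest and lowest baseline risks. A minimal sketch (the function name `log_or` is hypothetical):

```python
import math


def log_or(p_high, p_low):
    """Log odds ratio comparing a higher baseline risk with a lower one."""
    return math.log((p_high / (1.0 - p_high)) / (p_low / (1.0 - p_low)))


for p_high in (0.02, 0.10):
    bound = log_or(p_high, 0.01)
    print(f"risks 1% vs {p_high:.0%}: main effects within +/-{bound:.1f}, "
          f"interaction within +/-{2 * bound:.1f}")
```

For 1% versus 2% the log OR is about 0.70, and for 1% versus 10% it is ln(11), about 2.40; doubling each gives the interaction bounds.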
| Discussion |
|---|
These arguments can be extended to studies of genotype-environment interactions, where correlation between the gene and environment factors exists. However, it is more likely that differences across ethnicities in the joint distributions of genetic and environmental factors result from different marginal distributions rather than from correlations between the genes and environments of interest. Although severe biases were only observed under extreme stratification conditions (e.g., large differences in linkage disequilibrium, in the marginal frequencies of two genes, and in disease risks across ethnicities), population stratification may result in larger biases in genotype-genotype interaction studies than in studies involving only one gene. Our analytic derivation showed that in settings of interaction where the joint distributions of two factors differ across the levels of a third factor, and where disease risk also varies across those levels, the omission of that third factor as a covariate can produce biases in interaction estimates that are up to twice as large as biases in main effect estimates. To our knowledge, this is the first time such a quantitative evaluation has been reported. We anticipate that our findings on the effect of population stratification in interaction studies will be useful for studies of both gene-gene interactions and gene-environment interactions (21, 22) in different populations.
Additional studies are required to address other aspects of bias due to population stratification in genotype-genotype interaction studies. For example, we have not considered the effect of deviations from Hardy-Weinberg equilibrium. Such deviations could also confer biases to interaction terms involving two or more genes. Nonetheless, the magnitude of biases would be bounded as shown in Appendix 1. Similarly, we have focused on large sample biases to point estimates from logistic regression in genotype-genotype interaction studies. Another future challenge is to assess the effect of population stratification on variance estimation and hypothesis testing. In the case of studies involving single candidate genes, Heiman et al. (8) addressed both issues by evaluating false-positive rates and comparing with confounding risk ratios due to population stratification. However, these evaluations have not been undertaken for genotype-genotype interactions. Marchini et al. (22) considered power issues due to population stratification, which have not been considered here. Finally, we have not evaluated additional situations of potential interest, such as the case of a main effect of a gene in one population but not in another.
When genotype distributions are the same across ethnicities or independent of ethnicity, ignoring ethnicity may still result in attenuation of the OR due to the nonlinearity of the logit link function in logistic regression (18-20). In the context of a single gene having the same distribution in two or more ethnicities, bias is absent if the gene has no effect even when ethnicities are ignored. Otherwise, attenuation of the estimate toward the null will increase with the magnitude of the gene's effect. The magnitude of the biases would be negligible unless disease risks were also extremely different across ethnicities. We found that when interactions between two genes are considered, the same rules applied to biases in the main effects of the two genes. However, bias may occur for interaction estimates even in the absence of interaction (i.e., $\beta_3 = 0$). In addition, the direction of the bias is not always predictable when interaction is present (i.e., $\beta_3 \neq 0$). Nonetheless, the magnitude of these biases to main effects or interactions was generally negligible, unless $\beta_1$, $\beta_2$, and $\beta_3$ were all large (>1.6), and the relative disease risk between the two ethnicities was >10-fold.
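The attenuation described above for a single gene can be illustrated with a small calculation. This sketch assumes a full cohort of two ethnicities with identical genotype frequencies (so the frequencies cancel out of the pooled risks), baseline risks of 1% and 10%, and equal ethnicity shares; these values and the function name `pooled_log_or` are illustrative assumptions.

```python
import math


def logit(p):
    return math.log(p / (1.0 - p))


def expit(x):
    return 1.0 / (1.0 + math.exp(-x))


def pooled_log_or(beta, risk_low, risk_high, share_high=0.5):
    """Large-sample log OR for one gene when ethnicity is ignored.

    The genotype frequency is the same in both ethnicities, so each genotype
    group is a mixture of the ethnicities in proportions (1 - share_high, share_high).
    """
    a_low, a_high = logit(risk_low), logit(risk_high)

    def pooled_risk(shift):
        return ((1.0 - share_high) * expit(a_low + shift)
                + share_high * expit(a_high + shift))

    return logit(pooled_risk(beta)) - logit(pooled_risk(0.0))


for beta in (0.0, 0.4, 0.8, 1.6):
    bias = pooled_log_or(beta, 0.01, 0.10) - beta
    print(f"true log OR {beta:.1f}: bias {bias:+.3f}")
```

With no true effect the bias is zero; with a nonzero effect the pooled estimate is pulled toward the null even though the gene is not associated with ethnicity.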
We have evaluated biases under relatively extreme conditions of population stratification with respect to disease risk and allele frequencies. The ranges of variables employed here were similar to or more extreme than those in other studies (10, 23). In real situations, biases on genotype-genotype interactions should be smaller when ethnicity strata are more numerous and the range of disease risk is narrower than considered here. Furthermore, our results are consistent with those of Wacholder et al. (1) and Wang et al. (6) as to the smaller potential for bias in the presence of larger numbers of ethnicities. Similar arguments hold for gene-environment interaction studies. For studies of gene-gene interactions, linkage disequilibrium patterns differ by ethnicity (13). Therefore, studies of genotype-genotype interactions should specifically consider potential linkage disequilibrium, baseline disease risks, and genotype frequency differences by ethnicity. This issue takes on special significance in light of suggestions that population-specific linkage disequilibrium might contribute to nonreplication of association study results, including studies of genotype-genotype interactions (24).
The data presented here support the hypothesis that bias due to population stratification can occur in association studies involving genotype-genotype interactions, particularly if the two genes are in strong linkage disequilibrium. However, our results show that the magnitude of potential bias is constrained by the differences in disease risk among populations. Thus, when these disease risk differences among populations are small, population stratification cannot lead to large biases. Furthermore, our empirical results show that population stratification causes relatively small biases even under extreme conditions and is unlikely to cause large biases to estimates of main effects and interactions under usual study conditions, particularly when the correlation (i.e., linkage disequilibrium) among the interacting factors is small. Therefore, if population stratification is not a major concern, studies of interaction involving unlinked genes (e.g., genes in common metabolic pathways that are located on different chromosomes) might be appropriate for case-control association studies, whereas haplotype-based approaches might be more appropriate for genes in linkage disequilibrium.
| Appendix 1: Algebraic Analyses of Asymptotic Bounds on Biases Due to Population Stratification |
|---|
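For a cohort of two ethnicities, E1 (low risk) and E2 (high risk), Eq. A reduces to
$$\operatorname{logit}(\pi) = \alpha_1 + \beta_1 V_1 + \beta_2 V_2 + \beta_3 V_1 V_2 + \alpha_2 E_2 \tag{A}$$
In the presence of population stratification, we assumed the following mis-specified model was fitted:
$$\operatorname{logit}(\pi^*) = \alpha^* + \beta_1^* V_1 + \beta_2^* V_2 + \beta_3^* V_1 V_2 \tag{B}$$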
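Let $f_{V_1 V_2 E_2}$ represent the expected fraction of joint occurrence of V1 and V2 in ethnicity E1 ($E_2 = 0$) or E2 ($E_2 = 1$), where the subscripts V1, V2, and E2 each take values 0 or 1. For example, $f_{000}$ represents the fraction of observations having joint reference categories of V1 and V2 in the low-risk ethnicity E1 (i.e., V1 = 0, V2 = 0, E2 = 0); likewise, $f_{001}$ represents the fraction of observations having the joint reference categories of V1 and V2 in the high-risk ethnicity E2 (i.e., V1 = 0, V2 = 0, E2 = 1). Correspondingly, the expected values of the disease indicator D are $\pi_{000} = P(\alpha_1)$ and $\pi_{001} = P(\alpha_1 + \alpha_2)$ by Eq. A, where P(x) is the logistic function defined by $P(x) = \exp(x) / [1 + \exp(x)]$. Let $\lambda$ and $1 - \lambda$ represent the proportions of the high-risk and low-risk ethnicity in the cohort, respectively; then the expected fraction of observations having the joint reference categories in the entire cohort is $f_{00\cdot} = (1 - \lambda) f_{000} + \lambda f_{001}$, where the "·" subscript indicates that observations are pooled over the associated index (ethnicity in this case). Let $\pi_{00\cdot}$ be the estimated expected value of D for these observations under the mis-specified model (Eq. B); then $\pi_{00\cdot} = P(\alpha^*)$, etc. Then the expected values of the maximum likelihood estimates of the variables in Eq. B are found by
$$\begin{aligned}
P(\alpha^*) &= (1 - \lambda)\frac{f_{000}}{f_{00\cdot}} P(\alpha_1) + \lambda\frac{f_{001}}{f_{00\cdot}} P(\alpha_1 + \alpha_2) \\
P(\alpha^* + \beta_1^*) &= (1 - \lambda)\frac{f_{100}}{f_{10\cdot}} P(\alpha_1 + \beta_1) + \lambda\frac{f_{101}}{f_{10\cdot}} P(\alpha_1 + \beta_1 + \alpha_2) \\
P(\alpha^* + \beta_2^*) &= (1 - \lambda)\frac{f_{010}}{f_{01\cdot}} P(\alpha_1 + \beta_2) + \lambda\frac{f_{011}}{f_{01\cdot}} P(\alpha_1 + \beta_2 + \alpha_2) \\
P(\alpha^* + \beta_1^* + \beta_2^* + \beta_3^*) &= (1 - \lambda)\frac{f_{110}}{f_{11\cdot}} P(\alpha_1 + \beta_1 + \beta_2 + \beta_3) + \lambda\frac{f_{111}}{f_{11\cdot}} P(\alpha_1 + \beta_1 + \beta_2 + \beta_3 + \alpha_2)
\end{aligned}$$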
Because $(1 - \lambda)(f_{000}/f_{00\cdot}) + \lambda (f_{001}/f_{00\cdot}) = 1$ and P(x) is a monotonic function, $\alpha_1 \le \alpha^* \le \alpha_1 + \alpha_2$. $\alpha^* = \alpha_1$ only when $f_{001} = 0$ (i.e., when no observations in the high-risk ethnicity fall into the joint reference categories of V1 and V2); and $\alpha^* = \alpha_1 + \alpha_2$ only when $f_{000} = 0$ (i.e., when no observations in the low-risk ethnicity fall into the joint reference categories of V1 and V2).
Similarly, $(1 - \lambda)(f_{100}/f_{10\cdot}) + \lambda (f_{101}/f_{10\cdot}) = 1$; therefore, $\alpha_1 + \beta_1 \le \alpha^* + \beta_1^* \le \alpha_1 + \beta_1 + \alpha_2$. Because $\alpha_1 \le \alpha^* \le \alpha_1 + \alpha_2$, then $\beta_1 - \alpha_2 \le \beta_1^* \le \beta_1 + \alpha_2$. Therefore, the biases to main effect estimates of V1 are bounded by $-\alpha_2 \le \beta_1^* - \beta_1 \le \alpha_2$. $\beta_1^* - \beta_1 = \alpha_2$ only when $f_{100} = 0$ and $f_{001} = 0$ [i.e., when no observations in the low-risk ethnicity fall into the comparison category of V1 (V1 = 1) and reference category of V2 (V2 = 0) and, in addition, no observations in the high-risk ethnicity fall into the joint reference categories of V1 and V2]; $\beta_1^* - \beta_1 = -\alpha_2$ only when $f_{101} = 0$ and $f_{000} = 0$ (i.e., when no observations in the high-risk ethnicity fall into the comparison category of V1 and reference category of V2 and, in addition, no observations in the low-risk ethnicity fall into the joint reference categories of V1 and V2).
In the same fashion, the biases to main effect estimates of V2 are bounded by $-\alpha_2 \le \beta_2^* - \beta_2 \le +\alpha_2$. $\beta_2^* - \beta_2 = \alpha_2$ only when $f_{010} = 0$ and $f_{001} = 0$; $\beta_2^* - \beta_2 = -\alpha_2$ only when $f_{011} = 0$ and $f_{000} = 0$.
Next, using the above derivations, $\beta_3 - 2\alpha_2 \le \beta_3^* \le \beta_3 + 2\alpha_2$; thus, the biases to estimates of interaction between V1 and V2 are bounded by $-2\alpha_2 \le \beta_3^* - \beta_3 \le +2\alpha_2$. However, only when $f_{100} = f_{010} = f_{001} = f_{111} = 0$ will $\beta_3^* - \beta_3 = -2\alpha_2$; that is, all observations in the low-risk ethnicity fall into either both reference categories or both comparison categories of V1 and V2, and no observations in the high-risk ethnicity fall into either both reference categories or both comparison categories of V1 and V2; only when $f_{110} = f_{000} = f_{011} = f_{101} = 0$ will $\beta_3^* - \beta_3 = 2\alpha_2$.
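As a worked check of the interaction bounds, the extreme cases can be written out explicitly. The sketch below assumes notation in which $\alpha_1$ is the baseline log odds in the low-risk ethnicity and $\alpha_2$ is the log OR of the high-risk versus the low-risk ethnicity, and it uses the fact that the mis-specified model is saturated in V1 and V2, so its interaction coefficient is a contrast of the pooled cell logits. When a cell is observed in only one ethnicity, its pooled logit equals that ethnicity's logit.

```latex
\begin{aligned}
\beta_3^* &= \operatorname{logit}(\pi_{11\cdot}) - \operatorname{logit}(\pi_{10\cdot})
           - \operatorname{logit}(\pi_{01\cdot}) + \operatorname{logit}(\pi_{00\cdot}) \\[4pt]
\text{If } f_{110} = f_{000} = f_{011} = f_{101} = 0:\quad
\beta_3^* &= (\alpha_1 + \beta_1 + \beta_2 + \beta_3 + \alpha_2) - (\alpha_1 + \beta_1)
           - (\alpha_1 + \beta_2) + (\alpha_1 + \alpha_2) = \beta_3 + 2\alpha_2 \\[4pt]
\text{If } f_{100} = f_{010} = f_{001} = f_{111} = 0:\quad
\beta_3^* &= (\alpha_1 + \beta_1 + \beta_2 + \beta_3) - (\alpha_1 + \beta_1 + \alpha_2)
           - (\alpha_1 + \beta_2 + \alpha_2) + \alpha_1 = \beta_3 - 2\alpha_2
\end{aligned}
```

In both cases two of the four cells carry the ethnicity shift $\alpha_2$ with the same sign in the contrast, which is why the interaction bound is twice the main-effect bound.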
The maximum bias is reached when the joint occurrence of V1 = 1 and V2 = 0 is never observed in the low-risk ethnicity, and the joint occurrence of V1 = 0 and V2 = 0 is never observed in the high-risk ethnicity. Similarly, the lower bound on the bias is reached only when the joint occurrence of V1 = 1 and V2 = 0 is never observed in the high-risk ethnicity, and the joint occurrence of V1 = 0 and V2 = 0 never occurs in the low-risk ethnicity.
The maximal biases for interactions can be reached only when all of the following four conditions hold: (a) the joint occurrence of V1 = 1 and V2 = 0 is never observed in the high-risk ethnicity; (b) the joint occurrence of V1 = 0 and V2 = 1 is never observed in the high-risk ethnicity; (c) the joint occurrence of V1 = 1 and V2 = 1 is never observed in the low-risk ethnicity; (d) the joint occurrence of V1 = 0 and V2 = 0 is never observed in the low-risk ethnicity. Similarly, the maximal bias in the other direction is $-2\alpha_k$ only when all of the following four conditions hold: (a) the joint occurrence of V1 = 1 and V2 = 0 is never observed in the low-risk ethnicity; (b) the joint occurrence of V1 = 0 and V2 = 1 is never observed in the low-risk ethnicity; (c) the joint occurrence of V1 = 1 and V2 = 1 is never observed in the high-risk ethnicity; (d) the joint occurrence of V1 = 0 and V2 = 0 is never observed in the high-risk ethnicity.
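B. Algebraic Analyses of Biases when k > 2 Ethnicities. Assume k ethnicities E1, E2,..., Ek comprise a cohort, with expected fractions $\lambda_1, \lambda_2, \dots, \lambda_k$, respectively, where $\sum_{j=1}^{k} \lambda_j = 1$. Assume underlying disease associations can be fit with logistic regression written as $\operatorname{logit}(\pi) = \alpha_1 + \beta_1 V_1 + \beta_2 V_2 + \beta_3 V_1 V_2 + \alpha_2 E_2 + \alpha_3 E_3 + \dots + \alpha_k E_k$ (A). Maximum likelihood estimates for the mis-specified model that ignored all k ethnicities E1, E2,..., Ek are
$$\begin{aligned}
P(\alpha^*) &= \sum_{j=1}^{k} \lambda_j \frac{f_{00j}}{f_{00\cdot}} \pi_{00j} \\
P(\alpha^* + \beta_1^*) &= \sum_{j=1}^{k} \lambda_j \frac{f_{10j}}{f_{10\cdot}} \pi_{10j} \\
P(\alpha^* + \beta_2^*) &= \sum_{j=1}^{k} \lambda_j \frac{f_{01j}}{f_{01\cdot}} \pi_{01j} \\
P(\alpha^* + \beta_1^* + \beta_2^* + \beta_3^*) &= \sum_{j=1}^{k} \lambda_j \frac{f_{11j}}{f_{11\cdot}} \pi_{11j}
\end{aligned}$$
where $\pi_{V_1 V_2 j}$ is the disease risk given V1 and V2 in ethnicity j under Eq. A. As an example for the notation, here, the expected fraction of observations in the joint reference category in the entire cohort is $f_{00\cdot} = \lambda_1 f_{001} + \lambda_2 f_{002} + \dots + \lambda_k f_{00k}$, where the "·" subscript indicates that observations are pooled over the associated index (ethnicity in this case).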
Without loss of generality, assume baseline risks $\pi_{001} < \pi_{002} < \dots < \pi_{00k}$, so that $0 < \alpha_2 < \alpha_3 < \dots < \alpha_k$. Then biases on the intercept term satisfy $\alpha_1 \le \alpha^* \le \alpha_1 + \alpha_k$; $\alpha^* = \alpha_1$ only when $f_{002} = f_{003} = \dots = f_{00k} = 0$ (i.e., ethnicities 2 to k do not have the joint reference category of V1 = 0 and V2 = 0). In addition, $\alpha^* = \alpha_1 + \alpha_k$ only when $f_{001} = f_{002} = \dots = f_{00(k-1)} = 0$ [i.e., ethnicities 1 to (k - 1) do not have the joint reference category of V1 = 0 and V2 = 0]. Biases on V1 effect estimates satisfy $\beta_1 - \alpha_k \le \beta_1^* \le \beta_1 + \alpha_k$; $\beta_1^* = \beta_1 + \alpha_k$ only when $f_{002} = f_{003} = \dots = f_{00k} = 0$ and $f_{101} = f_{102} = \dots = f_{10(k-1)} = 0$ [i.e., ethnicities 2 to k do not have joint categories of V1 = 0 and V2 = 0, and ethnicities 1 to (k - 1) do not have joint categories of V1 = 1 and V2 = 0]. In addition, $\beta_1^* = \beta_1 - \alpha_k$ only when $f_{001} = f_{002} = \dots = f_{00(k-1)} = 0$ and $f_{102} = f_{103} = \dots = f_{10k} = 0$ [i.e., ethnicities 2 to k do not have joint categories of V1 = 1 and V2 = 0, and ethnicities 1 to (k - 1) do not have joint categories of V1 = 0 and V2 = 0].
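There will be no bias on V1 effect estimates (i.e., $\beta_1^* = \beta_1$) if $f_{001} = f_{002} = \dots = f_{00(k-1)} = 0$ and $f_{101} = f_{102} = \dots = f_{10(k-1)} = 0$, or if $f_{002} = f_{003} = \dots = f_{00k} = 0$ and $f_{102} = f_{103} = \dots = f_{10k} = 0$. Similarly, $\beta_2 - \alpha_k \le \beta_2^* \le \beta_2 + \alpha_k$; $\beta_2^* = \beta_2 + \alpha_k$ only when $f_{002} = f_{003} = \dots = f_{00k} = 0$ and $f_{011} = f_{012} = \dots = f_{01(k-1)} = 0$; and $\beta_2^* = \beta_2 - \alpha_k$ only when $f_{001} = f_{002} = \dots = f_{00(k-1)} = 0$ and $f_{012} = f_{013} = \dots = f_{01k} = 0$. In addition, $\beta_3 - 2\alpha_k \le \beta_3^* \le \beta_3 + 2\alpha_k$, and $\beta_3^* = \beta_3 + 2\alpha_k$ only when $f_{00i} = f_{01i} = f_{10i} = f_{11i} = 0$ for i = 2,..., (k - 1).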
| Acknowledgments |
|---|
| Footnotes |
|---|
The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked advertisement in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.
Received 4/28/05; revised 10/18/05; accepted 11/ 9/05.
| References |
|---|