
Review |
Department of Genetics, Stanford University School of Medicine, Stanford, California 94305-5120, and Division of Research, Kaiser Permanente, Oakland, California 94611-5714
| Abstract |
|---|
|
|
|---|
| Introduction |
|---|
|
|
|---|
| Study Designs |
|---|
|
|
|---|
Several approaches for disentangling genetic from environmental influences are also possible in studies of human disease, although practical difficulties often limit their use. The most powerful design examines risks in biological relatives of affected versus control adoptees, because adoption creates a separation between an individual's biological and environmental influences. Because it is often difficult to obtain access to information on biological relatives of adoptees, adoption studies typically focus only on common disease or trait outcomes.
Another study design often used to separate genetic and environmental influences involves twins. Identical (monozygotic, MZ) twins derive from the fission of a single fertilized egg and thus inherit identical genetic material. By contrast, fraternal (dizygotic, DZ) twins derive from two distinct fertilized eggs and thus have the same genetic relationship as full siblings, although they may be more "biologically" related because they share the same prenatal intrauterine experience.
Comparing the similarity of MZ twins with same-sex DZ twins is a common approach for gleaning the magnitude of genetic influence on a disease or trait and has been applied extensively to a broad range of disorders, including cancer. A standard measure of similarity used in twin studies is the concordance rate. The "pairwise" concordance is calculated simply as the proportion of twin pairs with both twins affected among all ascertained twin pairs with at least one affected twin. On the other hand, the "probandwise" concordance allows for double counting of doubly ascertained twin pairs and has the advantage of being interpretable as the recurrence risk in a co-twin of an affected individual (5). Usually, the most critical assumption in twin studies is that MZ and DZ twins display a comparable degree of similarity because of the sharing of environmental factors, so that the difference in concordance rates between MZ and DZ twins is only a reflection of genetic factors.
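The two concordance measures can be illustrated with a minimal sketch. It assumes complete ascertainment, so that every concordant pair is doubly ascertained and contributes two probands; the function names and the pair counts are hypothetical and are not taken from any study cited here.

```python
def pairwise_concordance(concordant: int, discordant: int) -> float:
    """Share of ascertained pairs in which both twins are affected."""
    return concordant / (concordant + discordant)


def probandwise_concordance(concordant: int, discordant: int) -> float:
    """Recurrence risk in the co-twin of an affected twin.

    Assumes complete ascertainment: each concordant pair is counted twice,
    once for each affected twin acting as proband.
    """
    return 2 * concordant / (2 * concordant + discordant)


# Hypothetical counts: 10 concordant and 90 discordant pairs.
pairwise = pairwise_concordance(10, 90)        # 10 / 100
probandwise = probandwise_concordance(10, 90)  # 20 / 110
```

Under this assumption the probandwise value always equals or exceeds the pairwise value, which is why the two should not be compared across studies without checking which one was reported.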
| Genetic Models and the Interpretation of Family and Twin Studies |
|---|
|
|
|---|
(6), and specifically $\lambda_S$ for the sibling risk ratio, $\lambda_O$ for the offspring risk ratio, $\lambda_P$ for the parent risk ratio, $\lambda_1$ for all first-degree relatives combined (parents + siblings + offspring), $\lambda_D$ for DZ twins, and $\lambda_M$ for MZ twins.
If genetic susceptibility is attributable to a single (rare) dominant gene, it is easy to show (6) that $\lambda_P = \lambda_O = \lambda_S = \lambda_D = (\lambda_M + 1)/2$, which implies that the MZ:DZ ratio defined by RMD = $(\lambda_M - 1)/(\lambda_D - 1)$ = 2. On the other hand, if susceptibility is attributable to a recessive gene, $\lambda_P = \lambda_O < \lambda_S = \lambda_D$, with the degree of difference between $\lambda_S$ and $\lambda_O$ depending on the frequency of the "at-risk" allele ($\lambda_S/\lambda_O$ ranging from near 1 for a very common allele to infinity for a very rare allele). For a recessive model, RMD is usually >2, again depending on the allele frequency. For a rare allele, RMD = 4, but diminishes toward 2 if the allele is very common.
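These single-locus expectations can be checked numerically. The sketch below is not taken from the source, which only cites (6). It assumes the standard variance-component expressions for a biallelic locus with penetrances f0, f1, f2 (0, 1, or 2 copies of the at-risk allele, frequency p): the covariance in affection status is half the additive variance for parent and offspring, half the additive plus a quarter of the dominance variance for siblings and DZ twins, and the full genetic variance for MZ twins. Each risk ratio is 1 plus that covariance divided by the squared prevalence. All names and parameter values are illustrative.

```python
def single_locus_risk_ratios(p, f0, f1, f2):
    """Risk ratios for one biallelic locus (a sketch under stated assumptions).

    p  : frequency of the at-risk allele
    f0, f1, f2 : penetrance with 0, 1, 2 copies of the at-risk allele
    """
    q = 1.0 - p
    K = q * q * f0 + 2 * p * q * f1 + p * p * f2    # population prevalence
    alpha = p * (f2 - f1) + q * (f1 - f0)           # average allelic effect
    VA = 2 * p * q * alpha ** 2                     # additive variance
    VD = (p * q * (f2 - 2 * f1 + f0)) ** 2          # dominance variance
    lam_O = 1 + (VA / 2) / K ** 2                   # parent or offspring
    lam_S = 1 + (VA / 2 + VD / 4) / K ** 2          # siblings and DZ twins
    lam_M = 1 + (VA + VD) / K ** 2                  # MZ twins
    return {"K": K, "lam_O": lam_O, "lam_S": lam_S, "lam_M": lam_M,
            "R_MD": (lam_M - 1) / (lam_S - 1)}


# Hypothetical parameter choices.
rare_dominant = single_locus_risk_ratios(0.001, 0.0, 0.5, 0.5)
rare_recessive = single_locus_risk_ratios(0.001, 0.0, 0.0, 0.5)
common_recessive = single_locus_risk_ratios(0.9, 0.0, 0.0, 0.5)
# Same dominant locus with a 1% phenocopy rate added to every genotype.
with_phenocopies = single_locus_risk_ratios(0.001, 0.01, 0.51, 0.51)
```

Because both variance terms depend only on differences between penetrances, adding a constant phenocopy rate changes the prevalence and the individual risk ratios but leaves the MZ:DZ ratio unchanged.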
How are these expectations altered if there are also nongenetic cases mixed in, or if more than one locus contributes to susceptibility? Nongenetic cases (or "phenocopies") do not influence the predictions given above. On the other hand, if more than one gene exists that influences susceptibility, the predictions may be altered, depending on whether interaction effects exist among the contributing genes ("epistasis" in genetics parlance). Specifically, if mutant alleles at different loci are individually rare, so that it is very unlikely that an individual would carry more than one (a scenario typically termed "genetic heterogeneity" or "locus heterogeneity" by geneticists), the same predictions as given above hold. Equivalently, for more common alleles, if the risk associated with carrying multiple mutants is additive, the same predictions hold (6)
. By contrast, if the risk associated with carrying multiple "at-risk" alleles is not additive, e.g., multiplicative, a different pattern for the $\lambda$ values than described above occurs. Specifically, RMD is now >2 and can achieve very high values, depending on how many loci are involved and the degree of interaction.
Another genetic model that is commonly used in the analysis of family and twin data is the polygenic or multifactorial threshold (MFT) model. This model postulates a genetic basis consisting of numerous small, additive effects underlying a continuously distributed trait termed liability. The assumptions of the model allow invocation of a Gaussian distribution because of the Central Limit Theorem. It is further assumed that risk, as a function of liability, increases sigmoidally (asymptotically to 0 for liability equal to minus infinity and to 1 for liability equal to plus infinity). This sigmoid risk function is assumed to take the form of a cumulative normal distribution function. It can be shown that the latter assumption is mathematically equivalent to assuming an independent, additive random environmental component to liability, with a threshold imposed on the total liability scale determining affected status (e.g., a total liability value above a threshold T implies affected, and below T unaffected). Thus, according to the MFT model, there are two additive, normally distributed components to liability, a polygenic component and a random environmental component. The proportion of the total variance of liability attributable to the polygenic component is usually termed "heritability," where it is understood that this refers to the heritability of the latent liability trait. Because it is based on the underlying liability variable, heritability is independent of the threshold T.
What are the implications of the MFT model for familial relative risks? The MFT model has two parameters, the heritability H (defined above) and threshold T, determined by the population prevalence K. For example, a trait with a prevalence K of 1% corresponds to a threshold T of 2.33 (in SD units above the mean 0). Two relatives are assumed to have a bivariate normal joint distribution of liability with correlation $\rho = rH$, where r is the coefficient of relationship for the relatives (r = 1 for MZ twins, r = 0.5 for first-degree relatives, r = 0.25 for second-degree relatives, and so on). The recurrence risk is then calculated as the joint probability $K_2$ that the liability values for both relatives exceed T, divided by K (the probability that the index relative's liability exceeds T). Then the familial risk ratio $\lambda$ is given by $K_2/K^2$.
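The calculation just described can be sketched in a few lines. The code below is an illustration, not the method used to build any table in this article. It obtains T from K with the standard normal quantile function, evaluates the joint tail probability by Simpson's rule on a one-dimensional integral (conditioning on the first relative's liability), and divides by the squared prevalence. The function names, integration width, and step count are assumptions chosen for the example.

```python
from math import sqrt
from statistics import NormalDist

STD = NormalDist()  # standard normal liability scale


def threshold(K):
    """Liability threshold T such that P(liability > T) = K."""
    return STD.inv_cdf(1.0 - K)


def joint_tail(T, rho, steps=2000, width=10.0):
    """P(X > T, Y > T) for a standard bivariate normal with correlation rho."""
    if rho >= 1.0:                 # identical liabilities: joint tail equals K
        return 1.0 - STD.cdf(T)
    s = sqrt(1.0 - rho * rho)

    def g(x):
        # density of X times P(Y > T | X = x)
        return STD.pdf(x) * (1.0 - STD.cdf((T - rho * x) / s))

    h = width / steps              # Simpson's rule on [T, T + width]
    total = g(T) + g(T + width)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * g(T + i * h)
    return total * h / 3.0


def mft_lambda(K, H, r):
    """Familial risk ratio under the MFT model: joint tail over K squared."""
    return joint_tail(threshold(K), r * H) / K ** 2


# Illustrative values: prevalence 1%, heritability 50%.
lam_M = mft_lambda(0.01, 0.5, r=1.0)   # MZ twins
lam_D = mft_lambda(0.01, 0.5, r=0.5)   # DZ twins or siblings
R_MD = (lam_M - 1) / (lam_D - 1)
```

With zero heritability the ratio returns to 1, and with H = 1 and r = 1 it reaches 1/K, which are convenient sanity checks on the integration.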
There are two important implications of the MFT model for the $\lambda$ values. The first implication is that, for a fixed value of $\lambda$, the corresponding heritability H decreases with decreasing K. For example, for two traits, each with a $\lambda_S$ value of 2, one with a prevalence of 10% will have a higher heritability estimate than one with a prevalence of 1%. The second implication of the MFT model is that the MZ:DZ ratio RMD is always >2, indicating nonadditivity of gene effects. In fact, RMD increases directly with heritability H and inversely with prevalence K. At first, this feature may seem contradictory to the basic assumption of the MFT model, i.e., that the individual gene effects are additive. However, the additivity of gene effects is on the scale of liability. Because disease risk is not a linear function of liability but is sigmoidally related, on the risk scale the gene effects are nonadditive and thus give rise to interactive (or epistatic) effects. This is entirely analogous to the situation in epidemiology of defining whether interactions occur in the context of an additive or multiplicative model. In this case, in terms of recurrence risk patterns in relatives, it is nonadditivity on the risk scale (as opposed to liability scale) that matters.
These two characteristics of the MFT model are depicted in Table 1. As is evident there, both $\lambda_M$ and $\lambda_D$ increase dramatically with decreasing K for a fixed value of H, as does RMD, especially for higher values of H. One can also see from this table that for a trait with $\lambda_D$ = 2, the estimate of H ranges from 30% when K = 3% to 12% when K = 0.1%.
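The dependence of the heritability estimate on prevalence can be explored by inverting the MFT relationship. The sketch below is a hedged illustration rather than the procedure used for Table 1: it computes the risk ratio by numerical integration of the bivariate normal tail and then finds, by bisection, the H that reproduces a given risk ratio at a given K. Function names, tolerances, and the search interval are assumptions.

```python
from math import sqrt
from statistics import NormalDist

STD = NormalDist()


def lambda_mft(K, H, r=0.5, steps=1000, width=10.0):
    """Risk ratio under the MFT model for relatives with relationship r."""
    T = STD.inv_cdf(1.0 - K)
    rho = r * H
    s = sqrt(1.0 - rho * rho)

    def g(x):
        return STD.pdf(x) * (1.0 - STD.cdf((T - rho * x) / s))

    h = width / steps              # Simpson's rule on [T, T + width]
    total = g(T) + g(T + width)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * g(T + i * h)
    return (total * h / 3.0) / K ** 2


def heritability_from_lambda(lam, K, r=0.5, tol=1e-4):
    """Bisection for the H in [0, 0.999] giving risk ratio lam at prevalence K."""
    lo, hi = 0.0, 0.999
    if not lambda_mft(K, lo, r) <= lam <= lambda_mft(K, hi, r):
        raise ValueError("risk ratio not attainable under this model")
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if lambda_mft(K, mid, r) < lam:   # the ratio increases with H
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


# The same DZ risk ratio of 2 at two prevalences.
h_common = heritability_from_lambda(2.0, 0.03)    # K = 3%
h_rare = heritability_from_lambda(2.0, 0.001)     # K = 0.1%
```

The same risk ratio maps to a lower H at the lower prevalence, which is the first implication stated above.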
|
$\lambda_M$ and $\lambda_D$ from genetic models are influenced by assumptions regarding environmental risk factors. The models above generally assume that environmental exposures are randomly distributed within families (or twins), thus inducing no additional familial or twin correlation. For exposures that do not cluster in families, the predictions given above hold true no matter what the relationship between genotype and exposure (i.e., in the presence or absence of gene-environment interactions), because exposures are independent among family members. On the other hand, for exposures that are not randomly distributed in families, there will be an impact on familial recurrence, i.e., to increase the $\lambda$ values given above. The degree of increase depends on the frequency of exposure and extent of correlation for the exposure among family members.
With regard to twin studies, we need to consider the relative impact on MZ versus DZ twins. If the environmental exposure is correlated to a similar degree between MZ and DZ twins (a common assumption), the MZ:DZ ratio RMD will actually be attenuated. This can be seen most simply by adding the same constant c to each of the twin risk ratios $\lambda_M$ and $\lambda_D$. Then if RMD = $(\lambda_M - 1)/(\lambda_D - 1)$ = 2 without the environmental correlation, with it RMD' = $(\lambda_M - 1 + c)/(\lambda_D - 1 + c) = 2 - c/(\lambda_D - 1 + c) < 2$. On the other hand, if environmental exposure is correlated to a greater extent in MZ than DZ twin pairs, any result is possible, depending on the degree of difference. In particular, if the difference is large, RMD may increase (>2), whereas if it is modest, RMD may stay the same or decrease.
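A small numerical check of this attenuation follows. The risk ratios and the constant c are hypothetical values chosen so that RMD equals 2 before the shared environmental term is added.

```python
def r_md(lambda_m, lambda_d, c=0.0):
    """MZ:DZ ratio after adding the same constant c to both risk ratios."""
    return (lambda_m - 1 + c) / (lambda_d - 1 + c)


# Hypothetical values with (lambda_M - 1) = 2 * (lambda_D - 1).
lambda_m, lambda_d, c = 7.0, 4.0, 1.0
without_env = r_md(lambda_m, lambda_d)      # (7 - 1) / (4 - 1)
with_env = r_md(lambda_m, lambda_d, c)      # (6 + 1) / (3 + 1)
closed_form = 2 - c / (lambda_d - 1 + c)    # expression given in the text
```

The larger c is relative to the genetic excess risk in DZ twins, the closer the ratio is pulled toward 1.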
In general, the greatest opportunity for confounding arises for rare, powerful exposures that cluster strongly in families. Common or universal exposures are less likely to induce significant familial clustering.
| Evidence of Familiality |
|---|
|
|
|---|
$\lambda_1$ defined above. These authors also examined family recurrence for a separate group of early-onset cancer probands (diagnosis prior to age 50 for melanoma, breast, and brain/central nervous system cancers and prior to age 60 for all other cancers). A second large population-based study of family recurrence in Sweden for a variety of cancer sites has been reported recently (8). These authors studied cancer recurrence in >2 million nuclear families that were linked to the Swedish Cancer Registry. Specifically, they calculated SIRs for: (a) offspring with a parent but no sibling with cancer; (b) offspring with a sibling but no parent with cancer; and (c) offspring with both a parent and sibling with cancer. Among 4,225,232 parents, 435,000 (10.3%) had a diagnosis of cancer. The offspring were born after 1934 and followed up to 1996 and thus were between ages 0 and 61 at time of study. Among 5,520,756 offspring, 71,424 (1.3%) had a diagnosis of cancer. Among male offspring, the mean age at diagnosis was 38; for female offspring, it was 42.
Results of analyses of both data sets are reproduced in Table 2 (colon, rectum, and anus have been combined into one site, colorectal, giving 26 total sites). An important question is whether the rare cancer sites are less familial than the common sites. Thus, in Table 2 the various sites are listed in decreasing order of prevalence as reported in Utah. FRRs are given for all and early-onset probands as reported by Goldgar et al. (7) for Utah. For the Swedish data, all offspring with an affected parent (i.e., groups a and c defined above) were combined to obtain a complete estimate of the offspring recurrence risk (SIR), and similarly groups b and c were combined to obtain an overall sibling recurrence risk (SIR). Hence, the numbers provided in Table 2 differ somewhat from those provided in the original tables from the Swedish study (8) but are more directly comparable with the figures from Utah.
|
(a) There is remarkable similarity of the FRRs across cancer sites, with a mean value (weighted by prevalence) of 2.12 and median of 2.15 for Utah (all probands) and a mean and median of 2.14 and 1.86, respectively, for the Swedish offspring. There are a few notable exceptions, however. Thyroid, testicular, and laryngeal cancers and lymphocytic leukemia and multiple myeloma appear to have elevated recurrence risks in both studies. Also in both studies, all FRRs are >1, and 18 of the 26 sites (Utah) and 14 of 17 sites (Sweden) have an FRR value between 1.5 and 3.0. Furthermore, there is overall consistency in the FRRs from Utah (all probands) and Sweden (offspring). For the 17 sites in common between the two studies, the correlation in FRRs is 0.83. However, this correlation is primarily attributable to the high values for thyroid, melanoma, and testis in both studies. After removing these 3 sites, the correlation becomes 0.01. This observation probably reflects a lack of true variation around the average FRR of 2 for the remaining sites, the observed variation being primarily random (i.e., statistical noise).
(b) It is apparent from Table 2 that there is no decline in FRR with decreasing frequency of the cancer site. In fact, if anything, there is a trend toward increasing FRR with decreasing frequency. For example, for Utah, for the first 13 cancers listed in Table 2, the (weighted) average FRR is 2.08 (median, 2.04), whereas for the second 13 cancers, the (weighted) average FRR is 3.29 (median, 2.46). Similarly, in Sweden, for the first 9 sites listed, the average (median) FRR for offspring is 1.94 (1.86), whereas for the latter group of 9 sites, the average (median) is 3.39 (2.85). Thus, when characterized by FRR, rarer cancers are no less familial (and probably more familial) than the common cancers. They may appear "sporadic" because they are rare and most often occur in the absence of a family history (i.e., families with multiple cases are rare). However, when assessed systematically, relatives of cases with rare cancers have at least the same degree of increased risk (or more) compared with the relatives of cases with common cancers.
(c) Table 2 reveals increased family recurrence associated with early age at diagnosis. For Utah, for the nine cancer sites listed, the (weighted) average FRR for the early-onset probands was 3.78 (median, 4.08). This figure is nearly 2-fold greater than the FRR for the same 9 sites for all probands (average, 2.08; median, 1.97). Eight of the nine sites listed showed an increase in FRR with early onset (only lung did not). Thus, it appears to be a generalizable conclusion that increased familiality is associated with early age of diagnosis.
In Sweden, the authors did not separate out their data based on age of onset of the proband. Rather, they calculated familial recurrence for offspring with an affected parent and for offspring with an affected sibling. As can be seen in Table 2, the sibling recurrence ratios (mean, 3.37; median, 3.53) are systematically higher than the offspring recurrence ratios (mean, 2.14; median, 1.86). The authors interpreted the elevated sibling recurrence risk ratios as evidence of recessive gene action, because recessive genes lead to increased risk in siblings compared with offspring (8). However, this conclusion is completely confounded by the fact that for the offspring with affected siblings, the average age of diagnosis of the affected sibling was only 38 to 42, much younger than the average age of diagnosis of the affected parent for offspring with an affected parent (probably by 30 years or so). This was attributable to the fact that offspring were young at the time of study (maximum age, 61), whereas the parents were not. This is also reflected in the vastly lower prevalence among the offspring (siblings), 1.3%, compared with the parents, 10.3%. Thus, it is more likely that the elevated risks in siblings versus offspring of cancer probands as observed in the Swedish data are a reflection of increased familial risk being associated with early age at diagnosis, as also seen in Utah, rather than with recessive genes.
| Separating Genes from Environment: Adoptees and Twins |
|---|
|
|
|---|
With respect to twin studies, because most cancers are rare and occur late in life, large twin cohorts are usually required to obtain sufficient cases. Thus, few twin studies in cancer have been reported, and these have focused primarily on the commonly occurring cancer sites or on all sites combined.
For example, the National Academy of Sciences Twin Cohort, containing nearly 16,000 male veteran twins, revealed no increased concordance in lung cancer mortality in MZ versus DZ twins (despite an observed increase in concordance for cigarette smoking in the MZ twins). This led the authors to conclude that genetic susceptibility has little influence on lung cancer mortality (10) . The same cohort was also studied for death from all cancer sites combined (11) . Here the ratio of MZ:DZ concordance was 1.4, modestly suggestive of genetic influence. This cohort was also evaluated for prostate cancer risk (12) . In this case, the MZ concordance was estimated to be 27.1% compared with 7.1% for DZ twins, giving a concordance ratio of 3.8, strong evidence for the influence of genetic susceptibility. The authors estimated the heritability of liability (described above) to be 57%.
Instead of large, population-based twin cohorts, an alternative strategy is to identify twins from a large sample of cancer cases and follow their co-twins for their cancer risk. This clinic-based approach was used to study Hodgkin's lymphoma (13), where 366 (179 MZ and 187 DZ) twins with the disease were identified, and their co-twins were followed. Ten of the 179 MZ co-twins similarly developed Hodgkin's disease, compared with none of the 187 DZ co-twins, suggesting a strong heritable component to this form of cancer.
The Nordic countries are an ideal setting for population-based twin studies because of the existence of population-based twin and cancer registries. For example, two Swedish twin cohorts, one born between 1886 and 1925 with 10,503 pairs and another born between 1926 and 1958 with 12,883 pairs, were linked to that nation's cancer registry (14). The authors found increased concordance in MZ versus DZ twins for colorectal, breast, cervical, and prostate cancers, suggesting the importance of genetic factors for these sites; by contrast, MZ and DZ concordance were comparable for stomach and lung cancers, suggesting less of a genetic role in these cancers.
Similarly, in Finland, 12,941 same-sex twin pairs were linked to that country's cancer registry (15). Examining all sites combined, these authors estimated a low overall influence of genetic factors (heritability of 18%) and thus concluded that the environment plays the major role in cancer susceptibility.
Most recently, the twin registry of Denmark was linked to that nation's cancer registry and combined with similar analyses in Sweden and Finland to produce the largest population-based twin study of cancer to date (3). In total, 44,788 same-sex twin pairs were followed for cancer prevalence. Because of modest heritability estimates, the authors concluded that "inherited genetic factors make a minor contribution to susceptibility to most types of neoplasms." Because of the size of this study and the importance of this conclusion, we consider the data of Lichtenstein et al. (3) and interpretation of the data therein in greater detail below.
| Evidence from the Twin Data of Lichtenstein et al. (3) |
|---|
|
|
|---|
$\lambda$ values. Thus, we calculate $\lambda$ individually for each site with a prevalence K [= (2c + d)/2n] of at least 1% but combine the remaining sites. For these remaining sites, we obtain a weighted average $\lambda$ value by calculating:
[Equation image missing from source]
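The prevalence expression K = (2c + d)/2n can be turned into a short calculation for a single site. In the sketch below, c is the number of concordant pairs, d the number of discordant pairs, and n the total number of pairs in the zygosity group. The risk ratio is taken as the probandwise concordance divided by K; this is an assumption made for illustration, because the article's own formula is not available at this point in the text. The counts are hypothetical.

```python
def prevalence(c, d, n):
    """K = (2c + d) / 2n: affected individuals over all individuals."""
    return (2 * c + d) / (2 * n)


def twin_risk_ratio(c, d, n):
    """Assumed estimator: probandwise concordance divided by prevalence."""
    probandwise = 2 * c / (2 * c + d)
    return probandwise / prevalence(c, d, n)


# Hypothetical counts for one zygosity group at one cancer site.
K = prevalence(c=10, d=80, n=1000)          # 100 affected among 2000 twins
lam = twin_risk_ratio(c=10, d=80, n=1000)   # 0.2 / 0.05
```

With few concordant pairs the estimate becomes unstable, which is consistent with combining sites whose prevalence is below 1%.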
$\lambda_M$ and $\lambda_D$ with decreasing K, nor does the value of RMD decrease with K. Rather, there appear to be relatively constant values of $\lambda_M$ and $\lambda_D$ with K. To test this, I performed a goodness-of-fit test of a model with fixed values for $\lambda_M$ and $\lambda_D$ (6.14 and 3.35, respectively) for both men and women using a likelihood ratio goodness-of-fit $\chi^2$ test for each site and also for all sites combined within four sex-zygosity groups: MZ male, DZ male, MZ female, and DZ female. The model fit poorly, because three zygosity-site combinations in males gave significant values (MZ larynx, $\chi^2$ = 7.48; MZ prostate, $\chi^2$ = 3.87; DZ lung, $\chi^2$ = 8.38) and four combinations in females were significant (MZ stomach, $\chi^2$ = 3.91; MZ colon, $\chi^2$ = 6.01; MZ breast, $\chi^2$ = 11.82; DZ breast, $\chi^2$ = 5.94). Also, the overall fit was poor: MZ male, $\chi^2$ = 3.56; DZ male, $\chi^2$ = 5.06; MZ female, $\chi^2$ = 2.10; DZ female, $\chi^2$ = 2.58. Summing these last four gives $\chi^2$ = 13.30 (with four degrees of freedom, P < 0.01).
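The summed statistic and its tail probability are easy to verify. The sketch below uses the closed-form survival function of the chi-square distribution for an even number of degrees of freedom, so no statistical library is needed. The function name is an assumption; the four input values are those reported above.

```python
from math import exp, factorial


def chi2_sf_even_df(x, df):
    """P(chi-square with df degrees of freedom > x), for even df only."""
    if df <= 0 or df % 2:
        raise ValueError("closed form used here requires a positive even df")
    half = x / 2.0
    return exp(-half) * sum(half ** k / factorial(k) for k in range(df // 2))


# Overall fit statistics for the four sex-zygosity groups.
group_chi2 = [3.56, 5.06, 2.10, 2.58]
total = sum(group_chi2)               # summed statistic
p_value = chi2_sf_even_df(total, 4)   # four degrees of freedom
```

The same function applied to a total of 1.01 with four degrees of freedom gives a tail probability of about 0.9.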
|
$\lambda_M$ and $\lambda_D$ than do the other sites. Testing the model of a constant value of $\lambda_M$ and $\lambda_D$ for all sites other than female breast gave an excellent fit to the data with $\lambda_M$ = 7.61 and $\lambda_D$ = 4.02 (results given in Table 4 [...] $\chi^2$ = 6.54) and DZ lung ($\chi^2$ = 4.40) in males were significant. Considering the total number of site-zygosity combinations (n = 100) tested, however, these results should not be considered formally significant. Furthermore, the overall fit to the four sex-zygosity groups was excellent, with the total $\chi^2$ = 1.01 (four degrees of freedom, P = 0.90). Thus, a model of constant $\lambda_M$ (7.61) and $\lambda_D$ (4.02) for all sites in both sexes, but lower values for breast cancer in women ($\lambda_M$ = 4.09, $\lambda_D$ = 2.51), gave an excellent overall fit to the data.
|
$\lambda_M$ and $\lambda_D$ are reasonably consistent across individual cancer sites and do not decrease with decreasing prevalence (K) of the cancer site. One can see, however, that in the context of the MFT model, heritability estimates would decrease with decreasing K because, as was indicated above, the estimate of H does decrease with K for constant $\lambda$ values. For example, as seen in Table 1, for $\lambda_M$ = 6.1, for a cancer with prevalence K = 0.1%, H would be estimated at approximately 20%, whereas for a cancer with prevalence K = 3%, H would be estimated at approximately 50%. Similarly, for $\lambda_D$ = 3.4, for a cancer with prevalence 0.1%, H is approximately 23%, whereas for a cancer with prevalence 3%, H is 60%. Thus, the conclusion that rarer cancers are less heritable (3) is strictly a consequence of the assumptions of the MFT model and is not robust to violations of that model. For example, if instead we measure gene effects directly in terms of $\lambda_M$ and $\lambda_D$ values, there is no such decrease with prevalence.
Another observation in Table 3 requires mention. The ratio RMD does not decrease systematically with decreasing K, indicating that by this measure also rarer cancers are not less heritable. Furthermore, the value of RMD for all cancer sites hovers around 2.0 (2.05 for breast cancer and 2.19 for all other cancers). As shown in Table 1, for the MFT model, RMD should range from approximately 2.5 for a common cancer (K = 3%) to approximately 4.0 (depending on H) for a rare cancer (K = 0.1%). Thus, the observed value of RMD conforms poorly to the predictions of the MFT model but extremely well to the single-locus or additive genetic model described above, which predicts RMD = 2. Thus, it is more likely that genetic susceptibility to cancer, in general, entails rare dominant genes and/or additive gene effects across contributing loci than genetic interactions.
It is also important to consider the consequences for genetic analysis based on the MFT model when the actual value of RMD is 2, whereas the MFT model predicts a higher value (e.g., RMD = 4). Application of the MFT model often allows for division of the environmental component of liability into a random component and a component attributable to shared twin environment S (3) , which is assumed to increase the correlation between MZ and DZ twins to the same extent (as opposed to the genetic component which increases the MZ correlation 2-fold compared with the DZ correlation). The inclusion of such a component thus leads to attenuation of RMD from the value expected for a polygenic model without S. It is therefore predictable that when the single-locus or additive model is correct and RMD = 2, application of the MFT model will lead to a positive estimate of S. This would be especially true for rare cancers, where the observed RMD (= 2) deviates more from the RMD expected for a pure polygenic model. Hence, the conclusion of a significant shared twin environmental component may simply be a consequence of using the wrong genetic model (i.e., MFT versus single-locus/additive), rather than indicating the actual existence of such a factor.
It is also of interest to compare the numbers in Tables 2 and 3. For dominant gene effects, the FRR, as given in Table 2, should correspond to $\lambda_D$ of Table 3. Because in Utah FRR was based on all first-degree relatives, including parents and offspring as well as siblings, FRR might, in theory, be less than $\lambda_D$ if recessive genes are involved in cancer susceptibility. In fact, the average value of FRR estimated in Table 2 is approximately 2.12, compared with 3.4 to 4.0 for $\lambda_D$ observed in Table 3. At first glance, this might suggest the presence of recessive genes. However, it is important to consider differences in age-structure and follow-up in the various studies.
As opposed to the results of Goldgar et al. (7), which were based on age-adjusted lifetime rates, the prevalence figures given in Table 3 do not correspond to lifetime risks. This is because the cohorts of twins were only surveyed for cancer risk during a defined and limited period of time, i.e., they were both left-censored and right-censored (e.g., see Ref. 16). Specifically, Swedish cohort I entered observation between ages 36 and 75 and was followed for 34 years (to ages 70 to 109, or death); 4,490 subjects of 21,006 (21%) had a diagnosis of cancer. Similarly, the Danish cohort entered study between ages 13 and 73 and was followed for 50 years to ages 63 to 123 (or death); 3,572 people of 16,922 (21%) had a cancer diagnosis. By contrast, Swedish cohort II entered observation at ages 14 to 46 and was only followed for 22 years (to ages 36 to 68); not surprisingly, only 1,157 cancer diagnoses were made in this group of 25,716 subjects studied (i.e., 4.5%). Similarly, the Finnish cohort entered at ages 18 to 96 and was followed for only 20 years to ages 38 to 116 or death. There were 1,584 cancer diagnoses of 25,882 subjects (6.1%). Swedish cohort II and the Finnish cohort represent more than half of all of the twin pairs (25,824 of 44,788; 58%). The large majority of cancers in these two twin cohorts have yet to occur.
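The cohort figures quoted above can be tabulated to make the contrast in follow-up explicit. The sketch below only restates the counts given in the text; the dictionary layout and function name are illustrative.

```python
# (cancer diagnoses, subjects, years of follow-up) as quoted in the text.
cohorts = {
    "Swedish I": (4490, 21006, 34),
    "Danish": (3572, 16922, 50),
    "Swedish II": (1157, 25716, 22),
    "Finnish": (1584, 25882, 20),
}


def percent(part, whole):
    """Percentage of `whole` represented by `part`."""
    return 100.0 * part / whole


# Percentage of subjects with a cancer diagnosis in each cohort.
diagnosed = {name: percent(cases, subjects)
             for name, (cases, subjects, _years) in cohorts.items()}

# Share of all twin pairs contributed by the two short-follow-up cohorts.
short_followup_share = percent(25824, 44788)
```

The two cohorts with 20 to 22 years of follow-up show roughly a quarter of the diagnosis rate seen in the two cohorts followed for 34 to 50 years.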
The values of $\lambda_M$ and $\lambda_D$ given in Table 3 are likely to be strongly influenced by the age-structure of the sample. For example, it is known that familiality of many cancers (such as breast cancer) is higher at an earlier age of diagnosis, and thus $\lambda$ values decrease with age. The numbers provided in Table 3 correspond to cancers diagnosed primarily in midlife. Indeed, Table 2 also provides FRR values for cancers occurring in midlife (before age 50 or 60) for 10 cancer sites. The average value for these early-occurring cancers is 3.8, very close to the average value of $\lambda_D$ given in Table 3, as well as the sibling recurrence risk ratio from Sweden (Table 2). Thus, it appears most likely that the more modest values of FRR in Table 2 versus values of $\lambda_D$ in Table 3 reflect the different age-structure and follow-up of the twin samples rather than the presence of recessive genes.
| Inherited Susceptibility: Site Specific or Generalized? |
|---|
|
|
|---|
$\lambda_M$ = 2.40. For male DZ pairs, 356 were concordant and 2,459 discordant, or $\lambda_D$ = 1.95; also, RMD = 1.47. For female MZ pairs, 265 were concordant and 1,487 discordant, or $\lambda_M$ = 2.2, whereas for female DZ pairs, 408 were concordant and 3,023 discordant, giving $\lambda_D$ = 1.70 and RMD = 1.7. These values of $\lambda$ and RMD are considerably attenuated from the corresponding numbers calculated from site-specific analyses. This observation indicates that the inherited predisposition to cancer is likely to involve many genes that are primarily (but not entirely) site specific. Goldgar et al. (7) also examined all pairs of cancer sites in their probands and affected first-degree relatives to ascertain possible genetic relatedness of susceptibility to different cancer sites. This involved consideration of 1,026 comparisons. Despite the large number of tests, many were deemed to be statistically significant. However, from a global perspective, considering all site pairs, the FRR was considerably reduced compared with "within site" estimates, consistent with the observations from the twin data. Also, the analysis of the Swedish family study considered across-site comparisons (8). Although many of these comparisons were statistically significant, they also generally found that the highest risk ratios were associated with site-specific recurrence. The conclusion of site specificity is also consistent with molecular results that to date have shown gene effects to be largely site specific (e.g., colon cancer and melanoma) and/or with very limited range (e.g., breast/ovarian cancer).
| Heritability versus Attributable Risk |
|---|
|
|
|---|
$\lambda_M$ and $\lambda_D$ as given in Table 3. For two values of $\lambda_M$ (and hence $\lambda_D$) and allele frequencies ranging from 0.001 to 0.10, the values of PAF and RRHet have been calculated (Table 5). The $\lambda$ values correspond approximately to what was observed for breast cancer and all other cancers combined. Also, the values of f1 and f0 corresponding to K = 1% are given. Comparable values of f1 and f0 can be obtained for other values of K simply by multiplication. For example, for K = 0.1%, the values of f1 and f0 in Table 5 are divided by 10.
|
For common cancers (such as breast cancer in women), an allele frequency p = 0.001 is not consistent with the data because it would imply a heterozygote penetrance >1 (e.g., if the prevalence of breast cancer is 3.6%, the corresponding value of f1 would be 3.6 x 0.396 = 1.43). For breast cancer, studies of BRCA1 and BRCA2 alone already suggest a higher total allele frequency than p = 0.001 and a higher PAF but lower RRHet.
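The rescaling and the feasibility check described here amount to a one-line calculation. In the sketch below, the value 0.396 is the K = 1% heterozygote penetrance implied by the arithmetic in the text for p = 0.001; the function names are illustrative.

```python
def scale_penetrance(f_at_1pct, prevalence):
    """Rescale a penetrance tabulated for K = 1% to another prevalence K."""
    return f_at_1pct * (prevalence / 0.01)


def is_feasible(penetrance):
    """A penetrance is a probability and so cannot exceed 1."""
    return 0.0 <= penetrance <= 1.0


# Breast cancer example from the text: K = 3.6%, tabulated f1 = 0.396.
f1_breast = scale_penetrance(0.396, 0.036)   # 3.6 x 0.396
# The same tabulated value for a rare cancer with K = 0.1% (divide by 10).
f1_rare = scale_penetrance(0.396, 0.001)
```

A rescaled penetrance above 1 rules out that allele frequency for a cancer of that prevalence, whereas the same allele frequency remains admissible for a rarer cancer.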
| Conclusions: Implications of Family and Twin Studies for Molecular Genetic Research |
|---|
|
|
|---|
Most of the cancer sites listed in Table 2 have a FRR close to 2, whereas most sites also have stable values for $\lambda_M$ and $\lambda_D$ and RMD as given in Table 3, although for the rarer cancers reliable values of $\lambda_M$ and $\lambda_D$ are not possible. From Table 2, a few sites have particularly low or high values of FRR. For example, uterine and pancreatic cancer and Hodgkin's lymphoma all have FRR values less than 1.32. From the twin data (3), uterine cancer has a $\lambda_M$ of 2.2 and $\lambda_D$ of 4.7. Both of these values are greater than the FRR of 1.32 (Table 2), but the higher $\lambda_D$ than $\lambda_M$ is also not consistent with a role of genetic susceptibility. For pancreatic cancer, for males and females combined, $\lambda_M$ = 11.0 and $\lambda_D$ = 1.7. The low value of $\lambda_D$ is comparable with the observed FRR value of 1.25, but the higher $\lambda_M$ value is suggestive of genetic susceptibility, albeit perhaps multigenic. For Hodgkin's lymphoma (FRR = 1.25), too few cases were observed in the large population-based twin study (3) for meaningful analysis. However, a prior clinic-based study (13) found 0 of 187 = 0% of DZ twins versus 10 of 179 = 5.6% of MZ twins to be concordant. The very high observed $\lambda_M$ in that study is again suggestive of genetic susceptibility but possibly recessive and/or multigenic.
For the sites in Table 2 with high FRR values (thyroid, multiple myeloma, leukemia, larynx, and testis), most were too rare to obtain individual λM and λD estimates from the large twin study (3). However, combining across all five of these sites, weighted averages of λM = 17.1 and λD = 5.6 are obtained. These values are higher than the weighted averages for all sites combined (even excluding breast) as given in Table 3, suggesting that genetic influence may be more prominent for these rare sites, consistent with the FRR results in Table 2. The higher value of RMD (= 3.5) in this case, however, may indicate recessive and/or multiple interacting genes.
What are the implications of these results for molecular strategies to identify the genes underlying cancer susceptibility? To some extent, the answer depends on how many genes are involved in susceptibility to cancer of a specific site and on their frequency, penetrance, and interactions. According to the analysis provided above, the data are generally most consistent with either rare dominant alleles or additive gene effects. For rare dominant alleles, the best approach is linkage analysis with multiplex pedigrees. Even if different mutations are involved in different families, this approach will still have adequate power provided the heterogeneity is mostly allelic (small number of loci) rather than nonallelic (large number of loci). Indeed, this approach proved successful for both breast cancer (17, 18) and colon cancer (19, 20), despite the fact that several distinct loci were involved. Furthermore, the observation in Table 2 that early age at diagnosis appears to be generally associated with increased familial risk argues for giving special priority to families with young ages at diagnosis.
Although this strategy has worked for finding breast and colon cancer genes, to date it has been less successful in leading to clearly replicable linkage results for prostate cancer (21, 22). It is interesting to reexamine Table 3 in this regard. For breast and colon (sex-averaged) cancers, RMD is very close to 2.0. For prostate cancer, RMD was estimated at 3.86. A previous, comparably sized twin study of prostate cancer (9) found an MZ concordance of 27.1% and a DZ concordance of 7.1% versus a prevalence of 3.17%. These rates translate into values of λM = 8.55, λD = 2.24, and RMD = 6.09. The λM and λD values for both studies are quite similar, and the RMD value appears to be significantly >2.0. Thus, it may turn out for this cancer site that the genetic basis is not explained by independent, rare, autosomal dominant mutations but rather by recessive and/or multiple interacting loci. If such is the case, it would be more difficult to obtain a clear linkage signal than was true for breast and colon cancer.
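The conversion from concordance rates to recurrence risk ratios for the earlier prostate cancer twin study (9) can be sketched as follows. This Python example assumes the quoted concordances are interpretable as recurrence risks in co-twins, so that dividing by the population prevalence gives λ. The function name `recurrence_risk_ratio` is hypothetical.

```python
def recurrence_risk_ratio(concordance_pct, prevalence_pct):
    """Recurrence risk in a co-twin divided by population prevalence (same units)."""
    return concordance_pct / prevalence_pct


prevalence = 3.17  # percent, as quoted in the text
lambda_m = recurrence_risk_ratio(27.1, prevalence)  # MZ concordance of 27.1%
lambda_d = recurrence_risk_ratio(7.1, prevalence)   # DZ concordance of 7.1%

# Ratio of excess MZ risk to excess DZ risk
rmd_value = (lambda_m - 1.0) / (lambda_d - 1.0)

print(f"lambda_M = {lambda_m:.2f}, lambda_D = {lambda_d:.2f}, RMD = {rmd_value:.2f}")
```

The rounded results match the values quoted in the text (8.55, 2.24, and 6.09).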
Linkage analysis also requires the penetrance (the probability that gene carriers become affected) to be moderate to high, because otherwise extended multiplex families will not occur. When penetrance is low, nonallelic heterogeneity could become a more serious problem, because individual families will provide only modest LOD scores, and statistical significance would require lumping together many small families. In this case, a practical option is to study genetic isolates or founder populations, because these populations are likely to have considerably reduced allelic heterogeneity compared with outbred populations. Examples of such populations are Mennonites, Ashkenazi Jews, French Canadians, and Finns. Indeed, in the Ashkenazi Jewish population, only two common mutations and one rare mutation occur at the BRCA1 and BRCA2 loci, whereas more outbred populations have much greater allelic diversity (23). In addition, these founder mutations are typically of relatively recent origin and thus show linkage disequilibrium (allelic association) up to a substantial genetic distance along the chromosome (perhaps a megabase or more), aiding gene identification.
What if the susceptibility alleles are common and pan-ethnic? In this case, candidate gene studies using case-control methods are likely to be more fruitful (24); however, even in this scenario multiplex families are likely to provide greater power than singleton cases (25).
What about cancers that have an identified, major environmental component, such as lung cancer and cigarette smoking? According to Table 2, lung cancer appears to be familial (FRR = 1.7-3.2), but the twin data provide nearly equal values of λM (6.27) and λD (6.14) in males. This near equality would suggest a strong environmental effect shared by twins (i.e., smoking behavior) rather than a genetic component. Ironically, twin studies have consistently shown greater concordance for smoking behavior in MZ twins than in DZ twins. This clearly is an example of an environmental exposure being confounded with genetic influence in a twin study paradigm. Yet, paradoxically, this concordance difference in smoking behavior is not reflected in a concordance difference for lung cancer. A comparable study of United States male twins (7) found the same pattern: greater concordance in smoking for MZ versus DZ twins, yet no difference in concordance for lung cancer. On the other hand, lung cancer in female twins (3), where the prevalence is much lower, does appear to follow a more genetic pattern, with λM = 21.3 and λD = 1.76, although these figures are based on small numbers.
When a major environmental exposure is involved in cancer susceptibility, the question becomes: Are there specific genes that increase the risk of cancer in exposed individuals? In unexposed individuals? And are these genes the same? In theory, family and twin studies can address these questions. For example, if the genes are the same, then the risk of cancer should be increased in both exposed and unexposed family members, whether the index subject is exposed or unexposed. Different genetic mechanisms would imply that only exposed family members of exposed probands are at increased risk. Although the numbers are small, the lung cancer twin data for females versus males are suggestive of a more pronounced genetic influence in unexposed or less exposed individuals.
In conclusion, taken at face value, family and twin data support a comparable genetic component for susceptibility to most cancer sites, including the rarer or "sporadic" ones. However, family and twin studies have little power to disentangle interactions between unmeasured genes and environmental risk factors or to eliminate confounding between genes and environmental effects that are correlated in relatives. These limitations preclude strong inferences about the genetic input to the various forms of cancer. Nevertheless, studies of spouses of cancer cases show little increase in risk for these unrelated but cohabiting individuals (26), suggesting that environmental factors may indeed contribute little to familial aggregation for most cancers. Thus, the empirical data, along with molecular genetic results for at least a few cancers (e.g., breast and colon), support the continued search for cancer susceptibility genes for all cancer sites, in addition to the environmental risk factors with which they interact.
| Appendix 1 |
|---|
λM is given by $1 + w$ and λD by $1 + w/2$, where

$$w = \frac{2pqa^2}{K^2},$$

and $K$, the population prevalence, is given by $K = f_0 + 2pa$ (12). The heterozygote RR (RRHet) for genotype Ss versus ss is given by $f_1/f_0 = 1 + a/f_0$, and the PAF for locus S is given by $2pa/(f_0 + 2pa)$. Note that $w = 2pqa^2/(f_0 + 2pa)^2$, so that $a/(f_0 + 2pa) = (w/2pq)^{1/2}$ and

$$\mathrm{PAF} = \frac{2pa}{f_0 + 2pa} = \left(\frac{2pw}{q}\right)^{1/2}.$$

Simple algebra shows that $f_0 = K(1 - \mathrm{PAF})$, so that

$$\mathrm{RR}_{\mathrm{Het}} = \frac{f_1}{f_0} = 1 + \frac{a}{K(1 - \mathrm{PAF})} = \frac{2pq + (q - p)(2pqw)^{1/2}}{2pq - 2p(2pqw)^{1/2}}.$$

Thus, although the values of $f_1$ and $f_0$ depend on $K$, the values of PAF and RRHet depend only on $p$ and $w$ (= λM - 1) and not on $K$.
| Footnotes |
|---|
2 The abbreviations used are: MZ, monozygotic; DZ, dizygotic; MFT, multifactorial threshold; FRR, family risk ratio; SIR, standardized incidence ratio; PAF, population attributable fraction; RR, relative risk.
Received 2/1/00; revised 4/27/01; accepted 5/2/01.
| References |
|---|