Abstract
Observational epidemiologic studies are prone to confounding, measurement error, and reverse causation, undermining robust causal inference. Mendelian randomization (MR) uses genetic variants to proxy modifiable exposures to generate more reliable estimates of the causal effects of these exposures on diseases and their outcomes. MR has seen widespread adoption within cardio-metabolic epidemiology, but also holds much promise for identifying possible interventions for cancer prevention and treatment. However, some methodologic challenges in the implementation of MR are particularly pertinent when applying this method to cancer etiology and prognosis, including reverse causation arising from disease latency and selection bias in studies of cancer progression. These issues must be carefully considered to ensure appropriate design, analysis, and interpretation of such studies. In this review, we provide an overview of the key principles and assumptions of MR, focusing on applications of this method to the study of cancer etiology and prognosis. We summarize recent studies in the cancer literature that have adopted a MR framework to highlight strengths of this approach compared with conventional epidemiological studies. Finally, limitations of MR and recent methodologic developments to address them are discussed, along with the translational opportunities they present to inform public health and clinical interventions in cancer. Cancer Epidemiol Biomarkers Prev; 27(9); 995–1010. ©2018 AACR.
Introduction
Obtaining reliable evidence of causal relationships from observational epidemiologic studies remains a pervasive challenge (1–3). While observational studies have made fundamental contributions to understanding the primary environmental causes of various cancers (e.g., smoking and lung cancer, hepatitis B and liver cancer, asbestos and mesothelioma; refs. 4–6), recent decades have seen numerous instances of apparently robust observational associations being subsequently contradicted by large chemoprevention trials (7–15). Notable translational failures include the ineffectiveness of beta-carotene supplementation to prevent lung cancer among smokers in the Alpha-Tocopherol, Beta-Carotene Cancer Prevention Study and vitamin E supplementation to prevent prostate cancer in the Selenium and Vitamin E Cancer Prevention Trial (SELECT). Contrary to expectations from observational data, findings from both trials suggested that supplementation may increase rather than reduce the incidence of cancer (8, 16).
Part of the difficulty in translating observational findings into effective cancer prevention and treatment strategies lies in the susceptibility of conventional observational designs to various biases, such as residual confounding (due to unmeasured or imprecisely measured confounders) and reverse causation (17, 18). These biases frequently persist despite energetic statistical and methodologic efforts to address them (19–21), making it difficult for observational studies to reliably conclude that a risk factor is causal, and thus a potentially effective intervention target. This issue is likely further compounded by the modern epidemiologic pursuit of risk factors that confer increasingly modest effects on disease risk, which can contribute to a ubiquity of spurious findings in the literature (22–24).
Despite these challenges, observational studies remain crucial for informing cancer prevention and treatment policy given issues in translating basic science to human populations and because intervention trials are expensive, time-consuming, and often unfeasible in a primary prevention setting. The development of novel analytic tools that can help address some of the limitations of conventional observational studies therefore remains an important field of research. One such approach known as Mendelian randomization (MR), which uses genetic variants to proxy potentially modifiable exposures, has seen increased adoption within population health research and offers much promise to generate a more reliable evidence base for cancer prevention and treatment.
What is MR?
MR uses germline genetic variants as instruments (i.e., proxies) for exposures (e.g., environmental factors, biological traits, or druggable pathways) to examine the causal effects of these exposures on health outcomes (e.g., disease incidence or progression; refs. 25–31). The use of genetic variants as proxies exploits their random allocation at conception (Mendel's first law of inheritance) and the independent assortment of parental variants at meiosis (Mendel's second law of inheritance). These natural randomization processes mean that, at a population level, genetic variants that are associated with levels of a specific modifiable exposure will generally be independent of other traits and behavioral or lifestyle factors, although several caveats exist (see Table 1). Analyses using genetic variants as instruments to examine associations with outcomes have a number of advantages: (i) effect estimates should be less prone to the confounding that typically distorts conventional observational associations (32), (ii) because germline genetic variants are fixed at conception, they cannot be modified by subsequent factors, thus overcoming possible issues of reverse causation, and (iii) measurement error in genetic studies is often low as modern genotyping technologies provide relatively precise measurement of genetic variants, unlike the substantial (and at times differential) exposure measurement error that can accompany observational studies (e.g., due to self-report).
Table 1. Limitations of MR and techniques available to address them
Comparison of MR to randomized controlled trials
Because of the random allocation of alleles at conception, it can be useful to compare the structure of a MR analysis to the design of a randomized trial, where individuals are randomly allocated at baseline to an intervention or control group (Fig. 1). Groups defined by genotype should be comparable in all respects (e.g., approximately equal distribution of potential confounding factors) except for the exposure of interest. It follows that any observed differences in outcomes between these genotypic groups can be attributed to differences in long-term exposure to the trait of interest. This latter point is an important distinction when interpreting results from a MR analysis as compared with a randomized controlled trial (RCT): MR will generally estimate the effect of life-long “allocation” to an exposure on an outcome, unless an exposure typically occurs only from a certain age—for example, alcohol consumption and smoking—and the genetic proxy affects metabolism of that exposure (33). If the effect of this exposure on an outcome is cumulative over time, a MR analysis may generate a larger effect estimate than that which would be obtained from a randomized trial examining an intervention over a limited duration of time. In addition, if the effect of an exposure on an outcome operates primarily or exclusively over a critical or sensitive period of the life course (e.g., early childhood), a MR analysis should be able to “capture” a causal effect of this exposure but will not be able to distinguish such period effects. In contrast, a randomized trial will have the flexibility to test certain interventions over restricted periods of follow-up and in individuals who may be within narrow age ranges. These distinctions are discussed in more detail in the “Cancer latency and reverse causation—benefits of MR” section of this review.
Figure 1. Schematic comparison of the structure of a randomized controlled trial (SELECT) and a Mendelian randomization analysis (PRACTICAL). In SELECT (left), individuals were randomly allocated to the intervention (200 μg daily selenium supplementation, which led to a 114 μg/L increase in blood selenium) or control group (placebo). In PRACTICAL (right), the additive effects of selenium-raising alleles at eleven SNPs, randomly allocated at conception, were scaled to mirror a 114 μg/L increase in blood selenium. If an RCT is adequately sized, randomization should ensure that intervention and control groups are comparable in all respects (e.g., distribution of potential confounding factors) except for the intervention being tested. In an intention-to-treat analysis, any observed differences in outcomes between intervention and control groups can then be attributed to the trial arm to which they were allocated. Likewise, in a MR analysis, groups defined by genotype should be comparable in all respects (e.g., distribution of both genetic and environmental confounding factors) except for their exposure to a trait of interest. Any observed differences in outcomes between groups defined by genotype can then be attributed to differences in lifelong exposure to the trait under study.
More formally, MR is a form of instrumental variable (IV) analysis that relies on three key assumptions: the IV (here, one or more genetic variants) should (i) be reliably associated with the exposure of interest; (ii) not be associated with any confounding factor(s) that would otherwise distort the association between the exposure and outcome; and (iii) not be independently associated with the outcome, except through the exposure of interest (known as the “exclusion restriction criterion”; Fig. 2A). If all assumptions are met, MR can provide an unbiased causal estimate of the effect of an exposure on disease or a health-related outcome. Violation of one or more of these assumptions means that the instruments are invalid and, consequently, that such an analysis may yield a biased effect estimate.
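The logic of these assumptions can be illustrated with a small simulation. The sketch below is not drawn from any study discussed in this review: the allele frequency, effect sizes, sample size, and the helper names `slope` and `simulate` are all assumptions chosen for illustration. It generates a genotype G, an unmeasured confounder U, an exposure E, and an outcome O, then compares the conventional regression slope of O on E with the simplest IV estimator, the ratio of the G–O association to the G–E association (often called the Wald ratio).

```python
import random

def slope(xs, ys):
    """Ordinary least-squares slope of y regressed on x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx

def simulate(n=20000, true_effect=0.3, seed=1):
    """Simulate G -> E -> O with an unmeasured confounder U of E and O.
    All parameter values are illustrative assumptions."""
    rng = random.Random(seed)
    G, E, O = [], [], []
    for _ in range(n):
        g = sum(rng.random() < 0.3 for _ in range(2))  # allele count: 0, 1, or 2
        u = rng.gauss(0, 1)                            # unmeasured confounder
        e = 0.5 * g + u + rng.gauss(0, 1)              # assumption (i): G affects E
        o = true_effect * e + u + rng.gauss(0, 1)      # G affects O only through E
        G.append(g)
        E.append(e)
        O.append(o)
    return G, E, O

G, E, O = simulate()
observational = slope(E, O)               # distorted upward by U
wald_ratio = slope(G, O) / slope(G, E)    # IV estimate of the effect of E on O
print(f"true effect 0.30, observational {observational:.2f}, Wald ratio {wald_ratio:.2f}")
```

Under these assumed parameters the conventional estimate is pulled well away from the simulated effect by U, whereas the ratio estimate should land close to it, because G is generated independently of U and influences O only through E.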
Figure 2. Illustration of MR methodology. A genetic variant (G) is used as a proxy for a modifiable exposure (E) to assess the association between E and an outcome of interest (O) without the issues of reverse causation and confounding (U). MR methodology relies on three main assumptions, in that G must (i) be reliably associated with E; (ii) not be associated with U; and (iii) not be independently associated with O, except through E. This method is exemplified in the context of assessing the association of smoking and lung cancer, using the CHRNA5-A3-B4 SNP as a genetic instrument for heaviness of smoking.
Previous success of MR approaches and potential for cancer research
Over the past decade, MR has been increasingly adopted as an analytic approach within population health research, particularly in the fields of metabolic and cardiovascular disease (CVD), where there are several notable examples of important causal inferences. For example, MR has suggested a likely causal effect of statins on type 2 diabetes (T2D) risk (34, 35); likely noncausal roles of circulating levels of high-density lipoprotein cholesterol (HDL-C) in myocardial infarction (36) and C-reactive protein (CRP) in T2D (37); pointed to the efficacy of proprotein convertase subtilisin/kexin type 9 (PCSK9) inhibitors for coronary heart disease (CHD) prevention prior to the publication of confirmatory long-term trial results (34, 38); and prioritized further examination of apolipoprotein B (39, 40), lipoprotein(a) (41), and IL6 (42) and deprioritized fibrinogen (43) and secretory phospholipase A(2)-IIA (44) as intervention targets for CVD. Although this approach has scope to test the effects of an increasing number of exposures relevant to cancer through the continued growth in large-scale genome-wide association study (GWAS) output, to date there remains a noticeable gap in the MR literature with regard to cancer compared with other outcomes (Supplementary Fig. S1).
Here, we provide an overview of some recent studies that have applied MR to cancer outcomes, highlighting both the potential strengths compared with conventional epidemiologic studies and the unique challenges of performing MR studies in cancer. Recent methodologic extensions to the original MR paradigm are presented, with emphasis on the translational opportunities that they may offer to inform drug target validation and public health strategies to reduce the burden of cancer.
Considerations for MR in cancer
Both the principal strengths of MR and important limitations of this method have been discussed in detail previously (25–31, 45–49). The latter are presented in Table 1 with some methodologic and statistical approaches that have been developed to address them outlined in Tables 2 and 3. Considerations which are specific to investigating causality in the setting of cancer are outlined below.
Table 2. Summarized data and two-sample MR
Table 3. Genetic risk scores and pleiotropy
Cancer latency and reverse causation–benefits of MR
Given long latency periods for many cancers, spurious findings resulting from reverse causation are an important concern in cancer epidemiology. Reverse causation has been suspected in several instances of ambiguous (74–76) or paradoxical findings (77) in the cancer literature. For example, early studies documenting an association between higher circulating cholesterol and lower cancer incidence were variably interpreted as plausible evidence of a protective effect of raised cholesterol on cancer risk or as latent cancer leading to a reduction in cholesterol levels (78–80). With the introduction and widespread usage of low-density lipoprotein cholesterol (LDL-C)–lowering medications for the prevention and treatment of CVD, concern arose that such measures could thus be increasing cancer rates (81, 82).
In an early proposal of the use of genetics as a tool to circumvent issues of reverse causation in observational data, Katan (83) suggested examining the association of genetic variants in APOE, determinants of circulating cholesterol levels, with cancer risk. As germline APOE genotype was fixed at conception, it was argued that it would not be influenced by subsequent cancer development and could therefore be used to establish whether cholesterol had a causal effect on cancer incidence. Subsequent MR analyses testing the effect of lifelong elevated cholesterol through genetic variation in APOE, NPC1L1, PCSK9, and ABCG8 have reported null associations with overall cancer risk (84–86). These findings alongside secondary analyses of statin trials showing no effect on cancer rates (87) suggest that, a potential explanatory role of confounding aside, early observational findings supporting a protective effect of cholesterol on cancer risk likely reflected undiagnosed cancer or early carcinogenic processes causing a reduction in cholesterol levels in prediagnostic samples.
Long-term exposure–benefits of MR
The advantages of exploiting the fixed nature of germline genotype extend beyond addressing reverse causation in observational studies. Large cancer prevention trials are often constrained to examining interventions over a limited duration in time and over a particular period in the life-course (e.g., middle and/or late adulthood; ref. 88). Given the length of time required for solid tumor development (89), randomized trials will often not allow sufficient follow-up for the effect of an intervention to be detected. In turn, long-term chemoprevention trials that are conducted may suffer from issues of noncompliance in the intervention arm, contamination in the control arm, and attrition during follow-up.
Furthermore, the optimal timing of an exposure to prevent cancer may be early in the life-course and therefore may not be adequately addressed in randomized trials (90). For example, it has been proposed that certain carcinogenic agents or processes may confer an effect, or a particularly pronounced effect, only over “critical periods” of early life or adolescence (e.g., the influence of inadequate childhood nutrient intake on adult cancer risk or the pubertal period as a window of breast cancer susceptibility; refs. 91–95). Interrogating the long-term effect on cancer of a given intervention in a prevention trial among children or adolescents would be unfeasible.
Examining the effect of genetic variants allocated at conception can therefore offer an important first step in identifying risk factors that may be sensitive to duration or timing of an exposure over the life course. Inferences made from promising MR findings to plausible intervention effects in a subsequent randomized trial would then need to carefully consider the possibility that effect estimates obtained in a MR analysis could be sensitive to critical period effects (in which case intervening on an exposure outside of this period may not alter disease risk) or represent the cumulative effect of lifelong exposure to a biomarker (in which case a relatively short-term trial may generate a smaller effect estimate than that obtained from MR). A “triangulation” framework, in which evidence from different epidemiologic approaches with nonoverlapping sources of bias is integrated, can then be used to further examine the duration of intervention necessary to confer an effect or to “pinpoint” possible critical windows of susceptibility to carcinogenic agents (96). For example, multivariable regression analyses examining the association of an exposure, with some evidence of causality from MR studies, over different lengths of follow-up may help to identify the duration of exposure required to confer an effect. A negative control study with repeat measures of an exposure both within and outside of hypothesized critical periods (e.g., dietary fat intake before, during, and after pubertal development), in relation to subsequent disease risk (e.g., breast cancer; ref. 97), could be used to help refine periods of increased vulnerability to cancer-causing exposures.
Cancer latency and reverse causation–limitations of MR
Genetic variants known to directly affect an exposure will in some cases be well-characterized (e.g., variants in APOE), and it will be established whether or not the variant–exposure associations are influenced by the outcome of interest. The biological understanding of other variants associated with risk factors that are identified in GWAS, however, is often more limited. In some situations in which genetic variants are associated with both an exposure and outcome of interest, the association between a variant and outcome might be via the exposure (i.e., a valid IV analysis) but it is also possible that, under certain circumstances, there may be a primary effect of the variant on the outcome which in turn causes a change in the exposure.
This situation has been illustrated previously in the context of body mass index (BMI) and CRP where an erroneous causal effect can be generated if a genetic variant that primarily influences BMI, which in turn influences CRP levels because BMI has a causal effect on CRP, is mistaken as being a variant with a primary influence on CRP (25). Use of such a variant as an instrument for CRP in a MR analysis of the effect of CRP on BMI would then lead to biased results.
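The bias described above can be reproduced in a toy simulation. In the sketch below, all quantities are illustrative assumptions rather than estimates from the cited work: a variant G acts primarily on BMI, BMI raises CRP, and CRP has no effect on BMI at all. Treating G as though it were an instrument for CRP nevertheless produces a clearly nonzero ratio estimate for the "effect of CRP on BMI".

```python
import random

def slope(xs, ys):
    """Ordinary least-squares slope of y regressed on x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx

rng = random.Random(7)
G, BMI, CRP = [], [], []
for _ in range(20000):
    g = sum(rng.random() < 0.4 for _ in range(2))  # allele count (assumed frequency 0.4)
    bmi = 0.4 * g + rng.gauss(0, 1)                # the variant acts primarily on BMI
    crp = 0.5 * bmi + rng.gauss(0, 1)              # BMI -> CRP; there is no CRP -> BMI arrow
    G.append(g)
    BMI.append(bmi)
    CRP.append(crp)

# G is associated with CRP only because it changes BMI, so the exclusion
# restriction fails when G is used as an instrument for CRP.
apparent_effect = slope(G, BMI) / slope(G, CRP)
print(f"simulated effect of CRP on BMI: 0.00, ratio estimate: {apparent_effect:.2f}")
```

Because the G–CRP association here is simply the G–BMI association multiplied by the assumed BMI-to-CRP effect, the ratio estimate approximates the reciprocal of that effect rather than zero.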
This introduction of reverse causation into a MR analysis may be problematic for common cancers with long latency periods between tumor initiation and diagnosis (e.g., breast and prostate; ref. 98). Reverse causation in this context could be mitigated by obtaining gene–exposure estimates in a healthy population where the prevalence of undiagnosed, latent cancer is likely to be low. These estimates could then be used to generate IV estimates in a two-sample MR framework. In addition, steps could be taken to construct an instrument solely consisting of genetic variants that plausibly act directly on a trait. For example, in constructing an instrument for CRP levels, this could include solely using variants within CRP itself as these variants are more likely to be exclusively associated with CRP levels than variants in other genes (99). However, it should be noted that a trade-off of using few, biologically informed SNPs as an instrument is that sensitivity analyses examining horizontal pleiotropy, when feasible to perform, will have limited statistical power.
Selection bias in cancer progression analyses
A particular concern in cancer epidemiology is that exposures that influence cancer incidence may not influence cancer progression or survival. For example, although smoking is a robust risk factor for lung cancer incidence, smoking cessation upon development of lung cancer seems to have little effect on subsequent survival (100). There has been some suggestion that folate may play a dual role in prostate and colorectal carcinogenesis: protective against DNA damage prior to the development of neoplasia, but promoting tumor progression via enhanced tumor proliferation and tissue invasion once cancer has developed (101, 102).
Some MR studies have begun to examine the effect of risk factors on both cancer incidence and progression (103). In a recent analysis examining the effect of alcohol on prostate cancer risk in 46,919 men in the PRACTICAL consortium, alcohol consumption was not associated with overall prostate cancer risk but increased risk of prostate cancer mortality among men with low-grade disease (104). Such MR studies exploit the fact that GWAS are being increasingly used to identify genetic variants associated with cancer progression or survival (105, 106).
However, there are important methodologic considerations in investigating factors causing cancer progression. This is because prognostic studies can suffer from selection bias due to the fact that any factors that cause disease incidence (or diagnosis) will tend to be correlated with each other in a sample of only cases, even when they are not correlated in the source population. Thus if at least one factor causes both incidence and disease survival (hypothetically, insulin resistance in Fig. 3), all the other factors which cause disease incidence (hypothetically, smoking in Fig. 3) will appear to be associated with survival, unless the true prognostic factor is conditioned upon. Thus, the estimated effect on progression for any factor that is associated with incidence is likely to be biased. However, any factor that is not associated with incidence will not suffer from selection bias by studying only cases in a MR analysis.
Figure 3. Directed acyclic graph for selection bias in prognostic studies. In this example, the square bracket indicates that we are conditioning on pancreatic cancer incidence in a survival study by only studying pancreatic cancer cases, thus inducing an association between smoking (a factor that is otherwise independent of pancreatic cancer survival) and pancreatic cancer survival. This link is broken when conditioning on the factor that influences both cancer incidence and survival (e.g., insulin resistance), which can otherwise be seen as a confounder of the association between smoking and cancer survival. If a factor appears to influence pancreatic cancer survival but is not associated with pancreatic cancer incidence (e.g., treatment for pancreatic cancer), selection bias in such an MR analysis would not be expected.
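The selection mechanism in this hypothetical example can be sketched in a few lines of simulation. Everything below is an assumption made for illustration (a liability-threshold model for incidence, the effect sizes, and the variable names); none of it is fitted to real data. Smoking and insulin resistance are generated independently, both raise the chance of becoming a case, and only insulin resistance affects survival.

```python
import math
import random

def corr(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def partial_corr(xs, ys, zs):
    """Correlation of x and y after linear adjustment for z."""
    rxy, rxz, ryz = corr(xs, ys), corr(xs, zs), corr(ys, zs)
    return (rxy - rxz * ryz) / math.sqrt((1 - rxz ** 2) * (1 - ryz ** 2))

rng = random.Random(3)
everyone, cases = [], []
for _ in range(100_000):
    smoking = rng.gauss(0, 1)
    insulin_res = rng.gauss(0, 1)                    # independent of smoking
    liability = smoking + insulin_res + rng.gauss(0, 1)
    survival = -0.5 * insulin_res + rng.gauss(0, 1)  # smoking has no effect on survival
    row = (smoking, insulin_res, survival)
    everyone.append(row)
    if liability > 1.5:                              # only incident cases are studied
        cases.append(row)

pop_s, pop_i, pop_y = zip(*everyone)
case_s, case_i, case_y = zip(*cases)

r_population = corr(pop_s, pop_y)                        # expected near zero
r_cases = corr(case_s, case_y)                           # induced by selecting cases
r_cases_adjusted = partial_corr(case_s, case_y, case_i)  # conditioning on insulin resistance
print(f"{r_population:.3f} {r_cases:.3f} {r_cases_adjusted:.3f}")
```

In this setup smoking appears to be associated with survival only within the case series, and the association disappears once the true prognostic factor is adjusted for. The size of the induced association depends on how incidence is modeled, so the threshold model used here should be read as one convenient choice rather than a general result.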
When conducting prognostic studies, care should be taken to examine and (where possible) overcome the selection bias due to studying only cases (103). The observed data could be used to help identify plausible directed acyclic graphs (DAG) including both disease incidence and progression. For example, if a risk score for a phenotype and an environmental variable are correlated in cases but not in the source population, this would suggest that both factors influence disease incidence, diagnosis, or self-selection into the study. However, lack of evidence for such correlations does not imply that there is no selection bias, and expert or external knowledge should be used in constructing the DAG, as is usual practice. The DAG can then be used to help inform sensitivity analyses. Additional data on factors that predict incidence could be combined with observed data in cases to minimize selection bias, either by conditioning or by inverse probability weighting. If more than one DAG is considered plausible a priori, then each can be used to conduct sensitivity analyses by examining how robust the conclusions are to the causal assumptions made. The DAG can also be used to identify which assumptions are untestable given the observed data, and sensitivity analyses can then be conducted by examining plausible values for those relationships.
Illustrative examples
To illustrate the use of MR in analyses examining cancer outcomes, we have outlined three studies that have employed this approach to understand the causal role of various exposures on cancer incidence.
Selenium and prostate cancer risk
Prospective studies reporting inverse associations of dietary, blood, and toenail selenium with risk of prostate cancer (107–113), along with findings from in vitro studies (114, 115), led to the development of SELECT (9). SELECT was a 2 × 2 factorial trial of 35,533 healthy middle-aged men that examined the effect of daily supplementation with selenium, vitamin E, or both agents combined, as an intervention for prostate cancer prevention. The trial was stopped after 5.5 of a planned 12 years of follow-up due to a lack of efficacy compounded by possible carcinogenic (increased rates of high-grade prostate cancer) and adverse metabolic (some evidence of increased rates of T2D) effects in the selenium supplementation group (8, 9). It is plausible that residual confounding may have accounted for conflicting results between prospective studies and SELECT (116, 117), although others have suggested that these differences may have reflected differences in baseline levels of selenium of participants in some observational studies as compared with SELECT (118).
To test whether a MR approach could have predicted the results of SELECT, a two-sample MR analysis (Table 2) was performed using summary data on 72,729 individuals from the PRACTICAL consortium (119, 120). Eleven SNPs robustly associated with blood selenium in previous GWAS (refs. 121, 122; P < 5 × 10^−8) were combined into a genetic instrument (Table 3) to proxy circulating levels of selenium (Fig. 1). To allow for direct comparison of effect estimates with SELECT, the authors investigated the OR per 114 μg/L increase in circulating selenium, scaled to match the measured differences in blood selenium between supplementation and control arms in SELECT.
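One common way to combine several SNPs in a two-sample setting and rescale the result is the inverse-variance weighted (IVW) estimator, sketched below. This is a minimal illustration and not necessarily the estimator used in the study described here. The three SNPs and all summary statistics are invented placeholders, and only the 114 μg/L scaling constant comes from the text.

```python
import math

def ivw(beta_gx, beta_gy, se_gy):
    """Fixed-effect inverse-variance weighted MR estimate from per-SNP
    summary statistics, treating SNP-exposure estimates as known."""
    weights = [bx ** 2 / se ** 2 for bx, se in zip(beta_gx, se_gy)]
    numerator = sum(bx * by / se ** 2
                    for bx, by, se in zip(beta_gx, beta_gy, se_gy))
    estimate = numerator / sum(weights)
    return estimate, math.sqrt(1 / sum(weights))

# Hypothetical summary statistics for three SNPs (NOT real data).
beta_gx = [2.0, 3.5, 1.2]         # SNP-selenium: ug/L per allele (exposure GWAS)
beta_gy = [0.001, -0.002, 0.0005] # SNP-cancer: log OR per allele (outcome GWAS)
se_gy = [0.004, 0.005, 0.003]     # standard errors of the SNP-cancer estimates

log_or_per_unit, se_per_unit = ivw(beta_gx, beta_gy, se_gy)

# Rescale from "per 1 ug/L" to "per 114 ug/L" for comparison with the trial.
SCALE = 114
log_or = SCALE * log_or_per_unit
se = SCALE * se_per_unit
or_scaled = math.exp(log_or)
ci_low = math.exp(log_or - 1.96 * se)
ci_high = math.exp(log_or + 1.96 * se)
print(f"OR per {SCALE} ug/L: {or_scaled:.2f} (95% CI {ci_low:.2f}-{ci_high:.2f})")
```

The estimator is equivalent to a weighted average of the per-SNP ratio estimates. It assumes that the SNPs are independent of one another and that all of them are valid instruments.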
Consistent with results from SELECT, a 114 μg/L life-long increase in blood selenium in MR analyses was not associated with overall prostate cancer risk [OR:1.01; 95% confidence interval (CI): 0.89–1.13; P = 0.93; SELECT: HR:1.04; 95% CI: 0.91–1.19]. MR analysis of selenium on advanced prostate cancer (OR: 1.21; 95% CI: 0.98–1.49; P = 0.07) was concordant with weak evidence for an increased risk of high-grade prostate cancer in the selenium supplementation arm of SELECT (HR: 1.21; 95% CI: 0.97–1.52; P = 0.20). Likewise, the effect of selenium on T2D (OR: 1.18; 95% CI: 0.97–1.43; P = 0.11) was consistent with weak evidence for an increased risk of T2D in the selenium arm of SELECT (HR: 1.07; 95% CI: 0.97–1.18; P = 0.16).
A limitation of this analysis is that the authors did not test the hypothesis that the effect of selenium on prostate cancer risk varied by baseline selenium status. One way to investigate this in an MR framework would be to test for interaction in effect estimates by study location: whether the study was conducted in selenium-replete (e.g., United States) versus selenium-deficient (e.g., Europe) countries. If differences in baseline levels of selenium do modify the effect of selenium on prostate cancer, we would expect different effect estimates in these different settings. Nonetheless, the overall similarities in findings between this MR analysis and SELECT, as compared with results from conventional observational studies, provide some support for the utility of an MR approach in approximating experimental results using observational data. Furthermore, these results suggest that performing a MR analysis may be an important time-efficient and inexpensive step in predicting both efficacy and possible adverse effects of an intervention before an RCT is performed.
Alcohol and esophageal cancer risk
Regular alcohol consumption is associated with a substantial increased risk of esophageal squamous cell carcinoma in observational studies, with an approximate 2-fold increased risk for moderate drinkers and 5-fold increased risk for heavy drinkers when compared with occasional/nondrinkers (123). However, alcohol consumption is often associated with other lifestyle and behavioral factors (e.g., smoking and dietary intake), which may themselves predispose toward esophageal cancer (124, 125). Furthermore, most studies that examined this hypothesis have used case–control designs, which may introduce reporting bias if cases recall alcohol consumption differently from controls (123).
The ability to metabolize acetaldehyde, the principal metabolite of alcohol and a carcinogen (126), is encoded by ALDH2, which is polymorphic in some East Asian populations. Specifically, the ALDH2 *2 allele produces an inactive protein subunit that is unable to metabolize acetaldehyde, resulting in markedly higher peak blood acetaldehyde levels in *2*2 homozygotes compared with *1*1 homozygotes (127). Individuals with the *2*2 genotype experience a flushing reaction to alcohol, along with dysphoria, nausea, and tachycardia, and therefore have very low levels of alcohol consumption (128). Consequently, genetic variation in ALDH2 is robustly associated with both acetaldehyde levels and alcohol consumption (via differences in physiologic response to levels of acetaldehyde). This satisfies the instrumental variable assumption that an instrument is robustly associated with an exposure of interest, and ALDH2 can be utilized as an instrument for examining both regular alcohol consumption and blood acetaldehyde levels among alcohol consumers (129).
In a meta-analysis of seven studies with a total of 905 esophageal cancer cases of East Asian descent, individuals with the ALDH2 *2*2 genotype were found to have an approximately 3-fold reduced risk of esophageal cancer, as compared with the ALDH2 *1*1 genotype (OR: 0.36; 95% CI: 0.16–0.80), suggesting a protective effect of reduced alcohol on esophageal cancer (130). However, when comparing individuals with a heterozygous *1*2 genotype to *1*1 individuals, the former were shown to have a (seemingly paradoxical) overall increased esophageal cancer risk (OR: 3.19; 95% CI: 1.86–5.47). A naïve interpretation of this finding, without consideration of the effect of the ALDH2 *2 allele on blood acetaldehyde, would suggest that individuals with moderate alcohol intake had the highest risk of esophageal cancer.
When this association was stratified by self-reported alcohol intake, the effect of *1*2 genotype on esophageal cancer was shown to differ markedly by alcohol intake. Among nondrinkers, there was no strong evidence for an increase in risk among heterozygotes (OR: 1.31; 95% CI: 0.70–2.47) relative to *1*1 individuals. However, among heavy drinkers there was a 7-fold increase in risk (OR: 7.07; 95% CI: 3.67–13.6). Similarly, meta-regression analysis showed evidence that level of alcohol intake influenced the effect of the *1*2 genotype on esophageal cancer risk (P = 0.008; i.e., the larger the amount of alcohol intake, the greater the OR of *1*2 versus *1*1 genotypes). As the possession of an ALDH2 *2 allele only appeared to increase risk of esophageal cancer among heterozygotes who reported alcohol intake, this suggested that the substantially elevated acetaldehyde levels in these heterozygotes may mediate the effect of alcohol intake on esophageal cancer.
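As a rough check on how different the two stratum-specific estimates are, the reported ORs and confidence intervals can be compared directly. The sketch below is a crude approximation and not the meta-regression performed by the authors: it assumes that the two strata are independent and that each 95% CI is symmetric on the log scale. The function name `log_or_and_se` is introduced here for illustration.

```python
import math

def log_or_and_se(odds_ratio, lower, upper):
    """Log OR and its standard error, recovered from a 95% CI that is
    assumed to be symmetric on the log scale."""
    se = (math.log(upper) - math.log(lower)) / (2 * 1.96)
    return math.log(odds_ratio), se

# Heterozygote (*1*2 vs. *1*1) estimates as reported in the text.
b_non, se_non = log_or_and_se(1.31, 0.70, 2.47)      # nondrinkers
b_heavy, se_heavy = log_or_and_se(7.07, 3.67, 13.6)  # heavy drinkers

# z-test for the difference between two independent log ORs.
z = (b_heavy - b_non) / math.sqrt(se_non ** 2 + se_heavy ** 2)
p_two_sided = math.erfc(abs(z) / math.sqrt(2))
print(f"z = {z:.2f}, two-sided P = {p_two_sided:.1e}")
```

A large z statistic here is in line with the interpretation that the association of the *1*2 genotype with esophageal cancer depends on alcohol intake, although the formal evidence for this in the source comes from the meta-regression.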
More generally, this example illustrates how interpretation of MR findings can be challenging when there is limited biological understanding of the genetic variant used as a proxy for a given exposure. MR results that appear to be strongly discordant with underlying biology should be followed up alongside available functional understanding of the genetic variants employed as instruments, to help resolve ambiguous or paradoxical results and avoid naïve interpretation of findings.
BMI and lung cancer risk
In contrast to the relationship of adiposity with risk of most cancers, BMI has shown consistent inverse associations with incidence of lung cancer, particularly among current and former smokers (131, 132). As smoking is a robust risk factor for lung cancer and has an inverse effect on BMI (133), some have argued that residual confounding by smoking could account for this apparent protective association (134). Reverse causation (i.e., undiagnosed lung cancer or disease processes leading up to lung cancer prior to study entry influencing subsequent weight loss), especially in cohorts with insufficient follow-up time, has also been proposed as an explanation for this observational finding (135).
Attempts to address these possible sources of bias have failed to provide clarity. For example, studies that reported finely stratifying associations across various dimensions and classifications of smoking behavior (e.g., number of cigarettes smoked per day, “cigarette-years” smoked, and time since quitting smoking) have found little evidence to support residual confounding by smoking influencing this association (131, 132). Furthermore, studies removing individuals with inadequate follow-up have reported little effect on overall findings (131, 132, 136, 137), interpreted as suggesting that reverse causation is unlikely to be a major contributor to this association.
Given that germline genetic variants associated with BMI cannot be influenced by prevalent disease and should not be associated with potential confounding factors, an MR approach could be used to assess whether increased BMI is protective against lung cancer (138, 139). For example, Carreras-Torres and colleagues performed an MR analysis using GWAS results on 16,572 lung cancer cases and 21,480 controls of European descent (140). Ninety-seven SNPs previously associated with BMI in a GWAS of 339,224 individuals were compiled into an instrument to proxy for anthropometrically measured BMI. This instrument was associated with measured BMI but not with available measures of tobacco exposure, including pack-years, cigarettes smoked per day, or cotinine levels, providing some evidence against confounding through measured smoking variables (133). In two-sample MR analyses, a 1-SD increase in genetically predicted BMI was weakly associated with an increased risk of lung cancer (OR: 1.13; 95% CI: 0.98–1.30; P = 0.10), with strong heterogeneity across histologic subtypes (P for heterogeneity < 3 × 10⁻⁵). Notably, genetically predicted BMI was positively associated with risk of both squamous cell (OR: 1.45; 95% CI: 1.16–1.62; P = 1.2 × 10⁻³) and small-cell carcinoma (OR: 1.81; 95% CI: 1.14–2.88; P = 0.01) but showed weak evidence for a protective effect for adenocarcinoma (OR: 0.82; 95% CI: 0.66–1.01; P = 0.06). These findings thus help to clarify a likely positive risk relationship of BMI with two major histologic subtypes of lung cancer. Alongside some genetic evidence to suggest that elevated BMI may influence subsequent smoking uptake (141), which itself reduces BMI while increasing lung cancer risk (133), these findings collectively suggest a possible mechanism that could help to reconcile seemingly conflicting MR and observational findings.
Further interrogation of a possible mediating role of smoking on the causal pathway between BMI and lung cancer risk using “two-step MR” (discussed in "MR for mediation") may be able to help shed further light on the possible intricate relationship between smoking and BMI in the etiology of lung cancer.
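To make the two-sample approach described above more concrete, the following sketch shows how SNP-exposure and SNP-outcome summary statistics can be combined into a single causal estimate with a fixed-effect inverse-variance weighted (IVW) estimator. IVW is one commonly used estimator and is assumed here for illustration; the text does not state which estimator the cited study used. The four SNPs and all of their effect sizes are hypothetical and are not taken from any GWAS.

```python
import math


def wald_ratio(beta_exposure, beta_outcome, se_outcome):
    """Single-SNP causal estimate and its first-order standard error."""
    return beta_outcome / beta_exposure, se_outcome / abs(beta_exposure)


def ivw_estimate(beta_exposure, beta_outcome, se_outcome):
    """Fixed-effect IVW estimate: weighted regression of SNP-outcome effects
    on SNP-exposure effects through the origin, weights = 1 / se_outcome**2."""
    weights = [1.0 / se ** 2 for se in se_outcome]
    numerator = sum(w * bx * by for w, bx, by in zip(weights, beta_exposure, beta_outcome))
    denominator = sum(w * bx ** 2 for w, bx in zip(weights, beta_exposure))
    return numerator / denominator, math.sqrt(1.0 / denominator)


# Hypothetical summary statistics (NOT real data):
# SNP effects on BMI in SD units, and on lung cancer in log-odds units.
snp_bmi = [0.08, 0.05, 0.03, 0.04]
snp_lung = [0.012, 0.004, 0.005, 0.003]
snp_lung_se = [0.004, 0.004, 0.005, 0.005]

log_or, se = ivw_estimate(snp_bmi, snp_lung, snp_lung_se)
ci_low, ci_high = math.exp(log_or - 1.96 * se), math.exp(log_or + 1.96 * se)
print(f"OR per 1-SD higher BMI = {math.exp(log_or):.2f} (95% CI: {ci_low:.2f}-{ci_high:.2f})")
```

The estimator treats the SNP-exposure effects as measured without error and assumes the variants are independent and valid instruments, so in practice it would be accompanied by sensitivity analyses for pleiotropy.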
Recent methodologic extensions and future applications
In recent years, the development of various methodologic extensions to the original MR paradigm has helped to enhance the scope of MR analyses; several of these are discussed below with reference to possible applications in cancer epidemiology.
MR for mediation
Over the past decade, high throughput “omics” technologies have begun to permit exhaustive profiling of the epigenome, metabolome, and proteome (as examples), allowing the collection of high-dimensional molecular data on increasingly large numbers of individuals (142). Such omics measures may serve as important mediators on causal pathways linking macro-level risk factors with cancer incidence or progression. While conventional mediation analyses exist to examine possible exposure–mediator–outcome relationships, the validity of these approaches relies upon strong assumptions which are unlikely to be met in practice, such as no measurement error and no unmeasured confounding (143).
The performance of GWAS on large collections of metabolites and other omic measures (144, 145) will create opportunities to develop instruments for these traits. To establish whether a particular molecular intermediate is on the causal pathway between an exposure and cancer, genetic variants can be used as instruments for both exposures and putative mediators that influence a disease outcome in a two-step MR framework (Fig. 4; ref. 146).
Figure 4. Two-step MR analysis examining the mediating effect of methylation on the association between smoke exposure and lung cancer. In the first step, a SNP within CHRNA5-A3-B4 is used as an instrument for smoke exposure to assess the causal association between smoking and DNA methylation. In the second step, an independent cis-SNP is used as an instrument for DNA methylation to assess the causal association of DNA methylation with lung cancer risk. The two-step method allows interrogation of the mediation effect of DNA methylation in the association between smoking and lung cancer risk.
For example, a method of testing the mediating role of methylation changes on cancer outcomes would be to exploit the fact that genetic variants (e.g., methylation quantitative trait loci, mQTLs) are robustly associated with methylation at CpG sites across the epigenome, providing possible instruments for MR analyses (147). Two-step MR could then be used to examine the potential mediating role of DNA methylation sites associated with exposures such as tobacco smoke (148), which have also been found to be strongly associated with lung cancer risk (149). To test whether methylation is causally mediating (some, or all of) the effect of tobacco exposure on lung cancer risk, in the first step, a SNP could be used to proxy smoking behavior to investigate its effect on the intermediate phenotype (DNA methylation). In the second step, an independent SNP could then be used to proxy the intermediate phenotype (DNA methylation), which could then be examined in relation to the disease outcome (lung cancer; ref. 143). This approach has the potential to be scaled up within the context of high dimensional ’omic datasets to integrate multiple tiers of molecular data in a causal framework (150, 151). While statistical and computational challenges arise with increasingly complex networks of molecular mediators, numerous data reduction and variable selection techniques may be used to identify informative causal molecular pathways to disease, including pathway analysis, penalized regression, machine learning, and data mining techniques, which are increasingly being applied in an automated fashion (refs. 152, 153; see the “Hypothesis-free MR” section of this review).
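One simple way to combine the two steps described above is the product-of-coefficients approach, sketched below with a first-order delta-method standard error. This is an assumption made for illustration, since the text does not prescribe how the step-specific estimates should be combined. The sketch further assumes that the two estimates come from independent instruments and samples, and all numerical inputs are hypothetical.

```python
import math


def two_step_indirect_effect(step1_beta, step1_se, step2_beta, step2_se):
    """Indirect (mediated) effect as the product of the exposure->mediator and
    mediator->outcome MR estimates, with a first-order delta-method SE that
    assumes the two estimates are independent."""
    indirect = step1_beta * step2_beta
    se = math.sqrt(step1_beta ** 2 * step2_se ** 2 + step2_beta ** 2 * step1_se ** 2)
    return indirect, se


# Hypothetical inputs (NOT real estimates):
# step 1: effect of smoking on methylation at a CpG site (SD units)
# step 2: effect of methylation at that site on lung cancer (log-odds per SD)
smoking_to_methylation = (-0.30, 0.05)
methylation_to_cancer = (-0.40, 0.12)

indirect, indirect_se = two_step_indirect_effect(*smoking_to_methylation, *methylation_to_cancer)
z = indirect / indirect_se
print(f"indirect log-odds effect = {indirect:.3f} (SE {indirect_se:.3f}), z = {z:.2f}")
```

With a binary outcome such as cancer, the product is on the log-odds scale and is only an approximation to the mediated effect, so it is best read as a rough guide to the size of the pathway rather than an exact decomposition.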
Factorial MR
Akin to a factorial RCT, factorial MR is a method of testing the independent and additive effects of two or more exposures on disease outcomes. This approach was adopted by Ference and colleagues, who performed a 2 × 2 factorial MR analysis to compare the effect on risk of coronary heart disease (CHD) of the LDL cholesterol-lowering drug ezetimibe, alone or in combination with statins, with the effect of statins alone (154). Ference and colleagues examined the effect of genetically lower LDL-C on the risk of CHD through SNPs in NPC1L1 (a target of ezetimibe) alone, HMGCR (a target of statins) alone, or variants in both gene regions combined. The authors reported that natural randomization to lower LDL-C through SNPs in NPC1L1 and HMGCR alone showed similar decreases in LDL-C and CHD and that randomization to lower LDL-C in both groups combined had a linearly additive effect on LDL-C lowering and a log-linearly additive effect on CHD risk. These results were corroborated by the Improved Reduction of Outcomes: Vytorin Efficacy International Trial (IMPROVE-IT), which randomized 18,144 participants to simvastatin plus ezetimibe or simvastatin plus placebo (155).
An important caveat of this approach is that it relies on access to individual-level data and requires very large sample sizes to have adequate statistical power to reliably detect differences in effect across groups.
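The logic of a 2 × 2 factorial MR analysis can be illustrated with a small simulation. In the sketch below, two independent genetic scores (hypothetical stand-ins for variants in two drug-target regions) each lower a continuous LDL-C-like trait. Individuals are split at the median of each score into four groups, and the group means are compared against the reference group. All effect sizes, the sample size, and the function name are assumptions chosen for illustration and do not reproduce the cited analysis.

```python
import random
import statistics


def simulate_factorial_mr(n=20000, seed=1):
    """Simulate a 2 x 2 factorial MR on a continuous trait and return the mean
    difference of each genetic group versus the reference group, plus the
    interaction contrast (departure from additivity)."""
    rng = random.Random(seed)
    score_a = [rng.gauss(0, 1) for _ in range(n)]  # hypothetical score, region A
    score_b = [rng.gauss(0, 1) for _ in range(n)]  # hypothetical score, region B
    # Trait generated under purely additive effects of the two scores
    trait = [3.5 - 0.10 * a - 0.10 * b + rng.gauss(0, 0.5)
             for a, b in zip(score_a, score_b)]

    median_a = statistics.median(score_a)
    median_b = statistics.median(score_b)
    groups = {"reference": [], "A only": [], "B only": [], "both": []}
    for a, b, y in zip(score_a, score_b, trait):
        high_a, high_b = a > median_a, b > median_b  # higher score = lower trait
        if high_a and high_b:
            groups["both"].append(y)
        elif high_a:
            groups["A only"].append(y)
        elif high_b:
            groups["B only"].append(y)
        else:
            groups["reference"].append(y)

    means = {name: statistics.fmean(values) for name, values in groups.items()}
    effects = {name: means[name] - means["reference"]
               for name in ("A only", "B only", "both")}
    effects["interaction"] = effects["both"] - effects["A only"] - effects["B only"]
    return effects


for name, value in simulate_factorial_mr().items():
    print(f"{name}: {value:+.3f}")
```

Because the trait is generated under additivity, the interaction contrast should be close to zero. Its sampling variability, even with 20,000 simulated individuals, gives a sense of why large sample sizes are needed to detect departures from additivity.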
Hypothesis-free MR
A novel extension to a conventional “hypothesis-driven” MR analysis is a phenome-wide, “hypothesis-free” MR analysis (termed “MR-PheWAS”; ref. 152). This approach makes use of genotyped datasets with high-dimensional phenotypic data or summary GWAS association statistics to perform hundreds or thousands of statistical tests simultaneously in an agnostic manner. For example, the approach can be used to examine the effect of a single exposure across multiple outcomes or multiple exposures across a single outcome. In contrast to hypothesis-driven analyses, hypothesis-free approaches allow for testing hypotheses that may not have been considered or tested previously, thus identifying novel risk relationships, and can help to address issues of publication bias as all analyses are openly specified and all results are presented (156).
For example, using a two-sample MR framework with summary data, Haycock and colleagues performed an MR-PheWAS examining the effect of telomere length on risk of 35 cancers and 48 noncancer diseases in 420,081 cases and 1,093,105 controls (157). After correction for multiple testing, they found that genetically longer telomeres were associated with increased cancer risk across most sites and histologic subtypes but with reduced CVD risk. An important consideration when performing hypothesis-free MR analyses using summary data is the need to follow up any putative findings in independent datasets. This can be challenging if a large proportion of the available GWAS literature was already used to provide causal estimates in the original “discovery phase” of an analysis.
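Because hypothesis-free analyses involve many simultaneous tests, some form of multiple-testing correction is needed. The sketch below shows two standard options: a Bonferroni threshold, computed here for the 83 outcomes (35 cancers and 48 noncancer diseases) mentioned above, and the Benjamini-Hochberg step-up procedure for false discovery rate control. The text does not state which correction the cited study applied, so both are shown as illustrations, and the P values in the example are hypothetical.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold that controls the family-wise error rate."""
    return alpha / n_tests


def benjamini_hochberg(p_values, alpha=0.05):
    """Return the indices of tests declared significant by the
    Benjamini-Hochberg step-up procedure at false discovery rate alpha."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    cutoff_rank = 0
    for rank, index in enumerate(order, start=1):
        if p_values[index] <= alpha * rank / m:
            cutoff_rank = rank  # keep the largest rank meeting the criterion
    return sorted(order[:cutoff_rank])


# 35 cancers + 48 noncancer diseases = 83 outcomes tested
threshold = bonferroni_threshold(0.05, 35 + 48)
print(f"Bonferroni threshold for 83 tests: {threshold:.2e}")

# Hypothetical P values from eight MR tests (NOT real results)
example_p = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205]
print("BH-significant test indices:", benjamini_hochberg(example_p))
```

Bonferroni is the more conservative of the two, which matters when outcomes are correlated, as related cancer sites and subtypes often are.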
MR for identifying causality of mutational signatures
Large-scale analysis of the genomes of thousands of patients with cancer has helped to reveal somatic “mutational signatures” (distinctive somatic mutational patterns left by unique carcinogenic agents) involved in the development of their tumors (158, 159). To date, mutational signatures have been identified across more than 30 different cancer types, with anywhere from two to six distinct mutational processes for each cancer type. Knowledge of the causes of somatic mutations within tumor tissue can improve understanding of the mechanisms by which endogenous and exogenous exposures promote the development of a cancer. Of the mutational signatures identified across cancer types, a putative cause has been proposed for approximately half (158); MR may offer particular promise in helping to identify the etiology of other mutational signatures identified (160).
Robles-Espinoza and colleagues examined the effect of germline MC1R status, associated with red hair, freckling, and sun sensitivity, on somatic mutation burden in melanoma. Such an analysis can be viewed as an MR appraisal of the effect of this sensitivity phenotype on somatic mutation burden in melanoma (161). For all six mutational types assessed, there was evidence of an increased burden of somatic single-nucleotide variants in individuals carrying one or two MC1R R alleles (disruptive variants). For one of the six mutational signatures characterized by an abundance of somatic C>T single-nucleotide variants, each additional R allele at MC1R was associated with a 42% (95% CI: 15–76%) increase in the C>T single-nucleotide variant count. This approach therefore highlights the possibility of testing the causal effect of suspected carcinogenic agents on mutational burden for various mutational signatures across cancer tissues and subtypes.
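A per-allele percentage increase of this kind corresponds to a multiplicative (log-linear) model, so the implied effect for carriers of two alleles is the square of the per-allele ratio rather than twice the percentage. The short sketch below works through that arithmetic for the estimate quoted above. It assumes strictly log-additive allele effects, which is an assumption of the illustration rather than a statement about the cited analysis.

```python
import math


def expected_fold_change(per_allele_ratio, n_alleles):
    """Fold change in mutation count for n_alleles risk alleles, assuming
    allele effects are additive on the log scale."""
    return math.exp(math.log(per_allele_ratio) * n_alleles)


# Per-allele ratio of 1.42 (95% CI: 1.15-1.76), i.e., the 42% increase quoted in the text
for label, ratio in (("estimate", 1.42), ("lower limit", 1.15), ("upper limit", 1.76)):
    two_alleles = expected_fold_change(ratio, 2)
    print(f"{label}: {100 * (two_alleles - 1):.0f}% increase with two R alleles")
```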
Drug repurposing and adverse drug effects
Drug repurposing, applying known drugs to novel indications, can provide a rapid, cost-effective mechanism for drug discovery and may hold promise for the development of pharmacologic interventions for cancer prevention (162, 163). In turn, for well-tolerated drugs that are considered candidates for repurposing, MR may offer an attractive approach for testing their potential chemopreventive efficacy. For example, it is currently possible to reliably instrument drugs for which there is a broad understanding of the biological mechanism of action (e.g., HMG-CoA reductase inhibitors, PCSK9 inhibitors, CETP inhibitors, and sPLA2 inhibitors in cardiovascular disease; ref. 164). For the primary or tertiary prevention of certain cancers, aspirin, metformin, and bisphosphonates have all been proposed as possible candidate pharmaceutical agents for repurposing (165–167). Using MR as a first step to test drug efficacy for novel cancer indications could help to prioritize or deprioritize which drugs should be taken forward to testing in RCTs for repurposing.
MR may also provide a useful approach for predicting adverse effects of pharmaceuticals (168). Preapproval trials are often not able to adequately capture development of adverse effects due to the comparatively small number of individuals typically exposed to a drug in such trials (unless drug effects are very common or very large), the limited duration of most trials, and unknown generalizability of trial participants to the broader population. While many of these issues can be addressed post-approval of a drug through spontaneous reporting systems, these introduce their own limitations including confounding, for example by indication, environmental factors, or lifestyle traits. MR studies should be able to overcome these limitations and have been employed in some instances to test or anticipate adverse effects of interventions in ongoing trials (e.g., adverse effects of statins on T2D as proxied by variants in HMGCR; refs. 34, 35, 169–171).
While knowledge of biological pathways can help to anticipate some adverse drug effects pre-approval of a drug, it may not be possible to correctly predict all such effects (172). One possible approach to resolve this would be to use MR-PheWAS to perform a phenotypic scan of a genetically instrumented drug exposure across hundreds or thousands of potential outcomes, as outlined previously. The identification of possible adverse effects of a drug through this approach could then be used to prespecify and adequately power secondary outcome measures or, alternatively, to deprioritize further investigation of a therapeutic target.
Conclusion
Observational epidemiologic studies are prone to various intractable biases that can undermine robust causal inference. MR offers a promising approach to generate a more reliable evidence base for cancer prevention and treatment. The advent of MR methods using summarized data means that such analyses can now be performed more efficiently, rapidly, and with greater statistical power than previously possible. Furthermore, the range of methodologic extensions to the original MR paradigm now available has greatly expanded the scope of this approach, enabling increasingly sophisticated causal questions to be interrogated (173). Despite this, there are inherent constraints on the types of epidemiologic questions that can be answered with this approach as compared with conventional observational analyses. For example, MR is restricted to examining exposures that have a heritable component and suitable genetic proxies for these exposures; MR cannot isolate critical period effects for exposures; and MR will usually only represent the effect of lifelong exposure to a biomarker. These limitations mean that inferences made from MR will be most informative when integrated alongside insights gained from other epidemiologic approaches and study designs. Given optimism surrounding use of the method in helping to strengthen evidence for public health and pharmacologic interventions (174), it is likely that there will be a continued proliferation of MR analyses in the literature in the near future. Careful design, analysis, and interpretation of such studies with consideration of the limitations of the method will provide the greatest opportunity for such studies to inform cancer prevention and treatment strategies.
Disclosure of Potential Conflicts of Interest
No potential conflicts of interest were disclosed.
Acknowledgments
This work was supported by a Cancer Research UK programme grant (C18281/A19169) to K.H. Wade, R.C. Richmond, C.L. Relton, S.J. Lewis, and R.M. Martin, including Cancer Research UK Research PhD studentships (C18281/A20988) to J. Yarmolinsky and R.J. Langdon. This work was also supported by a Wellcome Trust 4-year studentship (WT083431MA to C.J. Bull). All authors are members of the MRC IEU, which is supported by the Medical Research Council and the University of Bristol (MC_UU_12013/1-9).
Footnotes
Note: Supplementary data for this article are available at Cancer Epidemiology, Biomarkers & Prevention Online (http://cebp.aacrjournals.org/).
- Received December 18, 2017.
- Revision received February 15, 2018.
- Accepted June 5, 2018.
- Published first June 25, 2018.
- ©2018 American Association for Cancer Research.