Background: Non-Hodgkin lymphoma (NHL) is a malignancy of lymphocytes, and there is growing evidence for a role of germline genetic variation in immune genes in NHL etiology.
Methods: To identify susceptibility immune genes, we conducted a 2-stage analysis of single-nucleotide polymorphisms (SNP) from 1,253 genes using the Immune and Inflammation Panel. In Stage 1, we genotyped 7,670 SNPs in 425 NHL cases and 465 controls, and in Stage 2 we genotyped the top 768 SNPs on an additional 584 cases and 768 controls. The association of individual SNPs with NHL risk from a log-additive model was assessed using the OR and 95% confidence intervals (CI).
Results: In the pooled analysis, only the TAP2 coding SNP rs241447 (minor allele frequency = 0.26; Thr665Ala) at 6p21.3 (OR = 1.34, 95% CI 1.17–1.53) achieved statistical significance after accounting for multiple testing (P = 3.1 × 10−5). The TAP2 SNP was strongly associated with follicular lymphoma (FL, OR = 1.82, 95% CI 1.46–2.26; P = 6.9 × 10−8), and was independent of other known loci (rs10484561 and rs2647012) from this region. The TAP2 SNP was also associated with diffuse large B-cell lymphoma (DLBCL, OR = 1.38, 95% CI 1.08–1.77; P = 0.011), but not chronic lymphocytic leukemia (OR = 1.08; 95% CI 0.88–1.32). Higher TAP2 expression was associated with the risk allele in both FL and DLBCL tumors.
Conclusion: Genetic variation in TAP2 was associated with NHL risk overall, and FL risk in particular, and this was independent of other established loci from 6p21.3.
Impact: Genetic variation in antigen presentation of HLA class I molecules may play a role in lymphomagenesis. Cancer Epidemiol Biomarkers Prev; 21(10); 1799–806. ©2012 AACR.
This article is featured in Highlights of This Issue, p. 1611
Non-Hodgkin lymphoma (NHL) is a group of heterogeneous malignancies of B and T lymphocytes, as well as other immune cells, although in western populations, B-cell malignancy predominates. Immune dysfunction is an established risk factor for NHL (1), and there is accumulating evidence from multiple independent candidate gene studies that genetic variation in genes involved in immune function and inflammation is associated with NHL risk (2–12). Genome-wide association studies (GWAS) have also identified several loci in and around the HLA region on chromosome 6p21.32–33 (13–16). We previously conducted and published an analysis of NHL risk (425 cases, 465 controls) using the ParAllele (now Affymetrix) Immune and Inflammation Panel, which included 1,253 genes that were tagged with 9,412 single-nucleotide polymorphisms (SNP; 17). Here, we report the results for a second-stage validation of the top 10% of SNPs from that analysis in a new set of 584 NHL cases and 768 controls, and then a pooled analysis on all 1,009 cases and 1,233 controls. We also formally assessed associations within the most common NHL subtypes: chronic lymphocytic leukemia/small lymphocytic lymphoma (CLL/SLL), follicular lymphoma (FL), and diffuse large B-cell lymphoma (DLBCL).
Materials and Methods
Study population and data collection
This study was reviewed and approved by the Human Subjects Institutional Review Board at the Mayo Clinic, and all participants provided written informed consent. Full details on this case-control study have been previously published (17, 18). Briefly, starting on September 1, 2002, we offered enrollment to all consecutive cases of newly diagnosed, pathologically confirmed lymphoma (including CLL) who were aged 20 years or older and residents of Minnesota, Iowa, or Wisconsin at the time of diagnosis, excluding cases with a history of HIV infection or who did not speak English. A Mayo Clinic hematopathologist reviewed all materials for each case to verify the diagnosis and to classify each case according to the World Health Organization Classification of Neoplastic Diseases of the Hematopoietic and Lymphoid Tissues (19). This analysis included all subjects enrolled into the study from September 1, 2002 through February 29, 2008. Of the 1,798 eligible patients identified, 1,236 (69%) participated, 183 (10%) refused, 39 (2%) were lost to follow-up (i.e., we were unable to contact after multiple attempts), and 340 (19%) did not complete all data collection within 12 months of diagnosis.
Clinic-based controls were recruited from Mayo Clinic Rochester patients under evaluation for a prescheduled medical examination in the general medicine divisions of the Departments of Medicine or Family Medicine from September 1, 2002 through February 29, 2008. Controls had to be at least 20 years old, be a resident of Minnesota, Iowa, or Wisconsin at the time of appointment, and have no history of lymphoma or leukemia; controls with a history of HIV infection or who did not speak English were not eligible. Controls were frequency matched to the case distribution on 5-year age group, sex, and geographic location of residence using a computer program that randomly selects subjects from eligible patients. Of the 1,899 eligible subjects identified, 1,315 (69%) participated, 548 (29%) refused, and 36 (2%) did not complete data collection within 12 months of selection.
Participants completed a self-administered risk-factor questionnaire and provided a peripheral blood sample for serum and DNA studies. DNA was extracted from blood samples using a standard procedure (Gentra, Inc).
All participants who had an adequate DNA sample were genotyped as part of a larger genotyping project on a custom Illumina GoldenGate 1,536 SNP oligonucleotide pool (OPA). Individuals included in the original case-control series (17) were defined as Stage 1 (discovery set); the remainder of the participants were defined as Stage 2 (replication set). Cases diagnosed with Hodgkin lymphoma were excluded from this analysis. A total of 1,050 cases and 1,274 controls were randomly arranged on 96-well plates, with 50 samples plated in duplicate. One of the Centre d'Etude du Polymorphisme Humain family trios was included on every plate and was also duplicated across each of the plates. The inclusion of these 3 samples aided in genotyping concordance calculations as well as determination of non-Mendelian inheritance patterns. For duplicated samples, the sample with the higher call rate was used for analysis.
We selected the top 800 (approximately 10%) of the 7,670 SNPs that were successfully genotyped in Stage 1 (i.e., passed quality control and were not monomorphic) to genotype in Stage 2; selection was based on a minor allele frequency (MAF) >5% in control subjects and the trend P value from the analyses of all Caucasian NHL cases and controls. Of these SNPs, 23 failed Illumina design for this round of genotyping, and 4 others were no longer mapped uniquely to the same location on the genome. The remaining 773 were genotyped. We evaluated genotyping quality using Plink software. We dropped SNPs with call rates <95% (N = 33), SNPs that were monomorphic (N = 2), and SNPs that had poor genotype clustering (N = 1). After dropping 82 subjects (41 cases and 41 controls) with call rates <90%, we had 1,009 cases and 1,233 controls in the combined analyses of Stage 1 and 2. Concordance among duplicate samples was >99.9%. Hardy–Weinberg equilibrium (HWE) was evaluated among the control subjects for each SNP using an exact test. SNPs with an HWE P value less than 1 × 10−3 (N = 19) were deemed questionable and were examined further using cluster plots. All plots appeared reasonable and no further exclusions were made. Thus, there was a final total of 737 SNPs available for analysis.
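The HWE screen described above can be illustrated with a short sketch. The text states only that an exact test was used; the recursion below is one commonly used formulation of the exact test and is an assumption here, as are the function name and the example genotype counts, none of which come from the study.

```python
def hwe_exact_p(n_het, n_hom1, n_hom2):
    """Two-sided exact HWE P value from genotype counts at one biallelic SNP.

    Illustrative sketch only; not the software used in the study.
    """
    n = n_het + n_hom1 + n_hom2
    if n == 0:
        return 1.0
    rare = 2 * min(n_hom1, n_hom2) + n_het  # copies of the rarer allele
    probs = [0.0] * (rare + 1)              # relative probability of each het count

    # Start near the expected heterozygote count, with the same parity as `rare`.
    mid = rare * (2 * n - rare) // (2 * n)
    if mid % 2 != rare % 2:
        mid += 1
    probs[mid] = 1.0

    # Recurse downward (fewer heterozygotes).
    hets, hom_r = mid, (rare - mid) // 2
    hom_c = n - hets - hom_r
    while hets >= 2:
        probs[hets - 2] = probs[hets] * hets * (hets - 1) / (4.0 * (hom_r + 1) * (hom_c + 1))
        hets, hom_r, hom_c = hets - 2, hom_r + 1, hom_c + 1

    # Recurse upward (more heterozygotes).
    hets, hom_r = mid, (rare - mid) // 2
    hom_c = n - hets - hom_r
    while hets <= rare - 2:
        probs[hets + 2] = probs[hets] * 4.0 * hom_r * hom_c / ((hets + 2) * (hets + 1))
        hets, hom_r, hom_c = hets + 2, hom_r - 1, hom_c - 1

    # P value: total probability of outcomes no more likely than the observed one.
    p_obs = probs[n_het]
    return min(1.0, sum(p for p in probs if p <= p_obs) / sum(probs))


# Hypothetical control genotype counts (not study data).
HWE_FLAG_THRESHOLD = 1e-3  # threshold quoted in the text
p_value = hwe_exact_p(n_het=48, n_hom1=36, n_hom2=16)
flagged = p_value < HWE_FLAG_THRESHOLD
```

In this sketch a SNP whose P value falls below the threshold would only be flagged for cluster-plot review, mirroring the text, rather than dropped automatically.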
Gene expression analysis
Whole exome sequencing (on paired tumor/normal) and gene expression levels from initial (frozen) diagnostic specimens of 36 DLBCL tumors were available (20), of which 11 were also genotyped in this study. Affymetrix HG-U133 plus2.0 microarray chips were used for gene expression profiling and the data were preprocessed using the Robust Multichip Average method (21). We also had whole exome (paired tumor/normal) and RNA next generation sequencing (RNAseq) from initial (frozen) diagnostic specimens of 8 FL tumors (unpublished data); none of these specimens overlapped this study. We compared the gene expression levels from the Affymetrix chip by SNP genotype based on the Illumina OPA genotype call for DLBCL, and the RNAseq levels by SNP genotype based on the tumor exome genotype for FL.
Gene regulatory network analysis
MetaCore's autoexpand algorithm (GeneGo Inc.) was used for regulatory gene network analysis. The genes implied by the SNPs were used as the input genes to build the network using the canonical pathways. The autoexpand algorithm draws subnetworks around the input genes, and the expansion halts when the subnetworks intersect.
Statistical analysis
Unconditional logistic regression was used to estimate OR and 95% confidence intervals (CI) for the association between NHL case status and each SNP. Analyses were adjusted for age (including its functional form) and gender and the most common homozygous genotype was treated as the referent category for each of the SNPs. Each SNP was modeled in a log-additive manner in the regression model and the Wald P value was used to assess significance. Analyses were conducted for Stage 1 and Stage 2, and then combined.
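As a rough illustration of the log-additive model, the sketch below fits an unadjusted logistic regression with the genotype coded as 0, 1, or 2 copies of the minor allele and reports the per-allele OR, a 95% Wald CI, and the Wald P value. It is a simplification under stated assumptions: the age and sex adjustment used in the study is omitted, the fit uses a hand-written Newton-Raphson loop rather than SAS, and the genotype counts and function names are hypothetical.

```python
import math


def score_and_info(b0, b1, cases, controls):
    """Score vector and information matrix for logit(p) = b0 + b1 * copies."""
    g0 = g1 = h00 = h01 = h11 = 0.0
    for x in (0, 1, 2):  # copies of the minor allele
        n = cases[x] + controls[x]
        p = 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))
        resid = cases[x] - n * p
        w = n * p * (1.0 - p)
        g0 += resid
        g1 += resid * x
        h00 += w
        h01 += w * x
        h11 += w * x * x
    return g0, g1, h00, h01, h11


def fit_log_additive(cases, controls, n_iter=50):
    """Per-allele OR, 95% Wald CI, and two-sided Wald P (unadjusted sketch)."""
    b0 = b1 = 0.0
    for _ in range(n_iter):  # Newton-Raphson updates
        g0, g1, h00, h01, h11 = score_and_info(b0, b1, cases, controls)
        det = h00 * h11 - h01 * h01
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    # Standard error of b1 from the inverse information at the final estimate.
    _, _, h00, h01, h11 = score_and_info(b0, b1, cases, controls)
    se = math.sqrt(h00 / (h00 * h11 - h01 * h01))
    z = b1 / se
    return {
        "or": math.exp(b1),
        "ci": (math.exp(b1 - 1.96 * se), math.exp(b1 + 1.96 * se)),
        "p": math.erfc(abs(z) / math.sqrt(2.0)),  # two-sided normal tail
    }


# Hypothetical genotype counts for 0, 1, 2 minor alleles (not study data).
result = fit_log_additive(cases=[100, 200, 400], controls=[100, 100, 100])
```

Because the model has a single slope for allele count, the OR for carrying two copies is constrained to be the square of the per-allele OR; a genotype-specific model relaxes that constraint.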
The primary analysis focused on all NHL and P < 0.001 for the log-additive model in the combined analyses. To determine the proper multiple-comparisons correction for this 2-stage design, we used PLINK to subset our original discovery-phase 7,670 SNPs into a set of independent SNPs (R2 = 0) using the variance inflation factor sliding window approach. The number of independent SNPs (n = 352) was then used as a Bonferroni correction for our pooled analyses of Stage 1 and 2 subjects. SNPs with a trend P value below 1.4 × 10−4 (= 0.05/352) were considered of interest for associations with NHL overall. For SNPs meeting this criterion, we further evaluated other available SNPs from the local region as well as the association with major NHL subtypes (CLL/SLL, DLBCL, FL). The multiple testing threshold for SNPs associated with NHL subtypes was a trend P value below 4.7 × 10−5 [0.05/(352*3)]. Statistical analyses used SAS software (SAS Institute, Inc.).
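The two thresholds follow directly from the Bonferroni arithmetic, and the short sketch below reproduces them and applies them to the P values quoted in the abstract. The variable names are illustrative. The variance inflation factor is VIF = 1/(1 − R²), so the stated R² = 0 corresponds to a VIF of 1.

```python
ALPHA = 0.05
N_INDEPENDENT_SNPS = 352   # independent SNPs after pruning, from the text
N_SUBTYPES = 3             # CLL/SLL, DLBCL, FL

threshold_overall = ALPHA / N_INDEPENDENT_SNPS                  # about 1.4e-4
threshold_subtype = ALPHA / (N_INDEPENDENT_SNPS * N_SUBTYPES)   # about 4.7e-5

# P values for rs241447 as reported in the abstract.
p_all_nhl, p_fl, p_dlbcl = 3.1e-5, 6.9e-8, 0.011

passes = {
    "all NHL": p_all_nhl < threshold_overall,
    "FL": p_fl < threshold_subtype,
    "DLBCL": p_dlbcl < threshold_subtype,
}
print(f"overall: {threshold_overall:.1e}, subtype: {threshold_subtype:.1e}")
print(passes)
```

Under these thresholds the overall and FL associations pass, whereas the DLBCL association is nominal only.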
Results
Cases and controls were well balanced on the study design factors of age, sex, and state of residence in each stage (Table 1). The pooled data set had 1,009 NHL cases and 1,233 controls, and the most common NHL subtypes were CLL/SLL (N = 327), FL (N = 238), and DLBCL (N = 189).
SNPs in the pooled analysis with a P < 0.001 are shown in Table 2. Only the top ranked SNP from TAP2 met the corrected P value threshold of 1.4 × 10−4. This TAP2 SNP is common (MAF 0.26) and leads to a coding change at position 665 (Thr→Ala). Compared with the GG genotype, there was an increased risk of NHL with the GA (OR = 1.30; 95% CI 1.09–1.55) and the AA (OR = 1.89; 95% CI 1.33–2.68) genotypes.
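One informal way to read the genotype-specific estimates against the log-additive result is to compare them with what the per-allele OR implies. The sketch below uses only the point estimates and confidence limits quoted in the text and abstract; it is a consistency check, not a formal test of the additive assumption.

```python
PER_ALLELE_OR = 1.34  # pooled log-additive estimate from the abstract

# Under a log-additive model the OR for k copies is the per-allele OR to the power k.
implied_or = {copies: PER_ALLELE_OR ** copies for copies in (0, 1, 2)}

# Genotype-specific estimates (point estimate, lower CI, upper CI) from the text.
observed = {1: (1.30, 1.09, 1.55), 2: (1.89, 1.33, 2.68)}

within_ci = {
    copies: low <= implied_or[copies] <= high
    for copies, (_, low, high) in observed.items()
}
print(implied_or)   # two copies implies roughly 1.80
print(within_ci)
```

Both implied values fall inside the corresponding genotype-specific confidence intervals, which is what one would expect if the additive coding is a reasonable summary.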
Besides the SNP from TAP2, there were 2 other SNPs, rs2857597 from AIF1 and rs1894408 from HLA-DOB, that were from the 6p21.3 region, whereas the other top SNPs were from genes on chromosome 18 (NFATC1), 2 (ZAP70), and 12 (VDR, PLXNC1, PTPRO). When we conducted a regulatory network analysis of these 9 genes using MetaCore's autoexpand algorithm, 8 out of 9 genes (excluding AIF1) were functionally connected with only 1 node (gene) away from each other (Fig. 1), suggesting that virtually all of the top hits from the study are closely related from a regulatory perspective.
We next evaluated the chromosome 6p21 region with all SNPs available from the replication phase along with results for the major NHL subtypes (Table 3). There were several additional nominally significant (P < 0.01) SNPs between TAP2 and AIF1, including SNPs in BAT3, C2, and HLA-DRA. In NHL subtype analyses, the strongest associations for SNPs from this region were for FL: rs241447 (TAP2), rs1894408 (HLA-DOB), rs7192 (HLA-DRA), and rs7746553 (C2), and of these 3 SNPs, all exceeded our multiple testing P value for the subtype analyses (i.e., 4.7 × 10−5). For CLL/SLL and DLBCL, SNPs from the 6p21.3 region did not meet the multiple testing threshold P value of 4.7 × 10−5, but they did show similar, albeit slightly weaker, ORs for CLL/SLL (except for TAP2 and C2 SNPs) and DLBCL (except for the HLA-DR SNPs).
Also available from the larger genotyping project were 2 GWAS SNPs previously identified in the 6p21.3 region for FL but which were not on the Immune and Inflammation SNP platform: rs10484561 published by Conde and colleagues (14) and rs2647012 published by Smedby and colleagues (15); the Mayo Clinic study contributed primary data to the latter study for rs2647012. Figure 2 shows our results for this region for FL. There were strong associations for both of these FL GWAS SNPs: rs10484561 (allelic OR = 2.23, 95% CI 1.70–2.92; P trend = 8.26 × 10−9) and rs2647012 (OR = 0.56, 95% CI 0.45–0.69; P trend = 8.03 × 10−8). Our top FL SNP rs241447 (TAP2) was not in strong linkage disequilibrium (LD) with the FL GWAS SNPs rs10484561 (r2 = 0.16; D' = 0.67) or rs2647012 (r2 = 0.014; D' = 0.25) based on genotypes in our 1,233 controls. After simultaneous adjustment for all 3 SNPs in a logistic regression analysis, rs10484561 (allelic OR = 2.16, 95% CI 1.66–2.81; P trend = 1.11 × 10−8), rs2647012 (OR = 0.57, 95% CI 0.46–0.70; P trend = 1.04 × 10−7), and rs241447 (OR = 1.81; 95% CI 1.46–2.24; P trend = 6.89 × 10−8) remained significant, supporting independent effects. We observed no interactions between our top hit rs241447 and either of the FL GWAS SNPs rs10484561 or rs2647012 (data not shown). We did not genotype the third GWAS SNP rs6457327; however, in HapMap data, this SNP was not in LD with rs241447 (r2 = 0.012).
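For readers less familiar with the two LD measures reported here, the sketch below computes r2 and D' from haplotype and allele frequencies using their standard definitions. It assumes haplotype frequencies are already available; estimating them from unphased control genotypes, as in this study, needs an additional step (for example an EM algorithm) that is not shown. The function name and all frequencies are hypothetical.

```python
def ld_stats(p_a, p_b, p_ab):
    """Return (r2, |D'|) for two biallelic loci.

    p_a, p_b: frequencies of allele A at locus 1 and allele B at locus 2.
    p_ab: frequency of the A-B haplotype.
    """
    d = p_ab - p_a * p_b  # raw disequilibrium coefficient D
    if d > 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = abs(d) / d_max if d_max > 0 else 0.0
    r2 = d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))
    return r2, d_prime


# Hypothetical frequencies (not study data): alleles with unequal frequencies
# can give a high D' together with a modest r2.
r2, d_prime = ld_stats(p_a=0.26, p_b=0.10, p_ab=0.09)
print(f"r2 = {r2:.3f}, D' = {d_prime:.3f}")
```

This is why a pair of SNPs can show a D' close to 1 while r2 stays low: D' is scaled by the maximum D attainable given the allele frequencies, whereas r2 reflects how well one SNP predicts the other.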
The other SNP strongly associated with FL, rs7192 from HLA-DRA, was not in strong LD with our top FL SNP rs241447 (r2 = 0.037; D' = 0.40) nor with rs10484561 identified by Conde and colleagues (14) (r2 = 0.077; D' = 0.96), and rs7192 remained significant after adjustment for our top FL SNP rs241447 (OR = 0.61; 95% CI 0.49–0.75; P = 7.9 × 10−6) and for the Conde and colleagues (14) FL GWAS SNP rs10484561 (OR = 0.64; 95% CI 0.51–0.79; P = 5.1 × 10−5). In contrast, rs7192 was in stronger LD with the Smedby and colleagues SNP rs2647012 (r2 = 0.52; D' = 0.73), and after adjustment for the latter SNP, rs7192 remained marginally statistically significant (OR = 0.71; 95% CI 0.53–0.96; P = 0.028).
Finally, we explored whether rs241447 genotype was associated with TAP2 mRNA expression. From the set of 8 FLs with paired tumor-normal exome and RNAseq data, there was a trend of higher TAP2 expression in patients with the GG or GA compared with the AA genotype (P = 0.14; Fig. 3A). In the case-control study, the dominant model OR for TAP2 in FL (GG or GA vs. AA genotype) was 2.01 (95% CI 1.51–2.68). From the set of 11 DLBCL cases genotyped in the case-control study that also had tumor gene expression measured using the Affymetrix HG-U133 plus2.0 microarray chips, there was higher TAP2 expression in patients with the GG or GA compared with the AA genotype (P = 3.3 × 10−6; Fig. 3B). In the case-control study, the dominant model OR for TAP2 in DLBCL (GG or GA vs. AA genotype) was 1.39 (95% CI 1.01–1.91). For DLBCL, we also assessed genotype based on the exome sequencing available on 36 tumors. Unfortunately, the exome sequencing data did not have sufficient coverage at the rs241447 position to make a reliable genotype call, but a SNP in perfect LD (rs241441) was well covered, and there was higher TAP2 expression in patients with the GG or GA compared with the AA genotype (P = 0.0030; Fig. 3C).
Discussion
We have conducted follow-up analyses of the top 10% of SNPs from the ParAllele (Affymetrix) Immune and Inflammation panel in a new set of 584 cases and 768 controls, for a combined sample of 1,009 cases and 1,233 controls. We found that the common SNP rs241447 (MAF 0.26) in TAP2 from the 6p21.3 region showed a significant association with risk of NHL overall after correcting for multiple testing; the association was particularly strong for FL, but was also apparent for DLBCL. Higher TAP2 expression was associated with the risk allele in both FL and DLBCL tumors.
The 6p21.3 region is a large, complex, and immune gene-rich region that has been previously implicated as a susceptibility locus for overall NHL risk (5, 10–12, 15). Furthermore, this region has been flagged as a region of interest not only for NHL, but also for the specific NHL subtypes of FL (10, 11, 13, 14), DLBCL (5, 10, 15), and familial CLL/SLL (16). In NHL subtype analyses, we found genome-wide significance for the TAP2 SNP rs241447 with FL risk, as well as a weaker but still evident association with DLBCL but no association with CLL/SLL. TAP2 was not in the top 40 Stage 1 SNPs for FL in either of the published GWAS (14, 15). The TAP2 SNP rs241447 is predicted to be “damaging” by Sorting Intolerant From Tolerant (22), and is located in an evolutionarily conserved domain across 28 species based on multiz (23) and phastCons (24) calculations. In FL, rs241447 was not in LD with either of the previously identified FL GWAS SNPs rs10484561 (14) and rs2647012 (15), and all 3 SNPs remained significant in a multivariate model. Our results independently replicate rs10484561 in FL (14), and identify TAP2 as a novel and independent risk locus for FL and perhaps DLBCL.
TAP2 [transporter 2, ATP-binding cassette (ABC), subfamily B] is a member of the multidrug resistance protein/TAP subfamily of ABC transporters, and is involved in both multidrug resistance and antigen presentation (25, 26). TAP2 forms a heterodimer with TAP1 to transport peptides (ranging from ions to large proteins) from the cytoplasm to the endoplasmic reticulum (25, 26), and is essential for loading of antigen on HLA class I protein on the cell surface (27). TAP2 and TAP1 are located in the MHC II locus of chromosome 6, between HLA-DOB and HLA-DMB, and genetic variation in these genes has been associated with type 1 diabetes, systemic lupus erythematosus (SLE), and celiac disease (26), conditions that have been associated with overall NHL risk in some studies (1). The TAP2 SNP rs241447 specifically has been positively associated with SLE (OR = 1.46 per allele, 95% CI 1.14–1.88; 28) and inversely associated with type 1 diabetes (OR = 0.43; 95% CI 0.35–0.52; 29), and these associations were not because of LD with HLA-DRB1 or DR-DQ, respectively. SLE has more consistently been associated with NHL risk, including DLBCL and FL risk, although the association for type 1 diabetes with NHL overall or for NHL subtypes has been mixed (30).
Some studies have reported LD between TAP2 and HLA class II alleles (31, 32), whereas others have not (28, 29, 33). Although HLA class II alleles (HLA-DRB1*0101 and *13) were associated with FL in one recent study (10), we did not have genotyping for class II alleles and so could not address LD with TAP2, and this remains an important future research question. The TAP2 SNP is also in a region of high LD with several other coding SNPs, including rs241448 (ter687Q) and rs241449 (a synonymous SNP), and haplotypes formed by these alleles lead to alternative splicing and different isoforms of the protein known to have different peptide selectivity (29). Downregulation or a loss of TAP expression (by mutation or other mechanisms) leads to loss of surface HLA class I expression, allowing tumors to escape immune recognition (25). Our data suggest that common genetic variation in the TAP2 gene is associated with TAP2 expression and increased risk of NHL, particularly FL, raising the hypothesis that TAP2 may predispose to lymphomagenesis, perhaps by influencing antigen presentation of HLA class I molecules.
Although no other genes met our multiple testing threshold for all NHL, AIF1 (12), BCL2L11 (34, 35), and VDR (6) have previously been implicated in either NHL overall or one of the common subtypes. Germline genetic variation in ZAP70 has not been associated with NHL risk, but ZAP70 expression has been associated with prognosis in CLL (36). Although NFATC1, PLXNC1, and PTPRO have not been associated with NHL, NFATC1 is known to regulate the expression of growth and survival genes including MYC, TNF, CD40L, and BAFF, all of which have also been linked to lymphomagenesis (5, 11, 37, 38). However, given the high potential for false positive results in this setting, our results will need to be replicated in other studies or through pooled analyses.
Strengths of this study include the use of a carefully designed case-control study (18); central pathology review and classification; a well characterized, comprehensive panel of immune and inflammation genes based on HapMap SNPs; a 2-stage design; and a relatively large sample size. Limitations include lower power to assess NHL subtypes and use of a white population, although this enhances internal validity in the setting of a genetic association study. We have previously published data from this study showing lack of population stratification in this study population (17). Finally, we were also able to adjust for the 2 strongest GWAS SNPs. In summary, TAP2 seems to be a strong candidate susceptibility gene for NHL, particularly FL. Further genetic and protein studies to confirm abnormalities or aberrant function of TAP2 are warranted.
Disclosure of Potential Conflicts of Interest
No potential conflicts of interest were disclosed.
Conception and design: J.R. Cerhan, Z.S. Fredericksen, S.M. Ansell, T.M. Habermann, S.L. Slager
Development of methodology: J.R. Cerhan, Z.S. Fredericksen, S.L. Slager
Acquisition of data (provided animals, acquired and managed patients, provided facilities, etc.): J.R. Cerhan, Z.S. Fredericksen, A.J. Novak, M. Liebow, A. Dogan, J.M. Cunningham, T.E. Witzig, T.M. Habermann, S.L. Slager
Analysis and interpretation of data (e.g., statistical analysis, biostatistics, computational analysis): J.R. Cerhan, Z.S. Fredericksen, S.M. Ansell, A. Dogan, A.H. Wang, T.M. Habermann, Y.A. Asmann, S.L. Slager
Writing, review, and/or revision of the manuscript: J.R. Cerhan, Z.S. Fredericksen, S.M. Ansell, N.E. Kay, M. Liebow, J.M. Cunningham, A.H. Wang, T.M. Habermann, Y.A. Asmann, S.L. Slager
Administrative, technical, or material support (i.e., reporting or organizing data, constructing databases): J.R. Cerhan, Z.S. Fredericksen, A.H. Wang, T.M. Habermann, S.L. Slager
Study supervision: J.R. Cerhan, Z.S. Fredericksen
This work was supported by National Cancer Institute/NIH grant R01 CA92153 and the Predolin Foundation.
The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked advertisement in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.
The authors thank Sondra Buehler for her editorial assistance.
- Received June 7, 2012.
- Revision received July 30, 2012.
- Accepted August 9, 2012.
- ©2012 American Association for Cancer Research.