Background: Recent genome-wide association studies (GWAS), mostly conducted among women of European ancestry, have identified 16 single-nucleotide polymorphisms (SNP) associated with breast cancer.
Methods: We evaluated the association of these SNPs with the risk of breast cancer, overall and by estrogen receptor status, in a population-based study of 6,498 cases and 3,999 controls among Chinese women. We also searched for novel genetic risk variants in four loci, 2q35, 5p12/MRPS30, 8q24.21, and 17q23.2/COX11, in a two-stage study. In stage I, 868 SNPs were analyzed in 2,073 cases and 2,084 controls. In stage II, 58 SNPs selected from stage I were evaluated in 4,425 cases and 1,915 controls.
Results: Statistically significant associations (P < 0.05) were observed for eight GWAS-identified SNPs, including rs4973768 (3p24/SLC4A7), rs889312 (5q11.2/MAP3K1), rs2046210 (6q25.1), rs1219648 (10q26.13/FGFR2), rs2981582 (10q26.13/FGFR2), rs3817198 (11p15.5/LSP1), rs8051542 (16q12.1/TOX3), and rs3803662 (16q12.1/TOX3). Two additional SNPs, rs10941679 (5p12/MRPS30) and rs13281615 (8q24.21), showed a marginally significant association. Some of these associations varied by estrogen receptor status. In the fine-mapping analysis, five SNPs showed a consistent association with breast cancer risk in both stages: rs10169372 (2q35), rs283720 (8q24.21), rs10515083 (17q23.2/COX11), rs16955329 (17q23.2/COX11), and rs2787487 (17q23.2/COX11).
Conclusions: This study shows that approximately half of the SNPs initially reported from GWAS of breast cancer in European descendants can be directly replicated in Chinese women. Our fine-mapping analyses revealed several candidate risk variants that can be further evaluated in studies with a larger sample size.
Impact: Findings from this study may help guide future fine-mapping studies to identify causal variants for breast cancer. Cancer Epidemiol Biomarkers Prev; 19(9); 2357–65. ©2010 AACR.
Breast cancer is the most common malignancy among women in the United States and many other parts of the world. Genetic factors play an important role in the etiology of breast cancer. Recently, several genome-wide association studies (GWAS; refs. 1-8), including our own study among Chinese women in Shanghai (5), have identified multiple genetic susceptibility loci for breast cancer. With the exception of our study, all other reported GWAS have been conducted among women of European ancestry. The vast majority of the risk variants identified thus far, however, are single-nucleotide polymorphisms (SNP) that are associated with disease risk through linkage disequilibrium (LD) with the causal variants. Therefore, some risk alleles identified in Europeans may not be extrapolated to Asians given the difference in LD patterns between these two populations. Investigation of previously reported loci in non-European populations may help to evaluate the generalizability of these initial findings and to identify causal variants. Further evaluation of previously reported loci could also help to identify additional risk variants in some of the loci, as in the case of 8q24.21 for prostate cancer risk (9-11) and 16q12 for breast cancer risk (1).
Using data from the Shanghai Breast Cancer Study, a population-based case-control study, we previously evaluated 11 SNPs identified initially in GWAS conducted in women of European ancestry (12). In this study, we evaluated four newly identified loci for breast cancer risk from recent GWAS conducted among Europeans or European Americans. The associations of all GWAS-identified SNPs were further evaluated by estrogen receptor (ER) status. Finally, we conducted analyses to explore additional independent genetic risk variants in four loci.
Materials and Methods
Included in the study were 6,498 cases from the Shanghai Breast Cancer Study (SBCS) and Shanghai Breast Cancer Survival Study (SBCSS), as well as 3,999 controls from the SBCS and the Shanghai Endometrial Cancer Study (SECS). The SBCS is a large, population-based case-control study of women in urban Shanghai that has been previously described in detail (5, 13). Subject recruitment in the initial phase of the SBCS (SBCS-I) was conducted between August 1996 and March 1998. The second phase (SBCS-II) of recruitment occurred between April 2002 and February 2005. Breast cancer cases were identified through the population-based Shanghai Cancer Registry, which for the SBCS-I was supplemented by a rapid case-ascertainment system. Controls were randomly selected using the Shanghai Resident Registry. Also included in the present study were cases recruited between April 2002 and December 2006 as part of the SBCSS. The controls for the SBCSS cases came from the SECS, which recruited healthy women between January 1997 and December 2003. Of the eligible participants, 1,459 cases (91.1%) and 1,556 controls (90.3%) in the SBCS-I, 1,989 cases (83.7%) and 1,918 controls (70.4%) in the SBCS-II, 5,046 cases (80.1%) in the SBCSS, and 1,212 controls (74.4%) in the SECS completed in-person interviews with structured questionnaires. Blood or buccal cell samples were collected and made available for 1,193 cases (81.8%) and 1,310 controls (84.2%) from the SBCS-I, 1,932 cases (97.1%) and 1,857 controls (96.8%) from the SBCS-II, 4,845 cases (96.0%) from the SBCSS, and 1,039 controls (85.7%) from the SECS. Because of a time overlap in subject recruitment, 1,469 breast cancer patients participated in both the SBCS-II and the SBCSS and 109 controls participated in both the SBCS-I and the SECS, so that the actual total number of participants came to 3,376 cases from the SBCSS (4,845 minus 1,469) and 930 controls from the SECS (1,039 minus 109). Genomic DNA was extracted using commercial DNA purification kits.
Approval of the study was granted by the relevant institutional review boards in both China and the United States.
SNP selection and statistical analysis
Four loci reported from studies conducted among Europeans or European Americans, including 2q35, 5p12/MRPS30, 8q24.21, and 17q23.2/COX11, were selected to identify additional SNPs that may be associated with breast cancer in our Chinese population. These four loci were selected because the initially reported SNPs in each of these loci did not show an apparent association with the overall risk of breast cancer in the Chinese population.
In each of these four loci, a region (±100 kb) flanking the initially reported SNP was selected. The initially selected region was extended according to the following two scenarios: (a) if the LD block that included the initially reported SNP extended outside the 200-kb region, then the whole LD block was included; or (b) if the 100-kb flanking region contained part of a known gene, the whole gene was included. Using these criteria, the following four regions were investigated: 44642255-44996680 (354 kb) for 5p12 (rs10941679), 50311470-50628909 (317 kb) for 17q23.2 (rs6504950), and a 200-kb region for 2q35 (rs13387042) and 8q24.21 (rs13281615), based on National Center for Biotechnology Information Build 36.
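The region-selection rule above can be sketched in a few lines of Python. This is an illustrative sketch only, not the authors' procedure: the function name `select_region`, the tuple representation of LD blocks and genes, and the coordinates in the usage note are all hypothetical.

```python
from typing import Iterable, Optional, Tuple

FLANK_BP = 100_000  # +/-100 kb around the index SNP, as stated in the text

Interval = Tuple[int, int]  # (start, end) in base pairs; hypothetical representation


def select_region(snp_pos: int,
                  ld_block: Optional[Interval] = None,
                  genes: Iterable[Interval] = ()) -> Interval:
    """Return the fine-mapping region around an index SNP."""
    base_start, base_end = max(0, snp_pos - FLANK_BP), snp_pos + FLANK_BP
    start, end = base_start, base_end

    # Rule (a): if the LD block containing the SNP extends past the
    # 200-kb window, include the whole block.
    if ld_block is not None:
        start, end = min(start, ld_block[0]), max(end, ld_block[1])

    # Rule (b): if the flanking window contains part of a gene,
    # include the whole gene.
    for g_start, g_end in genes:
        overlaps_window = g_start <= base_end and g_end >= base_start
        if overlaps_window:
            start, end = min(start, g_start), max(end, g_end)

    return start, end
```

For example, with a made-up SNP position of 1,000,000 and a made-up gene spanning 1,090,000 to 1,250,000, `select_region(1_000_000, genes=[(1_090_000, 1_250_000)])` widens the region to `(900_000, 1_250_000)`.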
Stage I analyses were conducted primarily based on the GWAS data obtained using Affymetrix SNP 6.0 arrays. SNPs not found on the array were imputed using the program MACH with the HapMap II Asian data (release 22) as a reference. Association analysis for each SNP was done by logistic regression, and imputation uncertainty was taken into account by using the program MACH2DAT. Within each region, the logistic regression model was adjusted for the SNP identified in previous GWAS. A total of 868 SNPs with a minor allele frequency (MAF) of ≥0.05 were analyzed, including 241 directly genotyped and 627 imputed SNPs. In addition, 32 SNPs had low imputation quality (quality score <0.9), and 26 of the 868 SNPs showed a significant association with breast cancer at P ≤ 0.05 after adjusting for the initially reported SNP. A total of 35 tagging SNPs were selected to cover these 58 SNPs, with pairwise r2 ≥ 0.8 using the HapMap Asian data as reference. Of these 35 tagging SNPs, 32 were successfully genotyped in stage II samples, which included 4,425 cases and 1,915 controls. Of the 32 successfully typed SNPs, five were significantly associated with breast cancer in stage II samples and had shown low imputation quality in stage I. These five SNPs were directly genotyped in stage I samples, an analysis we refer to as stage III in this study. Logistic regression models were used to estimate odds ratios (OR) and 95% confidence intervals (95% CI) for each SNP in association with breast cancer risk after adjusting for age, education, and body mass index. The results did not change appreciably with or without these potential confounding factors. Heterogeneity between the associations of SNPs with ER-positive and ER-negative diseases was assessed using logistic regression analyses restricted to cases (case-only analyses), with the ER status as the outcome variable. P values based on two-tailed tests are presented. All analyses were done using SAS version 9.1 (SAS Institute).
Genotyping using the Affymetrix GeneChip Mapping 500K Array Set and the Affymetrix Genome-Wide Human SNP Array 6.0 has been described previously (5). Among the 16 SNPs reported in previous GWAS, four SNPs, rs2180341 (6q22.33/ECHDC1), rs3817198 (11p15.5/LSP1), rs3803662 (16q12.1/TOX3), and rs2046210 (6q25.1/unknown), were included in both the Affymetrix SNP Array 6.0 and the GeneChip Mapping 500K Array Set. Therefore, genotyping data for these four SNPs were available for 4,157 participants. Three SNPs, rs1219648 (10q26.13/FGFR2), rs2981582 (10q26.13/FGFR2), and rs8051542 (16q12.1/TOX3), were included only on Affymetrix 6.0 and not on Affymetrix 500K; thus, genotyping data were available for only 3,866 GWAS participants who were genotyped by Affymetrix 6.0. For the remaining participants, who were not genotyped using the Affymetrix SNP arrays, these seven SNPs were genotyped using the iPLEX Sequenom MassARRAY platform. The four recently reported SNPs, rs11249433 (1p11.2/NOTCH2), rs4973768 (3p24/SLC4A7), rs999737 (14q24.1/RAD51L1), and rs6504950 (17q23.2/COX11), were not included on the Affymetrix 6.0 array and were also genotyped using Sequenom. The remaining five SNPs, rs13387042 (2q35/unknown), rs10941679 (5p12/MRPS30), rs889312 (5q11.2/MAP3K1), rs13281615 (8q24.21/unknown), and rs12443621 (16q12.1/TOX3), were genotyped using the TaqMan allelic discrimination assay (Applied Biosystems).
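To make the per-allele OR concrete, the following Python sketch fits an unadjusted additive logistic model (genotype coded as 0, 1, or 2 copies of the allele) by Newton-Raphson and returns the OR with a 95% CI. It is a simplified illustration under stated assumptions: it omits the covariates used in the study (age, education, body mass index, and the index SNP), it does not handle imputed dosages as MACH2DAT does, and the function name and the genotype counts in the usage note are hypothetical.

```python
import math
from typing import Sequence, Tuple


def per_allele_or(cases: Sequence[int],
                  controls: Sequence[int],
                  iters: int = 25) -> Tuple[float, float, float]:
    """Fit logit P(case) = a + b * g for g in {0, 1, 2}.

    cases[g] and controls[g] are genotype counts for g copies of the allele.
    Returns (OR, lower 95% limit, upper 95% limit) for one extra allele copy.
    """
    a = b = 0.0
    for _ in range(iters):
        grad_a = grad_b = i_aa = i_ab = i_bb = 0.0
        for g in (0, 1, 2):
            n = cases[g] + controls[g]
            p = 1.0 / (1.0 + math.exp(-(a + b * g)))
            resid = cases[g] - n * p          # score contribution
            w = n * p * (1.0 - p)             # information weight
            grad_a += resid
            grad_b += g * resid
            i_aa += w
            i_ab += g * w
            i_bb += g * g * w
        det = i_aa * i_bb - i_ab * i_ab
        # Newton step: theta += I^{-1} * gradient (2 x 2 inverse written out)
        a += (i_bb * grad_a - i_ab * grad_b) / det
        b += (i_aa * grad_b - i_ab * grad_a) / det
    se_b = math.sqrt(i_aa / det)  # from the information matrix at convergence
    return (math.exp(b),
            math.exp(b - 1.96 * se_b),
            math.exp(b + 1.96 * se_b))
```

With made-up counts in which the odds of being a case double with each allele copy, for example `per_allele_or((100, 200, 400), (100, 100, 100))`, the estimated per-allele OR is 2.0. A full analysis would use an established statistics package rather than this hand-rolled fit.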
In stages II and III, the iPLEX Sequenom MassARRAY platform was used for genotyping. On each 96-well plate, two negative controls, two blinded duplicates, and two samples from the HapMap project were included. The mean consistency rates were 98.2% for the blinded duplicates and 99.2% compared with data from HapMap.
The distributions of demographic characteristics and known breast cancer risk factors for cases and controls are shown in Table 1. An elevated risk of breast cancer was consistently observed for all known major breast cancer risk factors, including family history of breast cancer, prior history of benign breast disease, physical inactivity, early onset of menarche, late onset of menopause, late age at first live birth, high body mass index, and high waist-to-hip ratio.
Among the 16 SNPs identified in previous GWAS, significant associations (P < 0.05) were observed at eight SNPs: rs4973768 (3p24/SLC4A7), rs889312 (5q11.2/MAP3K1), rs2046210 (6q25.1/unknown), rs1219648 (10q26.13/FGFR2), rs2981582 (10q26.13/FGFR2), rs3817198 (11p15.5/LSP1), rs8051542 (16q12.1/TOX3), and rs3803662 (16q12.1/TOX3). Two additional SNPs, rs10941679 (5p12/MRPS30) and rs13281615 (8q24.21/unknown), showed an association of borderline significance (P ≤ 0.15; Table 2). Interestingly, the association with rs13281615 was statistically significant for ER-negative breast cancer. Two other SNPs have a very low MAF in Chinese: 3% for rs11249433 (1p11.2/NOTCH2) and 0.2% for rs999737 (14q24.1/RAD51L1). Therefore, the statistical power to detect a significant association in this study is low.
Although no overall association of breast cancer was found for rs13281615 (8q24.21/unknown), analyses by ER status revealed a statistically significant association with ER-negative tumors (P = 0.02). With the exception of rs13281615 and rs2046210 (6q25.1/unknown), breast cancer–associated SNPs, in general, showed a stronger association with ER-positive tumor than with ER-negative tumor and the difference was statistically significant for rs1219648 (10q26.13/FGFR2).
Four loci, including rs13387042 (2q35/unknown), rs10941679 (5p12/MRPS30), rs13281615 (8q24.21/unknown), and rs6504950 (17q23.2/COX11), were further investigated to identify potential novel breast cancer risk variants in Chinese women. Stage I data for these four loci were extracted from the GWAS data of 2,073 cases and 2,084 controls. In these four regions, a total of 241 SNPs passed our quality control protocol (5), with a call rate ≥95%, a concordance rate ≥95% among duplicated samples, and a MAF ≥0.05. Another 627 SNPs were successfully imputed (with a quality score ≥0.9) by using the program MACH with the HapMap Asian data as the reference. Among these 868 SNPs, 30 SNPs showed an association at P ≤ 0.05, including 3 SNPs in the region of 2q35, 23 in 8q24.21, and 4 in 17q23.2. After adjusting for the reported SNPs in each locus, 26 of these 30 SNPs still showed an association with breast cancer at P ≤ 0.05 (Fig. 1). In these four loci, 32 SNPs on HapMap were imputed with low quality (quality score < 0.9), and these SNPs along with SNPs showing an association with a P ≤ 0.05 were selected for further evaluation. A total of 35 SNPs were selected to tag these 58 SNPs for stage II validation.
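The call-rate and MAF filters described above can be illustrated with a short Python sketch. It is not the authors' pipeline: the genotype encoding (0, 1, or 2 copies of one allele, `None` for a failed call) and the function names are assumptions made for illustration, and the duplicate-concordance check is not shown.

```python
from typing import List, Optional

Genotype = Optional[int]  # 0, 1, or 2 allele copies; None = no call (assumed encoding)


def call_rate(genos: List[Genotype]) -> float:
    """Fraction of samples with a successful genotype call."""
    if not genos:
        return 0.0
    return sum(g is not None for g in genos) / len(genos)


def minor_allele_freq(genos: List[Genotype]) -> float:
    """Minor allele frequency among called genotypes."""
    called = [g for g in genos if g is not None]
    if not called:
        return 0.0
    freq = sum(called) / (2 * len(called))
    return min(freq, 1.0 - freq)


def passes_qc(genos: List[Genotype],
              min_call_rate: float = 0.95,
              min_maf: float = 0.05) -> bool:
    """Apply the two thresholds quoted in the text (call rate and MAF)."""
    return (call_rate(genos) >= min_call_rate
            and minor_allele_freq(genos) >= min_maf)
```

A SNP with complete calls and a minor allele frequency of 0.10 passes, whereas one with 10% missing calls or a frequency of 0.005 is dropped.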
In stage II samples, among the 32 successfully genotyped SNPs, SNP rs12949538, located in 17q23.2/COX11, was significantly associated with breast cancer risk with an OR of 0.84 (95% CI, 0.75-0.94) at P = 0.002. The association direction, however, was contrary to results from the GWAS data in stage I. In stage II, another five SNPs, including rs7703618 (5p12/MRPS30), rs7003345 (8q24.21/unknown), rs11986916 (8q24.21/unknown), rs16955329 (17q23.2/COX11), and rs2958919 (17q23.2/COX11), were significantly associated with breast cancer risk at P ≤ 0.05 (Table 3). All five SNPs showed an imputation quality score <0.9 in stage I. To validate the results observed in stage II, these SNPs were directly genotyped in stage III samples. None of these five SNPs, however, showed significant associations in stage III (Table 3).
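As a consistency check, a reported OR and its 95% CI can be converted back to an approximate Wald z statistic and two-sided P value. The sketch below assumes the CI is symmetric on the log scale (the usual Wald interval); the function name is hypothetical.

```python
import math

Z_95 = 1.959964  # standard normal quantile for a two-sided 95% interval


def wald_p_from_ci(odds_ratio: float, lower: float, upper: float):
    """Recover (z, two-sided P) from an OR and its 95% CI limits."""
    se = (math.log(upper) - math.log(lower)) / (2 * Z_95)
    z = math.log(odds_ratio) / se
    p = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return z, p


z, p = wald_p_from_ci(0.84, 0.75, 0.94)  # values reported for rs12949538
```

For rs12949538 this gives z of about -3.0 and a P value between 0.002 and 0.003, in line with the reported P = 0.002 once rounding of the OR and CI limits is taken into account.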
In the analysis of combined data from stage II and stage I/III, six SNPs, including rs10169372 (2q35/unknown), rs7703618 (5p12/MRPS30), rs283720 (8q24.21/unknown), and three SNPs located in 17q23.2/COX11 (rs10515083, rs2787487, and rs16955329), showed an association with breast cancer risk, including five SNPs that showed a consistent association in both study stages (Table 3). Analyses stratified by ER status showed that all of these five SNPs showed stronger associations with ER-positive tumors than with ER-negative tumors, although the heterogeneity test was statistically significant only for SNP rs16955329 (Table 4).
In the present study, of the 14 independent variants identified in GWAS conducted among women of European ancestry [excluding rs2981582 in 10q26.13/FGFR2 and rs2046210 (6q25.1/unknown), which were initially identified in a Chinese population], eight SNPs showed an association consistent with that observed in women of European ancestry, and the per-allele ORs were either statistically significant [rs4973768 (3p24/SLC4A7), rs889312 (5q11.2/MAP3K1), rs1219648 (10q26.13/FGFR2), rs3817198 (11p15.5/LSP1), rs8051542 (16q12.1/TOX3), and rs3803662 (16q12.1/TOX3)] or marginally significant [rs10941679 (5p12/MRPS30) and rs13281615 (8q24.21)]. Analyses by ER status showed that the association with breast cancer for some SNPs may differ by ER status. Our fine-mapping analyses revealed several promising candidates that could be further evaluated. Overall, the results from this study provide further evidence that GWAS-identified SNPs are associated with breast cancer risk in non-European populations.
SNPs rs11249433 (1p11.2/NOTCH2) and rs999737 (14q24.1/RAD51L1) have a very low MAF in Chinese (3.0% and 0.2%, respectively). Intriguingly, the MAFs for these SNPs are quite high in European populations, 42.5% for rs11249433 (1p11.2/NOTCH2) and 26.1% for rs999737 (14q24.1/RAD51L1). Therefore, the genetic architecture of these two loci differs considerably between Chinese and Europeans. For the other four SNPs, we found either a null or a very weak association [rs13387042 (2q35/unknown), rs12443621 (16q12.1/TOX3), and rs6504950 (17q23.2/COX11)] or an association that was the opposite of that observed previously [rs2180341 (6q22.33/ECHDC1)]. With the sample size of the current study, we have 80% statistical power to detect an OR as small as 1.13, 1.08, 1.14, and 1.09 for SNPs rs13387042 (2q35/unknown), rs12443621 (16q12.1/TOX3), rs6504950 (17q23.2/COX11), and rs2180341 (6q22.33/ECHDC1), respectively. Therefore, we could reasonably conclude that these four SNPs are not strongly associated with breast cancer risk in Chinese. Stratification analyses by ER status for these four SNPs did not show any association consistent with that observed in women of European ancestry.
Previous studies among women of European ancestry showed that the association of breast cancer with rs1219648 (10q26.13/FGFR2), rs10941679 (5p12/MRPS30), and rs889312 (5q11.2/MAP3K1) was stronger in ER-positive than in ER-negative tumors (8, 14, 15). Results from this study were in general consistent with previous findings for these SNPs, although the test for heterogeneity was statistically significant only for rs1219648 (10q26.13/FGFR2), with P = 0.001. We found that rs13281615 (8q24.21/unknown) was more related to ER-negative than to ER-positive cancer, a finding that was inconsistent with that from a previous study among women of European ancestry (14). The reason for this inconsistency is unknown. As reported previously (5), rs2046210 (6q25.1/unknown) was found to be more closely related to ER-negative than to ER-positive breast cancer. This association in non-Chinese women remains to be evaluated.
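The link between sample size, allele frequency, and detectable OR can be illustrated with a rough power approximation. The sketch below uses a two-sample comparison of allele frequencies with a normal approximation, which is an assumption on our part: the text does not state how the power figures were computed, so this is not expected to reproduce them exactly. The function names are hypothetical, and the control allele frequency in the usage note is a made-up value.

```python
import math


def normal_cdf(x: float) -> float:
    """Standard normal cumulative distribution function."""
    return 0.5 * math.erfc(-x / math.sqrt(2))


def allelic_power(odds_ratio: float,
                  control_freq: float,
                  n_cases: int,
                  n_controls: int,
                  z_alpha: float = 1.959964) -> float:
    """Approximate power of a two-sided allelic test at alpha = 0.05.

    Assumes a multiplicative (per-allele) effect and treats the
    2 * n chromosomes in each group as independent draws.
    """
    p0 = control_freq
    # Allele frequency in cases implied by the per-allele odds ratio
    p1 = odds_ratio * p0 / (1.0 - p0 + odds_ratio * p0)
    se = math.sqrt(p1 * (1.0 - p1) / (2 * n_cases)
                   + p0 * (1.0 - p0) / (2 * n_controls))
    # The contribution of the opposite rejection tail is ignored
    return normal_cdf(abs(p1 - p0) / se - z_alpha)


# Study sample size from the text; the allele frequency 0.30 is hypothetical.
power = allelic_power(1.13, 0.30, n_cases=6498, n_controls=3999)
```

Power rises with the OR and with allele frequencies closer to 0.5, which is why SNPs with a very low MAF in Chinese women leave little power even at this sample size.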
SNP rs13387042 at 2q35 was originally associated with breast cancer, especially ER-positive cancer, in a study conducted among Europeans (3). This SNP lies in a 90-kb high-LD region that contains neither known genes nor noncoding RNAs (3). Recently, this SNP was investigated in approximately 30,000 cases and 30,000 controls from 25 studies in the Breast Cancer Association Consortium (BCAC; ref. 16). A significant association was observed in Europeans with an OR of 1.12 (95% CI, 1.09-1.15), which is much smaller than that originally observed of 1.20 (95% CI, 1.14-1.26). A significant association with this SNP was also observed in our previous study of African American women, which included 810 cases and 1,784 controls (17). However, no significant association has been observed in Asian populations (3, 12, 16).
SNP rs12443621 is located in 16q12.1, a region where two additional genetic risk variants for breast cancer (rs8051542 and rs3803662) were reported previously in a study conducted among women of European ancestry (1). Recently, we identified a functional genetic variant (rs4784227) at this chromosome region for breast cancer risk (18). In the present study, the other two reported SNPs, rs3803662 and rs8051542, showed significant associations consistent with those observed in women of European ancestry. The LD pattern of this region in Asians is very different from the pattern found in European descendants. For example, there is virtually no LD between rs12443621 and rs3803662 (r2 = 0.04) in Chinese, but there is moderate LD (r2 = 0.3) in Europeans.
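For readers less familiar with the r2 measure quoted above, the following Python sketch computes it from haplotype and allele frequencies using the standard definition r2 = D^2 / [pA(1 - pA) pB(1 - pB)], where D = pAB - pA pB. The function name and the frequencies in the usage note are hypothetical, not values from this study.

```python
def ld_r2(p_ab: float, p_a: float, p_b: float) -> float:
    """Squared correlation between alleles A and B at two SNPs.

    p_ab: frequency of the haplotype carrying both A and B
    p_a, p_b: frequencies of allele A and allele B
    """
    denom = p_a * (1.0 - p_a) * p_b * (1.0 - p_b)
    if denom == 0.0:
        raise ValueError("r2 is undefined when either SNP is monomorphic")
    d = p_ab - p_a * p_b  # linkage disequilibrium coefficient D
    return d * d / denom


independent = ld_r2(p_ab=0.12, p_a=0.3, p_b=0.4)  # haplotype freq = product
complete = ld_r2(p_ab=0.3, p_a=0.3, p_b=0.3)      # alleles always co-occur
```

Here `independent` is essentially 0 and `complete` is essentially 1; the values of 0.04 and 0.3 reported for rs12443621 and rs3803662 fall near the lower end of that range.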
SNP rs6504950 at 17q23.2 did not show a significant association in the present study; this finding was consistent with the results in Asians in the original GWAS (7) that discovered this SNP. No statistically significant association was observed in Asian women, although the per-allele OR was very similar: 0.96 (95% CI, 0.82-1.12) for Asians and 0.95 (95% CI, 0.93-0.98) for Europeans (7). The genetic architecture in this locus differs considerably across populations; for example, the MAF is 8% in Chinese and 30% in Europeans.
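The contrast between similar ORs and different significance levels comes down to precision, which the sketch below illustrates using the two estimates quoted above. It assumes Wald intervals that are symmetric on the log scale and independent samples for the two populations; the function names are hypothetical.

```python
import math

Z_95 = 1.959964  # standard normal quantile for a two-sided 95% interval


def log_or_and_se(odds_ratio: float, lower: float, upper: float):
    """Log OR and its standard error recovered from a 95% CI."""
    return math.log(odds_ratio), (math.log(upper) - math.log(lower)) / (2 * Z_95)


def two_sided_p(z: float) -> float:
    return math.erfc(abs(z) / math.sqrt(2))


def compare_ors(est_a, est_b):
    """Return (P for estimate A, P for estimate B, P for their difference)."""
    b_a, se_a = log_or_and_se(*est_a)
    b_b, se_b = log_or_and_se(*est_b)
    z_diff = (b_a - b_b) / math.sqrt(se_a ** 2 + se_b ** 2)
    return two_sided_p(b_a / se_a), two_sided_p(b_b / se_b), two_sided_p(z_diff)


# (OR, lower, upper) for rs6504950 as quoted in the text
p_asian, p_european, p_difference = compare_ors((0.96, 0.82, 1.12),
                                                (0.95, 0.93, 0.98))
```

The European estimate is significant because its standard error is several times smaller, while the difference between the two log ORs is far from significant, so the Asian result is compatible with the same underlying effect.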
SNP rs2180341 was originally discovered in the Ashkenazi Jewish population (4). Later, it was replicated in an additional 487 Ashkenazi Jewish breast cancer cases and in a European American population of 1,466 breast cancer cases and 1,467 controls (19). There were no data available for Asians. In the present study, we observed a borderline significant association with ER-positive tumors; however, the association was opposite to the original finding in the Ashkenazi Jewish population.
There are several potential explanations for the failure to directly replicate some of the loci identified in Europeans or European Americans. One possibility is that, in Chinese populations, no common SNPs in these regions are associated with breast cancer. It is also possible that other common SNPs in these regions have not been reported and thus were not included in the current study. Another possibility is that other types of variants located in these regions, such as copy number variation, small insertion-deletion polymorphisms, or rare variants, are associated with breast cancer. Additionally, Asian women might have different lifestyles or environmental exposures that may mask the effect of these SNPs on breast cancer risk. Genetic interactions with other SNPs that differ in frequency between populations could also manifest as effect heterogeneity.
In an attempt to identify risk variants for breast cancer in regions where the original GWAS-identified SNP showed no apparent association with breast cancer risk, we performed fine-mapping for four breast cancer susceptibility loci: 2q35, 5p12, 8q24.21, and 17q23.2. We investigated the associations for all 868 SNPs on HapMap, covering at least a 200-kb region for each locus in a total sample size of more than 10,000 subjects. All SNPs were either imputed with high quality or directly genotyped. A total of five SNPs, including rs10169372 (2q35/unknown), rs283720 (8q24.21/unknown), rs10515083 (17q23.2/COX11), rs16955329 (17q23.2/COX11), and rs2787487 (17q23.2/COX11), showed a consistent association with breast cancer risk in both stages. Although the associations with these SNPs in the combined analyses all reached a nominal significance level, they were not significant after adjusting for multiple comparisons. Nevertheless, these SNPs are good candidates for future studies. One limitation for this fine-mapping work is that SNPs not included in HapMap were not investigated. It would be helpful to sequence the targeted region for future studies to discover variants not included in the HapMap database.
In summary, we have now evaluated 14 independent SNPs that were initially reported in Europeans or European Americans. Eight of these SNPs showed evidence of association with breast cancer risk (statistically significant or marginally significant with an association consistent with those seen in previous GWAS), which brings the total number of GWAS-identified SNPs in Chinese populations to nine. We searched for additional independent genetic risk variants in four GWAS-mapped loci, in which the reported SNPs showed no apparent associations in Chinese. Several SNPs in these regions showed a nominally significant association with breast cancer risk. Although these associations were not statistically significant after adjusting for multiple comparisons, they may be good candidates for future studies. Additional in-depth fine-mapping studies with large sample sizes may be needed to fully evaluate these regions and to identify potential risk variants for breast cancer in Asian women.
Disclosure of Potential Conflicts of Interest
No potential conflicts of interest were disclosed.
We thank the study participants and research staff for their contributions and commitment to this project, Regina Courtney and the late Qing Wang for DNA preparation, and Brandy Venuti for clerical support in the preparation of this manuscript. Sequenom genotyping was carried out at Proactive Genomics, Winston-Salem, NC. Sample preparation and genotyping using Affymetrix chips and TaqMan platform were conducted at the Survey and Biospecimen and Functional Genomic, which are supported in part by the Vanderbilt-Ingram Cancer Center (P30 CA68485).
Grant Support: NIH grants R01CA124558, R01CA64277, R01CA90899, R01CA92585, R01CA122756, and R01CA137013. The content of this article is solely the responsibility of the authors and does not necessarily represent the official views of the NIH.
- Received January 15, 2010.
- Revision received June 12, 2010.
- Accepted June 29, 2010.