## Abstract

**Background:** Mammographic density, the area of the mammographic image that appears white or bright, predicts breast cancer risk. We estimated the proportions of variance explained by questionnaire-measured breast cancer risk factors and by unmeasured residual familial factors.

**Methods:** For 544 MZ and 339 DZ twin pairs and 1,558 non-twin sisters from 1,564 families, mammographic density was measured using the computer-assisted method *Cumulus*. We estimated associations using multilevel mixed-effects linear regression and studied familial aspects using a multivariate normal model.

**Results:** The proportions of variance explained by age, body mass index (BMI), and other risk factors, respectively, were 4%, 1%, and 4% for dense area; 7%, 14%, and 4% for percent dense area; and 7%, 40%, and 1% for nondense area. Associations with dense area and percent dense area were in the opposite direction to those with nondense area. After adjusting for measured factors, the correlations of dense area with percent dense area and nondense area were 0.84 and −0.46, respectively. The MZ, DZ, and sister pair correlations were 0.59, 0.28, and 0.29 for dense area; 0.57, 0.30, and 0.28 for percent dense area; and 0.56, 0.27, and 0.28 for nondense area (SE = 0.02, 0.04, and 0.03, respectively).

**Conclusions:** Under the classic twin model, 50% to 60% (SE = 5%) of the variance of mammographic density measures that predict breast cancer risk is due to undiscovered genetic factors, and the remainder to as yet unknown individual-specific, nongenetic factors.

**Impact:** Much remains to be learnt about the genetic and environmental determinants of mammographic density. *Cancer Epidemiol Biomarkers Prev; 22(12); 2395–403. ©2013 AACR*.

## Introduction

Mammographic density is the area of the mammographic image of the breast that appears white or bright. The state-of-the-art method for measuring mammographic density is a computer-assisted thresholding technique called *Cumulus*. This measures the total area of the breast image and the absolute area covered by dense tissue, as determined by the viewer, called dense area. From these measures, the absolute area of the breast image covered by nondense tissue and the percentage of the breast image covered by dense tissue are easily obtained.
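As a minimal sketch of how the two derived measures follow from the two measured ones, the arithmetic can be written as below. The function name `derived_measures` and the example areas are illustrative assumptions, not part of *Cumulus* or of this study's software.

```python
def derived_measures(total_area, dense_area):
    """Return (nondense_area, percent_dense_area) from the two measured areas.

    Hypothetical helper: both inputs are assumed to be in the same area units.
    """
    if total_area <= 0:
        raise ValueError("total_area must be positive")
    if not 0 <= dense_area <= total_area:
        raise ValueError("dense_area must lie between 0 and total_area")
    # Nondense area is whatever part of the breast image is not dense.
    nondense_area = total_area - dense_area
    # Percent dense area is dense area as a percentage of total area.
    percent_dense_area = 100.0 * dense_area / total_area
    return nondense_area, percent_dense_area


# Illustrative values only (not study data).
nondense, percent = derived_measures(total_area=120.0, dense_area=30.0)
print(nondense, percent)
```

Because dense area and nondense area sum to the total area, any two of the three quantities determine the third.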

Several case–control studies nested within cohorts of women attending mammographic screening services have shown that various measures of mammographic density at recruitment (baseline) predict subsequent risk of breast cancer (1, 2). These studies have virtually always matched cases and controls on age at mammogram and have adjusted for breast cancer risk factors measured at baseline. They have found that for women of the same age, body mass index (BMI), and other measured risk factors for breast cancer, those with a greater dense area (either absolutely or as a percentage) are at greater risk of breast cancer.

Percent dense area is negatively associated with age, and even more so with BMI, yet age and BMI are positively associated with breast cancer risk in the age groups typically studied. Thus, when considering percent dense area as a risk factor for breast cancer, its associations with BMI and age must be properly taken into account. While it is often reported that women with high (≥75%) percent dense area have a 4- to 6-fold increased risk of breast cancer compared with women with primarily fatty breasts (percent dense area ≤ 10%; ref. 3), it is rarely made explicit that these comparisons refer to *women of the same age and BMI*. Moreover, as we have shown (4), dense area and percent dense area adjusted for age and BMI are highly correlated (*r* ∼ 0.9). Consequently, the mammographic density measures that (best) predict breast cancer risk are those of dense area, or percent dense area, adjusted for age, BMI, and other breast cancer risk factors.

Recent studies have identified that nondense area might also be associated with risk even after adjusting for dense area or percent dense area, age, and BMI (nondense area is highly correlated with BMI), but in the opposite direction to dense area and percent dense area. The correlation of nondense area with dense area is around −0.3, which raises the possibility that the risk associations in opposite directions with dense area and nondense area might be, at least in part, a consequence of the same underlying phenomenon (5). The issue is also complicated by the fact that the association of BMI with breast cancer risk is not constant with respect to age at baseline or age at diagnosis. After adjusting for BMI as a function of age, Baglietto and colleagues fitted a linear combination of dense area and nondense area to predict breast cancer risk (5).

Given that various mammographic density measures predict future occurrences of breast cancer, it is important to identify the factors that determine their mean values and quantify how much they explain their variation. In this regard, it has been found from twin and family studies that mammographic density measures are correlated in relatives (6–8), so part of their variances must be due to familial, if not genetic, factors.

Here, we have conducted a large cross-sectional study of female twin pairs, both genetically identical (monozygotic, MZ) and fraternal (dizygotic, DZ) and their sisters. We have estimated the means of the mammographic density measures as functions of the breast cancer risk factors measured by questionnaire, taking into account that the women are from families. The adjusted measures are therefore the mammographic density measures that predict breast cancer risk, independent of the other risk factors. We have then used this powerful study design to obtain insights about, and estimates of, the roles of both genetic and nongenetic factors in explaining the variances of the mammographic density measures that predict breast cancer risk.

## Methods

### Participants

Participants were from the Australian Mammographic Density Twins and Sisters Study (AMDTSS), details of which are provided in Odefrey and colleagues (4), the Genes Behind Endometriosis Study (GBES; see Treloar and colleagues; ref. 9), the Australian Breast Cancer Family Study (ABCFS; ref. 10), and volunteers from the Breast Cancer Network Australia (BCNA) and other sources. Briefly, female twin pairs ages 40 to 70 years without a prior diagnosis of breast cancer were recruited through the Australian Twin Registry. Participating twins completed a questionnaire and gave permission to access their mammograms. They were also asked to seek permission from any eligible sisters to be invited to participate in the study. We recruited 3,324 twins and sisters from 1,564 families, including 544 MZ and 339 DZ twin pairs and 1,558 non-twin sisters. Of these, 2,345 were from the AMDTSS, 788 from the GBES, 71 from the ABCFS, and 120 from the BCNA and other sources. The study was approved by the Human Research Ethics Committee (HREC) of the University of Melbourne.

### Epidemiology questionnaire

Telephone-administered questionnaires were used to record demographic information and self-reported weight, height, smoking history, alcohol consumption, reproductive history, cessation of menstruation, use of oral contraceptives, use of hormone replacement therapy (HRT), and personal and family history of cancer. A woman was defined as postmenopausal if she had had a hysterectomy, both ovaries removed, or radiation; was not on HRT at the time of the mammogram and had not menstruated 12 months prior; or was on HRT at the time of mammogram and had not menstruated 12 months prior and was not menstruating before commencing HRT. Subjects not fitting these criteria were considered premenopausal. For twin pairs, zygosity was determined by a standard question that describes the differences between MZ and DZ twin pairs and has been shown to give 95% agreement with true zygosity (11).
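The menopausal status definition above is a decision rule, and one possible reading of it is sketched here. The class `MenopauseAnswers`, its field names, and the function `is_postmenopausal` are hypothetical and are not taken from the study questionnaire or its coding scheme.

```python
from dataclasses import dataclass


@dataclass
class MenopauseAnswers:
    """Hypothetical container for the relevant questionnaire answers."""
    hysterectomy: bool
    both_ovaries_removed: bool
    radiation: bool
    on_hrt_at_mammogram: bool
    menstruated_in_prior_12_months: bool
    menstruating_before_hrt: bool  # only meaningful for women on HRT


def is_postmenopausal(a):
    """Apply the three criteria described in the text, in order."""
    # Criterion 1: hysterectomy, both ovaries removed, or radiation.
    if a.hysterectomy or a.both_ovaries_removed or a.radiation:
        return True
    # Criteria 2 and 3 both require no menstruation in the prior 12 months.
    if a.menstruated_in_prior_12_months:
        return False
    # Criterion 2: not on HRT at the time of the mammogram.
    if not a.on_hrt_at_mammogram:
        return True
    # Criterion 3: on HRT, and was not menstruating before commencing HRT.
    return not a.menstruating_before_hrt


example = MenopauseAnswers(False, False, False, True, False, False)
print(is_postmenopausal(example))
```

Women for whom none of the three criteria holds fall through to the premenopausal category, as in the text.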

### Mammographic density measurements

All available episodes of mammograms were retrieved with the participants' written consent, mostly from Australian *BreastScreen* services, but also from private clinics and private hospitals. We also retrieved mammograms from the participants themselves. The craniocaudal views for left and right breasts were selected and digitized by using the Lumysis 85 scanner at the Australian Mammographic Density Research Facility. For each woman, the most recent right breast craniocaudal view was selected for mammographic density measurement and the left breast craniocaudal view was selected if the right breast mammogram was missing or unavailable. Mammographic measurements of total area and dense area were conducted using Cumulus 4.0, a computer-assisted thresholding technique, after randomization and blind to information, by 3 independent operators (J. Stone, F. Odefrey, and T.L. Nguyen) with high repeatability (4). Nondense area and percent dense area were calculated from these measures.

### Statistical methods

Associations between variables measured by questionnaire and the means of the mammographic density measures were estimated using linear regression modeling under the assumption that the residuals were normally distributed, although correlated within families. The Box–Cox procedure was used to test the normality of the distributions of the mammographic density measures and, if necessary, select an appropriate power transformation. Consequently, dense area was cube root transformed, whereas percent dense area and nondense area were square root transformed. All questionnaire measures were inspected for missing or invalid values, which were replaced with the average for continuous exposure variables and the most common value for binary or categorical exposure variables. The percentage of missing values was <1% for all variables except ductal carcinoma *in situ* (DCIS), for which it was 8% and all unknowns were coded as “no” given <1% of responders answered “yes.”
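The transformations and the simple imputation rule described above can be illustrated as follows. This is a sketch under stated assumptions: the helper names and the toy values are hypothetical, missing values are represented as `None`, and the Box–Cox selection step itself is not reproduced.

```python
from statistics import mean, mode


def cube_root(x):
    """Power transformation applied to dense area (nonnegative values)."""
    return x ** (1.0 / 3.0)


def square_root(x):
    """Power transformation applied to percent dense area and nondense area."""
    return x ** 0.5


def impute_continuous(values):
    """Replace missing (None) entries with the mean of the observed values."""
    fill = mean(v for v in values if v is not None)
    return [fill if v is None else v for v in values]


def impute_categorical(values):
    """Replace missing (None) entries with the most common observed value."""
    fill = mode(v for v in values if v is not None)
    return [fill if v is None else v for v in values]


# Toy data only (not study data).
print(impute_continuous([1.0, None, 3.0]))
print(impute_categorical(["no", "yes", None, "no"]))
print(square_root(16.0))
```

Coding unknown DCIS status as "no" is the same rule applied to a binary variable whose most common value is "no".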

We estimated the regression coefficients, *β*, for the associations of predictors with mean mammographic density measures using multilevel mixed-effects regression analysis and the XT-MIXED option in the Stata software package (12) as it accounted for the correlations between twins and sisters. We log-transformed BMI because the associations were approximately linear with log BMI.

As in the study by Stone and colleagues (1), the Bayesian information criterion (13) score was used to select the best-fitting model (scores not presented in the tables). Given the multiple factors being fitted in models, we took *P* = 0.005 as our nominal threshold for statistical inference.

To quantify the amount of variance explained by the questionnaire-measured variables, all the mammographic density measurements and questionnaire-measured variables were standardized by the formula $x^{*} = (x - \bar{x})/\mathrm{SD}$, where $\bar{x}$ is the mean and SD is the standard deviation. Consequently, for each estimated regression coefficient, *β**, (*β**)^{2} approximates the amount of variance explained by fitting that variable with all other variables in the model held constant.
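To make the link between standardization and variance explained concrete, the sketch below standardizes an outcome and a single predictor and squares the fitted slope. It assumes the usual z-score standardization, (value − mean)/SD, and uses made-up numbers; the function names are hypothetical. With one predictor, the standardized slope equals the Pearson correlation, so its square is exactly the proportion of variance explained. With several correlated predictors, as in this study, the square is only an approximation.

```python
from statistics import mean, stdev


def standardize(values):
    """Return z-scores: (value - mean) / SD, using the sample SD."""
    m, sd = mean(values), stdev(values)
    return [(v - m) / sd for v in values]


def slope(x, y):
    """Least-squares slope of y on a single predictor x."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


# Hypothetical data: age at mammogram and a transformed density measure.
age = [45, 50, 55, 60, 65, 70]
density = [30.0, 28.0, 29.0, 22.0, 20.0, 18.0]

beta_star = slope(standardize(age), standardize(density))
variance_explained = beta_star ** 2
print(f"beta* = {beta_star:.3f}, variance explained = {variance_explained:.1%}")
```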

To estimate the correlation between pairs of relatives and to fit a variance components model, we applied multivariate Gaussian regression using the software FISHER with inference based on asymptotic likelihood theory (14, 15). This approach assumes that, after adjusting the mean for specified measured variables, the family residuals follow a multivariate normal distribution with a covariance structure that can be parameterized. It allows estimation of correlations separately for MZ and DZ twin pairs, or for pairs of non-twin sisters (including a twin and her sister), and statistical comparisons.

Under the assumptions of the classic twin model, we also fitted models estimating independent genetic and environmental components of variance that represent additive genetic factors (A), environment factors shared by twins and sisters (C), and individual specific environmental factors and measurement error (E), where A + C + E = total residual variance (V). MZ pairs share all their genes whereas DZ pairs and sister pairs share on average half their genes, so the correlation in additive genetic factors is 1.0 for MZ pairs and 0.5 for DZ and sister pairs (16). Under the assumption that the effects of environmental factors shared by twins and sisters are independent of zygosity and the same for twins and sisters, the correlation between a pair will be (2*ϕ*_{ij}A + C)/V where 2*ϕ*_{ij} = 1 if MZ else 0.5.
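The covariance structure implied by this model can be sketched for a single family. The example below builds the residual covariance matrix for a family in which every pair of women are full sisters, with 2*ϕ* = 1 for an MZ pair and 0.5 otherwise. The function name, the index-based encoding of MZ pairs, and the values of A, C, and E are illustrative assumptions; this is not the FISHER software and it does not perform any estimation.

```python
def family_covariance(a, c, e, n_members, mz_pairs=()):
    """Residual covariance matrix for one family under the A + C + E model.

    a, c, e   : variance components (hypothetical values in the example)
    n_members : number of women in the family (all assumed full sisters)
    mz_pairs  : iterable of (i, j) index pairs that are MZ twins
    """
    mz = {frozenset(p) for p in mz_pairs}
    v = a + c + e  # total residual variance
    cov = [[0.0] * n_members for _ in range(n_members)]
    for i in range(n_members):
        for j in range(n_members):
            if i == j:
                cov[i][j] = v
            else:
                # 2*phi is 1 for MZ pairs, 0.5 for DZ and sister pairs.
                two_phi = 1.0 if frozenset((i, j)) in mz else 0.5
                cov[i][j] = two_phi * a + c
    return cov


# An MZ pair (members 0 and 1) plus one non-twin sister (member 2).
cov = family_covariance(a=0.5, c=0.1, e=0.4, n_members=3, mz_pairs=[(0, 1)])
for row in cov:
    print([round(x, 2) for x in row])
```

Dividing an off-diagonal entry by V gives the pair correlation (2*ϕ*A + C)/V from the text.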

## Results

Table 1 shows the characteristics of the 3,324 participants (544 MZ and 339 DZ twin pairs and 1,558 of their non-twin sisters) based on the questionnaire and their mammographic density measures. There was no evidence that, after adjusting for covariates, the means or proportions differed depending on whether the woman was an MZ or DZ twin, or a non-twin. The absolute within-pair difference in age or time between mammograms was 1.34 years for MZ and 1.67 years for DZ twin pairs, and there was no significant difference between MZ and DZ pairs (all *P* > 0.05).

Table 2 shows that 22% of the families had 1 member, 54% had 2 members, 18% had 3 members, 5% had 4 members, and the remainder had 5, 6, or 7 members. The majority of families (57%) contained one twin pair. There were a total of 1,483 sister–sister pairings (including sister–twin pairings) that were not independent within families.

Table 3 shows that, univariately, cube root dense area was negatively associated with age at mammogram (6.8%), log BMI (1.7%), age at menopause (3.6%), and number of live births [2.9%; the percentage of variance explained by each factor, (*β**)^{2}, is shown in brackets]. When fitted concurrently, the associations with age at mammogram, BMI, age at menopause, and number of live births remained nominally significant but, given that these factors were correlated with one another, the percentages of variance explained were approximately halved to 4.0%, 1.0%, 1.0%, and 1.0%, respectively.

After adjusting for the above negative associations, cube root dense area was positively associated with current use of HRT, years of alcohol consumption, having a benign breast lump removed and having DCIS, and negatively associated with years of oral contraceptive use, explaining 0.3%, 0.4%, 0.5%, 0.3% and 0.4% of variance, respectively. Overall, these measured factors explained about 9% of total variance.

Table 4 shows that, univariately, square root percent dense area was negatively associated with age at mammogram (12.3%), log BMI (16.8%), age at menopause (6.8%), and number of live births (4.0%). When fitted concurrently, the associations with age at mammogram, BMI, age at menopause, and number of live births remained nominally significant and the percentages of variance explained reduced to 7.3%, 14.4%, 0.8%, and 1.0%, respectively.

After adjusting for the above negative associations, square root percent dense area was positively associated with current use of HRT, years of alcohol consumption, having a benign breast lump removed and having DCIS, and negatively associated with years of oral contraceptive use and current smoking, explaining 0.3%, 0.4%, 0.4%, 0.2%, 0.3%, and 0.2% of variance, respectively. Overall, these measured factors explained about 25% of total variance.

Table 5 shows that, univariately, square root nondense area was positively associated with age at mammogram (8.4%), log BMI (42.2%), age at menopause (4.8%), and number of live births (2.3%). When fitted concurrently, the associations with age at mammogram, BMI, and number of live births remained nominally significant and the percentages of variance explained reduced to 7.3%, 39.7%, and 0.3%, respectively.

After adjusting for the above positive associations, square root nondense area was negatively associated with years of use of HRT and years of alcohol consumption, and positively associated with ever smoking and having ovaries removed, explaining 0.4%, 0.2%, 0.2%, and 0.3% of variance, respectively. Overall these measured factors explained about 48% of total variance.

After adjusting for the factors above, the correlation between dense area residuals and percent dense area residuals was 0.84 and between dense area residuals and nondense area residuals was −0.46. There was no evidence that any of the associations above, or the correlations between residuals, depended on whether the woman was an MZ or DZ twin, or a non-twin.

After adjusting for the factors, the MZ, DZ, and sister pair correlations were 0.59, 0.28, and 0.29 for dense area; 0.57, 0.30, and 0.28 for percent dense area; and 0.56, 0.27, and 0.28 for nondense area, respectively. The SEs were 0.02, 0.04, and 0.03, respectively, for all 3 measures. Clearly, the MZ correlations were greater than the DZ and sister pair correlations (all *P* < 0.001), and the DZ and sister pair correlations were not significantly different from one another.

The estimates for A and C, as a proportion of total residual variance, were: for dense area, 0.56 and 0.01; for percent dense area, 0.52 and 0.04; and for nondense area, 0.64 and −0.06. The correlations between estimates of A and C were −0.80, −0.81, and −0.86, respectively. The SEs of these estimates were all about 0.05, so the estimates of C were not significantly different from zero. By a *post hoc* power calculation, we had 80% power at the 0.05 level of significance to detect values of C > 0.13. For nondense area, the estimate of A when C was constrained to be nonnegative was 0.58.
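As a rough consistency check, the reported estimates of A and C can be converted back into the pair correlations they imply under the classic twin model: A + C for MZ pairs, and 0.5A + C for DZ and sister pairs. The sketch below does only this arithmetic on the values quoted in the text. The dictionary layout and function name are assumptions, and nothing here re-estimates the model.

```python
# Reported (A, C) as proportions of total residual variance, from the text.
estimates = {
    "dense area": (0.56, 0.01),
    "percent dense area": (0.52, 0.04),
    "nondense area": (0.64, -0.06),
}

# Reported (MZ, DZ, sister) pair correlations, from the text.
observed = {
    "dense area": (0.59, 0.28, 0.29),
    "percent dense area": (0.57, 0.30, 0.28),
    "nondense area": (0.56, 0.27, 0.28),
}


def implied_correlations(a, c):
    """Return (MZ, DZ-or-sister) correlations implied by A and C, with V = 1."""
    return a + c, 0.5 * a + c


for measure, (a, c) in estimates.items():
    r_mz, r_dz = implied_correlations(a, c)
    obs_mz, obs_dz, obs_sis = observed[measure]
    print(f"{measure}: implied MZ {r_mz:.2f} (observed {obs_mz:.2f}), "
          f"implied DZ/sister {r_dz:.2f} (observed {obs_dz:.2f}, {obs_sis:.2f})")
```

The implied MZ and DZ/sister correlations are 0.57 and 0.29 for dense area, 0.56 and 0.30 for percent dense area, and 0.58 and 0.26 for nondense area, each within about one reported SE of the observed correlations.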

## Discussion

We found that dense area and percent dense area have the same determinants. The amounts of variance explained by BMI, and to a lesser extent age at mammogram, are substantially less for dense area (1% and 4%, respectively) than for percent dense area (14% and 7%). This is an important issue because the associations with these factors are in the opposite direction to the relationship of these factors to breast cancer risk (especially for age and, for BMI, at least for postmenopausal disease and for postmenopausal women), and more so for percent dense area than dense area. After adjusting dense area and percent dense area for age and BMI, the associations with other risk factors are almost identical and explain about 4% of variance. This is consistent with the fact that, after adjusting for all measured risk factors, the dense area and percent dense area residuals are highly correlated with one another (0.84).

In using mammographic measures to create a breast cancer risk factor, dense area and percent dense area are very similar once adjusted for age and BMI, but percent dense area is more problematic due to its much stronger association with BMI. Each step in calculating percent dense area and adjusting it for BMI and age has the potential to introduce more measurement error.

We also found that nondense area has very similar determinants to dense area and percent dense area but mostly in the opposite direction. The associations with age, and especially BMI, are much greater, explaining 7% and 40% of variance, respectively. After adjusting for measured factors, dense area and nondense area are substantially, although negatively, correlated. This is interesting because a linear combination, *F*, of dense area and nondense area has been found to predict breast cancer risk, with the associations in opposite directions. There is an intrinsic collinearity between dense area and nondense area, whose sum is constrained to be equal to the total breast area, especially after adjusting for age and BMI. The factor F above could be representing a single phenomenon that is more common in what is considered by the observer to be dense area and therefore less common in what is considered to be nondense area.

After adjusting for measured factors, we then considered the roles of unmeasured factors in explaining the residual mammographic density measures. By studying MZ twin pairs, we were able to estimate the maximum amount of residual variance due to familial factors, and found this was almost 60%. By studying DZ and sister pairs, we were able to test whether the familial sources of variance were independent of genetic similarity and were able to reject the null hypothesis. Note that this does not prove that a difference in correlation by zygosity is only due to the differences in shared genes by zygosity, as the MZ pairs could have shared nongenetic factors to a greater degree. In this regard, we found no evidence that DZ and sister pair correlations differed from each other, implying that the degree to which these 2 types of first-degree relatives share nongenetic factors relevant to the mammographic density measures is not (substantially) different.

One can always find a nongenetic explanation for familial correlations, and, in this case, it would be that the MZ pairs share such factors twice as strongly as do DZ and non-twin sister pairs. But this 2-fold difference is also highly consistent with the theoretical model first proposed by Fisher in 1918 (16), which predicted that this pattern would be observed if the reason why the relatives were correlated was solely due to the presence of “additive” genetic factors.

Applying the classic twin model to our data, we predicted that about 50% to 60% of residual variance would be due to genetic factors. The remainder would be due to unmeasured individual specific nonfamilial (and therefore nongenetic) factors. The latter would include measurement error, which for these mammographic measures is not large and about 5%; for example, a UK study found the repeatability was 0.94 for dense area, 0.91 for percent dense area, and 0.96 for nondense area (1). The former would include variants in and around genes such as *LSP1* (4, 17) and *ZNF365* (18), which have been found to be associated with both dense area and percent dense area adjusted for risk factors and with breast cancer risk itself. These recently discovered variants, however, explain on the order of 1% or less of the residual variance.

In terms of the mammographic density measures themselves (dense area, percent dense area, and nondense area), the likely genetic component of variance is much greater for dense area due to the fact that far less variance is explained by measured factors, mostly BMI and age. But in terms of the mammographic density measures that predict breast cancer risk, the genetic variances are almost identical.

The fact that the breakdown of residual variance was so similar for dense area and percent dense area is not surprising, given their high correlation. But the fact that the same applied to dense area and nondense area is intriguing and supports the notion that, in terms of predicting breast cancer risk, dense area and nondense area (after adjusting for age, BMI, and other breast cancer risk factors) are “two sides of the same coin”; see discussion about factor F above.

The statistical analysis approach we used is optimal in that it provides asymptotically unbiased estimates without subdividing the data into pairs and uses all the information in all the families, including isolated individuals. This is the strength of the likelihood approach, which produces estimates of standard errors that take into account the fact that the pairings within a family are not independent (19–21). Therefore, information on the correlation between sister pairs comes from sibling pairs in which one was a twin and the other not, as well as from pairs of non-twin sisters. Information on the means (main effects) comes from all women in the data set. Comparison with data from the population-based sample of unaffected women in the ABCFS of the same age did not reveal any major differences in the general characteristics of the participants in this study. As is the case for all studies, it is difficult to exclude the possibility that participants in this twin and sister study are different from the general population in terms of lifestyle factors such as smoking, alcohol consumption, etc. However, those factors are not, or are at most weakly, associated with the mammographic density measures that predict breast cancer risk. They therefore explain at most a very small proportion of variation in these risk-predicting measures, the topic of interest for this paper.

This study supports ongoing research to discover the genetic and environmental determinants of mammographic density. The mammographic density measures that predict breast cancer risk (i.e., adjusted for age and covariates) are highly stable with age/time. The correlations are more than 0.8 for measures even 10 years apart (22). Therefore, these familial associations are likely established at a young age, and we are currently studying the mammographic density measures of younger adult women and their relatives, including their mothers, to gain greater insights into the genetic and environmental determinants of the mammographic density measures that predict breast cancer.

The quest to find more genetic variants associated with mammographic density measures that predict breast cancer risk is ongoing, with 2 major international consortia MODE and DENSNPS (17, 18). Two approaches are being applied. The first involves testing if the common variants being found to be associated with breast cancer risk are also associated with mammographic density measures that predict breast cancer risk. The second involves conducting genome-wide association studies of the mammographic density measures themselves.

The other major challenge is to find the nonfamilial (which implies nongenetic) factors, other than the established breast cancer risk factors measured here by questionnaire, that explain the substantial remainder of variation. This could involve new thinking about breast cancer risk as we have found that the factors measured by conventional questionnaires usually administered in mid-life explain little variance. Issues to be considered could include epigenetics and measures of early-life environment and growth using cohorts that collected relevant measures in the past.

## Disclosure of Potential Conflicts of Interest

No potential conflicts of interest were disclosed.

## Authors' Contributions

**Conception and design:** T.L. Nguyen, M.C. Southey, G.G. Giles, J.L. Hopper

**Development of methodology:** T.L. Nguyen, M.C. Southey, J.L. Hopper

**Acquisition of data (provided animals, acquired and managed patients, provided facilities, etc.):** T.L. Nguyen, G.S. Dite, J. Stone, C. Apicella, F. Odefrey, J.N. Cawson, S.A. Treloar, M.C. Southey, J.L. Hopper

**Analysis and interpretation of data (e.g., statistical analysis, biostatistics, computational analysis):** T.L. Nguyen, D.F. Schmidt, E. Makalic, G.S. Dite, M. Bui, R.J. MacInnis, J.L. Hopper

**Writing, review, and/or revision of the manuscript:** T.L. Nguyen, D.F. Schmidt, E. Makalic, G.S. Dite, J. Stone, C. Apicella, R.J. MacInnis, S.A. Treloar, M.C. Southey, G.G. Giles, J.L. Hopper

**Administrative, technical, or material support (i.e., reporting or organizing data, constructing databases):** G.S. Dite, J. Stone, C. Apicella, M.C. Southey, J.L. Hopper

**Study supervision:** C. Apicella, J.N. Cawson, M.C. Southey, J.L. Hopper

**Other:** M.C. Southey acquired grant funding for data collection

## Grant Support

This study was supported by the National Health and Medical Research Council (NHMRC) of Australia, the National Breast Cancer Foundation, the Victorian Breast Cancer Research Consortium (VBCRC), Cancer Australia, the Victorian Health Promotion Foundation and the NSW Cancer Council, the Cooperative Research Centre for Discovery of Genes for Common Human Diseases, Australia (1997–2004) under the Australian Government's Cooperative Research Centres program, and Cerylid Biosciences Ltd. J.L. Hopper is an NHMRC Senior Principal Research Fellow, and M. Southey is an NHMRC Senior Research Fellow.

The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked *advertisement* in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.

## Acknowledgments

The authors thank the twins and sisters who participated in this study and the Australian Twin Registry. They also thank Avis McPhee and all the volunteers from the Breast Cancer Network Australia.

- Received May 7, 2013.
- Revision received September 17, 2013.
- Accepted October 8, 2013.

- ©2013 American Association for Cancer Research.