## Abstract

Background: Mammographic breast density (MBD) has a strong genetic component. Investigating the genetic models for mammographic density may provide further insights into the genetic factors affecting breast cancer risk.

Purpose: To evaluate the familial aggregation of MBD and investigate the genetic models of susceptibility.

Methods: We used data on 746 women from 305 families participating in the Sisters in Breast Screening study. Retrieved mammograms were digitized, and percent mammographic density was determined using the Cumulus software. Linear regression analysis was done to identify the factors that are associated with mammographic density and a multivariate regression model was constructed. Familial correlations between relative pairs were calculated using the residuals from these models. Genetic models of susceptibility were investigated using segregation analysis.

Results: After adjusting for covariates, the intraclass correlation coefficient among the residuals was 0.26 (95% confidence interval, 0.16-0.36) in sister-sister pairs and 0.67 (0.27-1.00) among the monozygotic twin pairs. The most parsimonious model was a Mendelian single major gene model in which an allele with population frequency 0.39 (95% confidence interval, 0.33-0.46) influenced mammographic density in an additive fashion. This model explained 66% of the residual variance.

Conclusion: These results confirm that MBD has a strong heritable basis, and suggest that major genes may explain some of the familial aggregation. These results may have implications for the search for genes that control mammographic density. (Cancer Epidemiol Biomarkers Prev 2009;18(4):1277–84)

- mammographic density
- genetic model
- breast cancer
- epidemiology
- modelling

## Introduction

Several epidemiologic studies have shown that mammographic breast density (MBD) is a strong risk factor for breast cancer. The risk of breast cancer associated with mammographic density has been reported to be two to six times greater in the highest density category compared with the lowest density category (1-4). Similar results have been found in studies based on visual assessment of density by radiologists, and studies that used semiautomated computer methods, which can provide a quantitative estimate of MBD (2, 5, 6).

A number of factors are known to influence mammographic density. These include age, body size, menopausal status, number of live births, use of hormone replacement therapy (HRT), history of benign breast disease, smoking, and alcohol intake (7-12). In addition, results of twin studies indicate that mammographic density has a strong genetic component (13, 14). In a study of 353 monozygotic and 246 dizygotic twin pairs, Boyd et al. (13) found a correlation of 0.63 in monozygotic twin pairs and 0.25 in dizygotic twin pairs. They estimated that genetic factors account for 63% of the residual variation in mammographic density after adjusting for other measured factors. Another analysis from the same investigators reported that genetic factors also explain a large proportion of the variation in the mammographic areas of both dense and nondense tissue, and that the genetic variances of these components were largely independent (14). Although several genes have been suggested to be associated with mammographic density, no associations have been definitively established. In particular, MBD does not seem to be associated with the presence of a *BRCA1* or *BRCA2* mutation, and there is no evidence for an association with any of the common single nucleotide polymorphisms recently shown to be associated with breast cancer (15-18).

Although the evidence for a genetic component to MBD is strong, the genetic models underlying MBD are less clear. Pankow et al. (19) investigated the genetic models underlying mammographic density using data on 258 families in which female members had mammographic density assessed by a 5% increment visual evaluation. A Mendelian dominant model showed the best fit, but other major gene models (additive and recessive) also fitted the data well. The authors did not investigate polygenic or mixed (major gene plus polygenic) models of inheritance.

In the present study, we used families identified through the breast screening program in the United Kingdom, and in which mammographic density has been measured, to examine the familial aggregation of MBD and to investigate the genetic models of susceptibility, including major gene, polygenic, and mixed models.

## Materials and Methods

### Study Population

We used data from the Sisters in Breast Screening study, an ongoing study designed to map genes associated with breast density. Families were identified through the National Health Service breast screening program in the United Kingdom. Eligibility was restricted to families in which two or more female blood relatives (sisters, half sisters, first cousins, or aunt-niece) had had mammographic screening. Families in which a member could have screening within 2 years of recruitment were also included. The study was approved by the local research ethics committee.

Information leaflets explaining the rationale of the study were given to women attending the Cambridge and Huntingdon Breast Screening Service. Women who were willing to participate sent completed forms to the study coordinating center and were then contacted by a study coordinator. Families were also ascertained through newspaper and radio advertisements. Study participants were sent a letter, blood kit, and questionnaire covering family information, reproductive and menstrual history, oral contraceptive use, HRT, life-style factors, and medical history including benign breast disease and cancer history. Twin status was verified by a study coordinator when participants were recruited. When the zygosity was not clear, a further questionnaire was administered to clarify this. Current height and weight were measured at general practices, and blood samples were collected by practice nurses or phlebotomists. Study recruitment commenced in October 2002. The current analysis was limited to families whose data, including mammographic density measurements, were completed by July 2007.

### MBD Estimation

For each participant, all available mammograms were retrieved from the local screening unit and were digitized. The mammograms were scanned using the Array 2905 Laser Film Digitizer and the program DICOM ScanPro Plus Version 1.3E (Array Corp), with 50-μm pixel resolution and 12-bit digitization, and an absorbance of 4.7.

For each individual, we aimed to collect the earliest and most recently available mammograms. Mammographic density was measured using the Cumulus 3.0 program (Martin Yaffe and Norman Boyd). All mediolateral oblique views (left and right) were assessed by one specially trained research assistant as the main reader (JB). Digitized images were displayed on a computer screen and thresholds were set to define the edge of the breast and the edge of the dense area. Cumulus determined the number of pixels for the total breast area and for the dense area of the tissue. This method is widely used and provides a fully quantitative measure of density (20, 21). The analyses presented here were based on percent mammographic density (%MBD), computed as the dense area as a percentage of the total breast area. Mammograms were analyzed in a random order and the reader was blinded to the sequence of the mammograms and to the visual density evaluation that had been done by another reader. All readings were made independently for the right and left breast. Intrareader repeatability was assessed by rereading 10% of the mammograms. Interreader reproducibility was examined by comparing 10% of the first reader's results with those of a radiologist (RW). Mammograms obtained at or after a breast cancer diagnosis were excluded from the analyses (a total of 15 women were excluded).

### Statistical Analysis

Correlations between the left and right percent density readings were assessed using Pearson's correlation coefficient. All of the subsequent analyses were based on the mean percent density between the left and right breast readings. The square root transformation of the mean percent density provided the best approximation to the normal distribution and was used in all the regression analyses presented.
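As an illustration of the outcome variable described above, the following minimal sketch computes percent density from pixel counts, averages the left and right readings, and applies the square root transformation. The function names and the pixel counts are hypothetical and are not taken from the Cumulus software or from the study data.

```python
import math


def percent_density(dense_pixels: int, total_pixels: int) -> float:
    """Dense area as a percentage of the total breast area."""
    if total_pixels <= 0:
        raise ValueError("total_pixels must be positive")
    return 100.0 * dense_pixels / total_pixels


def sqrt_mean_density(left_pct: float, right_pct: float) -> float:
    """Square root of the mean of the left and right percent density."""
    return math.sqrt((left_pct + right_pct) / 2.0)


# Hypothetical pixel counts, for illustration only.
left = percent_density(30_000, 120_000)
right = percent_density(27_500, 110_000)
y = sqrt_mean_density(left, right)
```

The transformed value `y` plays the role of the dependent variable in the regression analyses.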

Linear regression analysis was used to examine the associations between mammographic density and various covariates. All associations were initially examined in univariate models. Variables that showed associations with *P* values of <0.05 were then investigated using a multiple regression analysis model. A final regression model was constructed by including the covariates that remained significant in the multiple regression analysis after performing forward and backward selection.

Our analyses included mammograms from the same individual at two time points, and these could not be considered independent. To allow for the repeated measurements on the same individual, we used the generalized estimating equations approach of Liang and Zeger (22), which specifies a working correlation matrix. For our purposes, we used the Huber-White sandwich estimator of the variance (23), treating the observations for each individual as a cluster and the clusters as independent. This approach generates consistent estimates and valid SEs (24).

Variables examined in the linear regression models included the age at mammographic screening, current body mass index (BMI), waist-hip ratio, smoking history, alcohol consumption, age at menarche, age at menopause, menopausal status, and type of menopause (natural, hysterectomy and/or oophorectomy, medication and treatment), parity, age at first pregnancy, number of live births, breast feeding, oral contraceptive use, HRT use, history of benign breast disease, and medication (including Tamoxifen). Age at mammographic screening, menopausal status, and HRT use were variables that corresponded to each mammogram (obtained at each of the two time points). These time-varying covariates were coded with respect to the status at the age at each mammogram. The remaining variables were obtained once at the time of the questionnaire and coded as a single value per person. Age at mammographic screening, BMI, and waist to hip ratio were coded as continuous variables, whereas other covariates were coded as categorical variables.

The final model was used to obtain the residual of the square root mean %MBD, after adjusting for covariates, at each mammogram (i.e., each time point). If an individual had mammograms at two time points, we computed the mean residual between the two time points. The mean residuals were used in the computation of the familial correlations and segregation analyses. Summary statistics and regression analyses were conducted using Stata 8.0 (StataCorp).

Familial correlations between pairs of relatives were computed using the intraclass correlation coefficients between the mean residuals. We computed correlation coefficients for sister pairs, monozygotic twin pairs, and cousin pairs. The intraclass correlations were computed using one-way ANOVA in Stata 8.0.
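For readers unfamiliar with the one-way ANOVA estimator of the intraclass correlation, the sketch below shows the calculation for the balanced case of relative pairs (two members per group). It is a simplified illustration, not the Stata routine used in the study; the function name is hypothetical, and sibships with more than two members would need the unbalanced form of the estimator.

```python
from statistics import fmean


def icc_oneway_pairs(pairs):
    """One-way ANOVA intraclass correlation for a list of (x1, x2) pairs."""
    n = len(pairs)
    if n < 2:
        raise ValueError("need at least two pairs")
    grand_mean = fmean(x for pair in pairs for x in pair)
    pair_means = [(a + b) / 2.0 for a, b in pairs]
    # Between-pair and within-pair sums of squares (group size k = 2).
    ss_between = 2.0 * sum((m - grand_mean) ** 2 for m in pair_means)
    ss_within = sum(
        (a - m) ** 2 + (b - m) ** 2 for (a, b), m in zip(pairs, pair_means)
    )
    ms_between = ss_between / (n - 1)
    ms_within = ss_within / n  # n * (k - 1) degrees of freedom with k = 2
    # ICC = (MSB - MSW) / (MSB + (k - 1) * MSW), with k = 2.
    return (ms_between - ms_within) / (ms_between + ms_within)


# Toy residuals, for illustration only.
concordant = icc_oneway_pairs([(1.0, 1.0), (2.0, 2.0), (3.0, 3.0)])
discordant = icc_oneway_pairs([(1.0, -1.0), (-1.0, 1.0)])
```

Perfectly concordant pairs give an estimate of 1, and pairs whose members always deviate in opposite directions give −1.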

### Segregation Analyses

To investigate the genetic models underlying mammographic density, we used the family data of the study participants to carry out segregation analysis of the mean residuals. The models were implemented in the pedigree analysis program MENDEL (25).

We investigated four single-gene models [dominant, general (codominant), recessive, additive], a polygenic model, and a mixed model that incorporates a single gene component and a polygenic component. The polygenic model hypothesizes the effect of a large number of genes, each of which has a small effect. We assumed that the residual square root density $Y_i$ for each individual $i$ follows the model

$$Y_i = \mu_{g_i} + P_i + \varepsilon_i,$$

where $\mu_{g_i}$ is the mean for an individual with major genotype $g_i$, $P_i$ is the polygenotype of individual $i$, which is normally distributed with mean 0 and variance $\sigma_p^2$, and $\varepsilon_i \sim N(0, \sigma^2)$. Models were parameterized in terms of the genotypic means, polygenic and residual variances, the major locus allele frequencies, and the transmission probabilities. The Mendelian *single gene* models assumed that inheritance was due to a single autosomal locus with two alleles. Under these models the transmission probabilities of susceptibility alleles were constrained to those expected under Mendelian inheritance. We tested for departure from Mendelian inheritance by fitting models in which the transmission probabilities were estimated as free parameters and comparing these against the Mendelian models. We also fitted a model that assumed three different categories (types) in the population, where the parental types are independent of the offspring types and the prevalences remain constant between generations (the so-called “Environmental model”). Polygenic inheritance was approximated using the hypergeometric polygenic model using three loci (26, 27). Segregation analysis is generally underpowered in distinguishing between the effects of a single major gene and the heterogeneous effects of several (major) genes, especially in the absence of segregation data over several generations, which is the case in the present data set. Therefore, we did not fit models that included the effects of more than one major gene.
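The role of the transmission probabilities can be illustrated with a short sketch. It assumes the conventional parameterization in which each parental genotype transmits allele A with probability 1, 0.5, or 0 under Mendelian inheritance, and with a common probability equal to the allele frequency under the environmental (no transmission) model. This parameterization and the function names are assumptions made for illustration and are not taken from the MENDEL program.

```python
GENOTYPES = ("AA", "Aa", "aa")

# Probability that a parent of each genotype transmits allele A
# under Mendelian inheritance.
MENDELIAN_TAU = {"AA": 1.0, "Aa": 0.5, "aa": 0.0}


def environmental_tau(q):
    """No-transmission model: offspring type does not depend on parental type."""
    return {g: q for g in GENOTYPES}


def offspring_distribution(mother, father, tau=MENDELIAN_TAU):
    """Offspring genotype probabilities given the two parental genotypes."""
    tm, tf = tau[mother], tau[father]
    return {
        "AA": tm * tf,
        "Aa": tm * (1.0 - tf) + (1.0 - tm) * tf,
        "aa": (1.0 - tm) * (1.0 - tf),
    }


# Two heterozygous parents under Mendelian transmission.
het_cross = offspring_distribution("Aa", "Aa")
# Under the environmental model the parental genotypes carry no information.
env_cross = offspring_distribution("AA", "aa", environmental_tau(0.39))
```

Testing for departure from Mendelian inheritance amounts to letting the three transmission probabilities vary freely and asking whether the fit improves over the fixed values above.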

Nested models were compared using likelihood ratio tests. The fit of nonnested models was assessed using the Akaike Information Criterion (AIC; ref. 28), where AIC = −2 ln(likelihood) + 2 (number of independent parameters). The model that minimized AIC was regarded as the most parsimonious model.
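The two model comparison criteria can be sketched as follows. The log-likelihood values used here are hypothetical, and the likelihood ratio test is shown only for the case of one degree of freedom (for example, a comparison of two models differing by a single parameter), for which the χ² tail probability has a closed form in terms of the complementary error function.

```python
import math


def aic(log_likelihood, n_params):
    """AIC = -2 ln(likelihood) + 2 * (number of independent parameters)."""
    return -2.0 * log_likelihood + 2.0 * n_params


def lrt_pvalue_1df(loglik_nested, loglik_full):
    """Likelihood ratio test P value for nested models differing by 1 parameter."""
    stat = 2.0 * (loglik_full - loglik_nested)
    if stat < 0:
        raise ValueError("the full model cannot have a lower likelihood")
    # For 1 df, P(chi2 > stat) = erfc(sqrt(stat / 2)).
    return math.erfc(math.sqrt(stat / 2.0))


# Hypothetical log-likelihoods, for illustration only.
aic_example = aic(-100.0, 3)
p_example = lrt_pvalue_1df(-100.6, -100.0)
```

A smaller AIC indicates a more parsimonious model, and a large P value indicates that the extra parameter of the fuller model is not supported by the data.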

## Results

The characteristics of the women included in the analysis are summarized in Table 1. In total, 746 women from 305 families were eligible for inclusion in the analysis. The mean age at questionnaire completion was 59.8 years (range, 37-79 years). Six hundred thirty-three women had mammograms from two time points. The average age at first mammogram was 52.8 years (including those with a single mammogram) and that at the second mammogram was 60.6 years. Eighty-eight percent of the participants were postmenopausal at the time of questionnaire completion and 12% of the women were current users of HRT medication.

Ten percent of the mammograms were rescored by the main reader (JB); the estimated correlation between the two readings was 0.84. The correlation between the two readers (JB, RW) in this study was 0.88, identical to the correlation estimated for these readers in a previous study (16).

The correlation in the %MBD between left and right breasts was 0.87. For the subsequent analysis, we used the mean %MBD over the left and right breasts. The mean %MBD was 24.9% (SD, 16.7%) at the first mammogram and 16.6% (SD, 13.5%) at the second mammogram.

### Multiple Regression Analysis

The results of the univariate linear regression analysis using the square root transform of the mean %MBD between left and right side are shown in Table 2. Age at mammogram, BMI, waist to hip ratio, menopausal status (and type), HRT use, oral contraceptive use, history of benign breast disease, alcohol consumption (>5 units), parity, total months of breast feeding, and age at first live birth were all found to be associated with %MBD at the 5% significance level. To avoid collinearity problems in the multiple regression, where covariates describe similar risk factor effects (e.g., menopausal status and type of menopause), we chose to include the covariate coding that provided the highest *R*^{2} in the multiple regression analysis.

The results of the multiple regression analysis are shown in Table 3. The use of oral contraceptives was removed by both forward and backward selection based on a take-up/removal criterion of *P* value of 0.10. The following variables remained in the model as significant terms: age at mammogram, current BMI, waist to hip ratio, menopausal status at date of mammogram, HRT use at date of mammogram, history of benign breast disease, and number of live births. These covariates in total explained 26.7% of all the variation of square root transformed percent density.

### Familial Correlations

The 746 women with available mammograms came from 305 distinct families. Among those, five women came from five families in which no other family member had a mammographic measurement eligible for the analysis (due to breast cancer diagnosis before the mammography). Among the remaining 300 families, 256 were part of sister sibships, there were 8 monozygotic twin pairs, and 89 cousin pairs. The intraclass correlation coefficient among sister-sister pairs was estimated to be 0.26 [95% confidence interval (CI), 0.16-0.36]. Among the monozygotic twin pairs, this was estimated to be 0.67 (0.27-1.00). Among the cousin pairs, this was estimated to be 0.17 (0.00-0.37). Mother-daughter pair correlation was not examined because it was not the original intention of this study to collect mother-daughter pairs and also because of small sample size.

### Segregation Analyses

The residual %MBD had a mean value of 0.016, SD of 1.42, skewness of 0.13, and kurtosis of 2.53. Residuals of the square root transformed percent density were computed based on the final multiple regression model, and the residual values from the first and second mammograms were averaged for each individual. These values were used for the familial correlations and segregation analyses.

The results of the segregation analysis using the residual %MBD are summarized in Table 3.

To identify the most parsimonious model, we used a sequential testing strategy. Compared with the environmental model, the general non-Mendelian model provided a significantly better fit (*P* = 7 × 10^{−8}) and “environmental” transmission was therefore rejected. The transmission parameters under the non-Mendelian general model converged to values similar to those expected under Mendelian inheritance, and the non-Mendelian general model did not fit better than the Mendelian general single gene model (*P* = 0.33).

Among the Mendelian single gene models, the general model fitted significantly better than the dominant and recessive models of inheritance (*P* = 0.006 and 0.002, respectively). The genotypic mean estimates under the general model were very similar to the mean estimates under the additive model, which had the lowest AIC (1,208.4) among the Mendelian single gene models. There was no significant difference between the general single and additive models (*P* = 0.27).

The log-likelihood of the pure polygenic model was lower than those of the Mendelian models and had a higher AIC compared with all the Mendelian models, although the differences were small compared with the dominant and recessive models. The inclusion of the polygenic component did not significantly improve the fit of the dominant, recessive, general, or additive models (*P* = 0.09, 0.06, 0.9, and 1.00, respectively).

Under the most parsimonious additive model, the frequency of the susceptibility allele was estimated to be 0.39 (95% CI, 0.33-0.46). The genotypic mean values were estimated as 2.05 (95% CI, 1.74-2.37), 0.38 (95% CI, 0.12-0.64), −1.30 (95% CI, −1.51 to −1.09) for homozygote carriers of the susceptibility allele, heterozygotes, and wild-type homozygotes, respectively. Using the multivariate regression equation to transform these into percent density values and assuming the mean values for the continuous variables (age at mammogram, BMI, waist to hip ratio) and the mode (most frequent value) for the categorical variables, the corresponding mean percent density values under this model were 35.7% for genotype AA, 18.5% for genotype Aa, and 6.9% for genotype aa. Based on the estimated residual variance under the additive model (0.68) and the overall variance of the residuals after adjusting for the known covariates (1.99), the additive model is estimated to account for 66% of the variance in transformed %MBD after adjusting for known risk factors (95% CI, 56-74%, based on the 95% CI of the residual SD under the additive model). We note that the estimated residual variance was similar under all genetic models fitted, except the dominant and recessive models. Because known covariates explained 27% of variance in %MBD, the genetic factors are estimated to account for 48% of the total variance.
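The variance calculations in the preceding paragraph can be reproduced from the reported estimates, as in the sketch below. The cross-check in the second half additionally assumes Hardy-Weinberg genotype frequencies at the major locus; that assumption is made here for illustration and is not stated in the text above.

```python
# Reported estimates (taken from the text above).
residual_var_additive = 0.68  # residual variance under the additive model
total_residual_var = 1.99     # variance of residuals after covariate adjustment
covariate_r2 = 0.267          # variance explained by the known covariates

# Proportion of the covariate-adjusted variance explained by the major gene.
prop_of_residual = 1.0 - residual_var_additive / total_residual_var
# Proportion of the total variance explained by the major gene.
prop_of_total = prop_of_residual * (1.0 - covariate_r2)

# Cross-check, ASSUMING Hardy-Weinberg proportions (an illustrative assumption).
q = 0.39  # estimated frequency of the susceptibility allele A
genotype_means = {"AA": 2.05, "Aa": 0.38, "aa": -1.30}
genotype_freqs = {"AA": q * q, "Aa": 2.0 * q * (1.0 - q), "aa": (1.0 - q) ** 2}

overall_mean = sum(genotype_freqs[g] * genotype_means[g] for g in genotype_means)
major_gene_var = sum(
    genotype_freqs[g] * (genotype_means[g] - overall_mean) ** 2
    for g in genotype_means
)
prop_check = major_gene_var / (major_gene_var + residual_var_additive)
```

Both routes give a proportion of about two thirds of the covariate-adjusted variance, which corresponds to a little under half of the total variance once the covariate contribution is taken into account.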

To investigate whether the inclusion of mammograms taken a long time before the BMI measurement had any influence on our results, we repeated all the analyses, restricting them to mammograms taken within 3 years of the BMI measurement. Six hundred fifty-six individuals (270 families) were included in this analysis. The results were similar to those obtained with the full data set (data not shown). After multiple regression, the same covariates remained significant in the model, explaining 27.7% of all the variation of the square root transformed percent density, similar to the previous estimate. In addition, the results of the segregation analysis were similar to the results based on the whole data set. Overall, the Mendelian single gene additive model was again the most parsimonious model, with no significant difference between this and the general single gene model or the mixed additive model. The only difference was an improvement in the fit of the dominant and recessive models after the inclusion of the polygenic component (*P* = 0.04 and 0.02, respectively). However, compared with the mixed general model, both the mixed dominant and mixed recessive models were rejected (*P* = 0.0008 and 0.0007, respectively).

## Discussion

We used data from the Sisters in Breast Screening study of mammographic screening to investigate the genetic models of susceptibility for mammographic density. Strengths of our analysis include the availability of mammograms from two different time points and the use of computer-based measurements of density with a trained operator, thus minimizing the measurement error. There are some limitations in the covariate data. BMI, one of the key covariates associated with mammographic density, was measured at the time of entry into the study, on average 8 years after the mammogram. When we restricted the analyses to mammograms taken within 3 years of the BMI measurement, the conclusions were identical. Other covariates were only available from the questionnaire and thus may be subject to recall error. A further limitation of the study is that only a small number of mammographic readings were available on family members from different generations. This may have an effect on the power of the segregation analysis to discriminate between some of the models. In particular, this may have affected the comparisons of the mixed dominant and recessive models with the corresponding single gene models. However, none of these models provided the best fit to the data. The most parsimonious model was a single gene additive model. In our analyses, we were able to distinguish between genetic and nongenetic models, Mendelian and non-Mendelian transmission models, and between the mixed general and single gene general models. We were also able to distinguish between the various single gene models.

Our results confirm strong familial correlations in mammographic density. Our estimate of the intraclass correlation for sister pairs of 0.26 is in line with previous reports. Pankow et al. (19) calculated the sister-pair correlation of breast density to be 0.22 (age adjusted). Boyd et al. (13) reported a correlation coefficient of 0.27 for dizygotic pairs in their twin study. Although the sample size is small, our estimate of the intraclass correlation among monozygotic twin pairs was higher than that among sister pairs (0.67 versus 0.26), and our estimate for monozygotic twin pairs was also consistent with the estimate of Boyd et al. (estimated correlation 0.63; ref. 13). Unfortunately, there were too few mother-daughter pairs to conduct a meaningful analysis. As recruitment for the Sisters in Breast Screening study is currently ongoing, it may be possible to conduct further analysis with a larger sample size in the future.

In the segregation analysis, a single additive gene model provided the best fit to the data. The addition of the polygenic component did not improve the fit of the major gene models significantly. This model explains 66% of the observed variability of residuals (after adjusting for known covariates) and 48% of the total observed variability in mammographic density. Our estimate for the contribution of genetic factors to the observed variability of density is somewhat higher than the 29% estimate of Pankow et al. (19), and somewhat lower than the 60% of Boyd et al. (13). Other unmeasured effects such as diet, exercise, and/or life-style factors may account for the remaining unexplained variance.

In their segregation analysis, Pankow et al. (19) used families ascertained through probands diagnosed with breast cancer. They concluded that the Mendelian dominant model showed the best fit, although the recessive model also fitted the data well. Their results are in agreement with ours in that both support a Mendelian major gene model. However, in addition to the different ascertainment process, there are other differences between the two studies. In their study, mammographic density readings were done visually and a semiquantitative density measure was derived, whereas we used a continuous measure of density derived using the Cumulus software. Their analyses used the regressive models implemented in the SAGE software and adjusted for the covariates simultaneously with the segregation analysis. In our segregation analysis, we used the residuals after adjusting for the covariates that proved to be significantly associated with mammographic density. There are also some differences in the covariates included in the models, for example, alcohol consumption and cigarette smoking, which were not significantly associated with mammographic density in our study.

Some investigators have considered separately the absolute area of dense tissue and nondense tissue rather than their ratio (14, 29). Stone et al. (14) investigated the heritability of the absolute area of mammographically dense and nondense tissue in twin pairs and found that the heritability of each was similar to that of %MBD. It may therefore be possible to use the dense and nondense area instead of percent density as a trait in segregation analyses, and these will be the subject of future analyses. In addition, it has been suggested that longitudinal change in mammographic density, along with density itself, is related to breast cancer risk (30). Although our analysis was based on the mean residuals from two time points for each individual, changes in density over time could be incorporated in future analysis.

The results of our segregation analysis are perhaps somewhat surprising in that one would expect such a continuous measure to be determined by many genes, particularly because related phenotypes such as weight and height are polygenic. Segregation analysis can be affected by the distribution of the quantitative trait: a single gene model would predict a skewed distribution, whereas a polygenic model would predict a symmetrical distribution. The residuals of %MBD in this study reasonably approximate a normal distribution and therefore it is unlikely that our results were substantially affected by skewness. Segregation analysis, however, cannot distinguish between the effects of a single gene and the heterogeneous effects of several genes, and the latter model seems more likely. To date, no definite loci for MBD have been identified, although studies have suggested possible associations with variants in *HSD3B1* and insulin-like growth factor I (31, 32) and a recent genome-wide linkage study reported suggestive evidence of linkage in three regions (33). Our analysis provides some optimism that suitably sized genome-wide linkage or association studies will identify some of the loci responsible.

## Disclosure of Potential Conflicts of Interest

M. Kataoka: supported by the Banyu Fellowship Program 05-07. Sponsored by Banyu Life Science Foundation International.

## Acknowledgments

The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked *advertisement* in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.

We thank all of the sisters and families who participated in the Sisters in Breast Screening (SIBS) study, and Cancer Research UK for funding this study. Masako Kataoka is supported by funding from Banyu Life Science Foundation International. ACA is funded by Cancer Research UK. DFE is a principal research fellow of Cancer Research UK.

## Footnotes

- Accepted February 16, 2009.
- Received June 26, 2008.
- Revision received December 4, 2008.