Abstract
Breast cancer risk models increasingly include mammographic density (MD) and polygenic risk scores (PRS) to improve identification of higher-risk women who may benefit from genetic screening, earlier and supplemental breast screening, chemoprevention, and other targeted interventions. Here, we present additional considerations for improved clinical use of risk prediction models with MD, PRS, and questionnaire-based risk factors. These considerations include whether changing risk factor patterns, including MD, can improve risk prediction and management, and whether PRS could help inform breast cancer screening without MD measures and prior to the age at initiation of population-based mammography. We further argue that it may be time to reconsider issues around breast cancer risk models that may warrant a more comprehensive head-to-head comparison with other methods for risk factor assessment and risk prediction, including emerging artificial intelligence methods. With the increasing recognition of the limitations of any single mathematical model, no matter how simplified, we are at an important juncture for consideration of these different approaches for improved risk stratification in geographically and ethnically diverse populations.
See related article by Rosner et al., p. 600
Introduction
Clinical breast cancer risk models are increasingly used to identify women who may benefit from genetic screening, earlier and supplemental breast screening, as well as chemoprevention and other targeted interventions. While inclusion of more factors will generally improve model performance, it is critical to take a step back and consider the benefits of simplifying models, particularly for increasing their clinical utility and implementation (1). In this issue of Cancer Epidemiology, Biomarkers & Prevention, Rosner and colleagues (2) address ways of simplifying a breast cancer risk model using a smaller set of classic risk factors (referred to as QS for questionnaire score), mammographic density (MD), and polygenic risk score (PRS). Using two separate cohorts of women, ages 45–74 years, they found that the model inclusive of age and the three constructs (QS, MD, and PRS) achieved the highest area under the receiver operating characteristic curve (AUC), 0.687. Here, we discuss additional considerations for model simplification to enhance clinical utility, including new methods that go beyond conventional model building and offer promising opportunities for simplified approaches to risk stratification.
Questionnaire-Based Risk Score
We commend Rosner and colleagues for simplifying risk prediction by selecting a smaller set of questionnaire-based risk factors, including factors that are amenable to change and therefore have the potential to alter risk. One key area for improving the clinical utility of risk prediction models is to provide women with information on how modifying risk factors can change their predicted absolute risk for breast cancer (3). However, as in the study by Rosner and colleagues, most studies incorporate only baseline risk factors in model development and validation. A useful approach for addressing this limitation is that taken by the iPrevent (4) risk management tool, which applies previously reported RR estimates to changing risk factors after the baseline risk is calculated using a family pedigree model, thereby giving women useful risk reduction information. In doing so, this approach eliminates the need for continually updating and validating risk prediction models, as well as the need to wait for prospective model validation. An important area for research is, therefore, how best to use changing risk factor patterns for risk stratification, whether within the model or applied after the baseline risks are calculated.
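As a rough illustration of applying RR estimates after a baseline risk has been calculated, the sketch below rescales a baseline absolute risk by a relative risk. It is not the iPrevent algorithm: the function names are hypothetical, the rescaling assumes proportional hazards and ignores competing mortality, and combining several factors by multiplication assumes their effects are independent.

```python
import math
from typing import Iterable


def combined_relative_risk(relative_risks: Iterable[float]) -> float:
    """Multiply per-factor RRs (assumes independent, multiplicative effects)."""
    rrs = list(relative_risks)
    if any(rr <= 0 for rr in rrs):
        raise ValueError("relative risks must be positive")
    return math.prod(rrs)


def adjusted_absolute_risk(baseline_risk: float, relative_risk: float) -> float:
    """Rescale a baseline absolute risk by a relative risk.

    Assumes proportional hazards, so the disease-free probability
    (1 - baseline_risk) is raised to the power of the relative risk.
    Competing causes of death are ignored in this simplification.
    """
    if not 0.0 <= baseline_risk < 1.0:
        raise ValueError("baseline_risk must be in [0, 1)")
    if relative_risk <= 0:
        raise ValueError("relative_risk must be positive")
    return 1.0 - (1.0 - baseline_risk) ** relative_risk


# Hypothetical example: a 5% baseline 10-year risk and two risk factor
# changes with illustrative (not published) RRs of 0.9 and 0.85.
rr = combined_relative_risk([0.9, 0.85])
print(f"combined RR: {rr:.3f}")
print(f"adjusted 10-year risk: {adjusted_absolute_risk(0.05, rr):.4f}")
```

For small baseline risks the result is close to the simple product of baseline risk and RR; the power form only keeps the adjusted value below 1 when risks are large.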
MD
MD, one of the strongest risk factors for breast cancer, is increasingly being integrated into risk prediction models, including the Breast Cancer Surveillance Consortium (BCSC) and the Tyrer-Cuzick (TC) models. Rosner and colleagues measured MD with both a semiautomated method based on Cumulus software and the Breast Imaging Reporting and Data System (BI-RADS) density classification, observing similar overall results but lower AUCs for BI-RADS than for Cumulus MD measures. The Cumulus method, a reliable continuous measure of MD that is widely used in research, has limited clinical utility because the assessment is labor intensive (5). The BI-RADS MD measures are available for clinical mammograms in the United States and have been incorporated into the BCSC model, but this four-category classification has considerable within- and between-reader variability, as well as some variability over time due to changes in classification guidelines and in response to the breast density notification laws (5–7). The lack of a standardized and easily implemented MD assessment continues to be a major impediment to widespread application of MD in clinical settings, but it is the subject of robust research and technological development. The recent development of fully automated methods that produce continuous and reliable MD measurements, together with notable efforts to elucidate risk-relevant mammographic information that goes beyond conventional MD measures (e.g., degree of brightness or spatial distribution of mammographic features), promises more readily feasible and meaningful integration of MD in clinical risk stratification (8–10).
Most research to date has considered a single measure of MD in risk prediction models. This approach was also used by Rosner and colleagues, who cited a previously reported high within-woman correlation to suggest that a one-time MD assessment is adequate for risk prediction (11). Although high correlations over time mean that multiple measures may not improve model discrimination, model calibration may be affected by changes in MD over time. In addition, the high correlation does not preclude the need to use updated MD measures that reflect or even quantify the amount and patterns of MD change over time. There is at least some evidence to suggest that changes in MD, specifically a slower rate of decline or an increase in MD, may be associated with higher risk, particularly for younger women (12–15). Thus, the role of repeat MD measures and/or changes in MD over time for improving clinical risk prediction merits further investigation.
PRS
PRSs are based on a weighted sum of risk-related genetic variants identified in genome-wide association studies (GWAS), where the weights are proportional to the strength of the associations between individual SNPs and the outcome. Currently, the breast cancer PRS comprises 313 genetic variants, and studies support an association between the PRS and breast cancer that is independent of other breast cancer risk factors, including family history (16). Rosner and colleagues added to the growing evidence that PRSs are independent of MD and QS and can improve risk prediction. Additional genetic variants predictive of MD may soon make it possible to evaluate the implementation of PRSs for predicting risk and informing breast cancer screening even if mammograms are unavailable, as may be the case for younger women (17). Specifically, large cohorts of young women with PRSs and repeated mammograms would be required to systematically and comprehensively test whether PRSs could be used to help inform initiation and frequency of screening prior to the age of onset of population-based mammography.
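The weighted-sum construction of a PRS can be sketched in a few lines. The example below is a minimal illustration, not the scoring pipeline of any cited study: the function name is hypothetical, the weights are assumed to be per-allele log odds ratios from a GWAS, genotypes are assumed to be coded as risk-allele dosages between 0 and 2, and the three weights shown are made-up values rather than any of the 313 published ones.

```python
from typing import Sequence


def polygenic_risk_score(dosages: Sequence[float],
                         weights: Sequence[float]) -> float:
    """Return the weighted sum of risk-allele dosages.

    dosages: one value per variant, in [0, 2] (fractional if imputed).
    weights: one per-allele effect size per variant (assumed log odds ratio).
    """
    if len(dosages) != len(weights):
        raise ValueError("dosages and weights must have the same length")
    if any(d < 0 or d > 2 for d in dosages):
        raise ValueError("each dosage must be between 0 and 2")
    return sum(w * d for w, d in zip(weights, dosages))


# Toy example with three hypothetical variants.
toy_weights = [0.10, -0.05, 0.20]   # illustrative log odds ratios
toy_dosages = [0, 1, 2]             # copies of the effect allele carried
print(polygenic_risk_score(toy_dosages, toy_weights))
```

Because every variant contributes its own term and the terms are simply added, this form encodes the linear, additive, and independent contributions that a regression-based PRS assumes.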
Is It Time to Move Beyond Predictive Models?
Risk prediction models for diseases with long induction times, like breast cancer, require retrospective collection of life-course risk factors and prospective validation with at least 10 years of follow-up. Most risk prediction models have been developed and tested in select populations, particularly among postmenopausal women of European ancestry; such is also the case for the majority of studies of MD and PRS. Alternative approaches to conventional mathematical modeling and to capturing the essential risk-salient information contained within mammographic images and genetic data may provide new avenues for risk stratification. Specifically, artificial intelligence (AI) techniques, such as deep learning (DL) programs, have recently been used to determine mammogram-based breast cancer risk measures, taking an agnostic approach without attempts to replicate MD or specific mammographic features (18–20). Growing data suggest that these DL risk measures show stronger associations with breast cancer than MD measures, and perform better than widely used risk prediction models. For example, Yala and colleagues developed, validated, and tested a DL risk model with an AUC of 0.68, which was further improved to 0.70 with inclusion of risk factor data, and which performed similarly in pre- and postmenopausal women and in women with and without a family history of breast cancer (18). Importantly, this study found similarly high discrimination in white and black women for the DL risk measure, while reporting significantly lower AUCs for the TC model and for models based on risk factors only, particularly among black women. Furthermore, it may be possible to consider AI-based methods for GWAS data, which may improve risk stratification over a PRS. For example, the regression-based PRS assumes linear, independent, and additive contributions of individual SNPs. Because AI-based methods are nonparametric in nature, they are not limited to linear, additive, and independent associations. Badré and colleagues reported that, by capturing gene–gene interactions, DL-based PRSs can improve AUC from 0.64 to 0.67, in comparison with a regression-based PRS (21). Behravan and colleagues also reported that, by allowing nonparametric interactions between genetic and demographic factors, AI models improved the mean average precision in breast cancer prediction from 0.74 to 0.78, compared with models using SNPs alone (22). Despite promising data regarding AI-based methods, caution is warranted, as the contribution of epistatic effects over and beyond additive effects may be small for complex traits like breast cancer (23).
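To help interpret AUC differences of the size discussed here, it may be useful to recall that the AUC equals the probability that a randomly chosen case receives a higher risk score than a randomly chosen control, with ties counted as one half. The sketch below computes it directly from that definition; the function name and the toy scores are hypothetical and are not data from any of the cited studies.

```python
from typing import Sequence


def auc_from_scores(case_scores: Sequence[float],
                    control_scores: Sequence[float]) -> float:
    """Compute the AUC as the fraction of case-control pairs ranked correctly.

    Each pair in which the case scores higher counts as 1, each tie as 0.5.
    This pairwise form is quadratic in sample size, which is fine for a
    small illustration but slow for large cohorts.
    """
    if not case_scores or not control_scores:
        raise ValueError("need at least one case and one control")
    concordant = 0.0
    for case in case_scores:
        for control in control_scores:
            if case > control:
                concordant += 1.0
            elif case == control:
                concordant += 0.5
    return concordant / (len(case_scores) * len(control_scores))


# Toy example: three of the four case-control pairs are ranked correctly.
print(auc_from_scores([0.3, 0.6], [0.2, 0.5]))
```

Read this way, an AUC of 0.67 means that about two of every three case-control pairs are ordered correctly, so an increase from 0.64 to 0.67 corresponds to roughly three more correctly ordered pairs per hundred.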
It is tempting to envision a world where AI-based methods may be used to develop and validate risk models on an accelerated timeline and with a more inclusive population; however, key questions require empirical confirmation before these methods can be clinically useful. These include the applicability of DL risk measures across multiple institutions and mammogram vendors, tracking of DL mammogram–based risk measures over time or with structural changes in the breast (e.g., changes due to parity, lactation, and menopause), and performance over longer time horizons that would support clinical decision-making around risk reduction and screening. Furthermore, data in women at familial and genetic risk for breast cancer are limited.
Finally, in thinking about the strengths and limitations of the various approaches, it is important to keep the focus on improving ways of identifying women at highest risk for worse outcomes, which means identifying women at higher risk of early-onset and biologically aggressive breast cancer, and finding ways that this identification can lead to fewer advanced-stage breast cancers at detection/diagnosis. Such a focus, along with a simplified approach, may prove to use less but offer more, and even better, risk reduction.
Authors' Disclosures
No disclosures were reported.
Footnotes
Cancer Epidemiol Biomarkers Prev 2021;30:587–9
- Received November 16, 2020.
- Revision received January 12, 2021.
- Accepted January 14, 2021.
- Published first April 2, 2021.
- ©2021 American Association for Cancer Research.