Departments of Radiology [S. P., J. A. S., R. S-B.], Medicine [S. R. C., K. K.], Epidemiology and Biostatistics [S. R. C., K. K.], and General Internal Medicine Section, Department of Veterans Affairs [K. K.], University of California, San Francisco, California 94143-1250
| Abstract |
|---|
| Introduction |
|---|
Breast density can be crudely graded using a subjective scale that takes into account both the amount of density (quantitative) and the nature of the density (qualitative: diffuse or associated with ductal structures; Refs. 1, 5, 6). Qualitative methods have a limited number of density categories and can detect only very large changes in density. Because of their subjectivity, qualitative methods have substantial intra- and interobserver variation (7). A more quantitative approach has been used to measure the area of dense breast as a proportion of the total projected breast area, or "mammographic density" (1, 8, 9). Mammographic density is expressed as PD (see footnote 3), defined as PD = (radiographic dense area)/(total breast area) on a scale from 0 to 100% (10).
Mammographic density is not routinely quantified for research studies because current methods are time intensive, manual, and require expert training. On the other hand, a quantitative measurement appears to be superior to qualitative categorical methods such as Wolfe (5) and the American College of Radiology Breast Imaging Reporting and Data System (BI-RADS; Ref. 11). A recent tamoxifen trial measured breast density as a surrogate end point for breast cancer risk and found that the most significant annual changes in breast density were observed with a quantitative measurement (12).
Although breast density has been shown to be a powerful indicator of cancer risk, there is no generalized method for training and validating individuals to perform the measurement. We also ask the question: does it require a specialist in mammography to delineate the dense regions in the mammogram? If not, the technique could become more clinically available because the pool of potential trainees would be larger. In this study, we attempt to train people with a formal education in radiology, people from other fields of medicine, and people with a nonmedical background to quantify mammographic density using a predefined training program. A secondary goal of the study was to demonstrate the association of quantitative PD measurement with the risk of breast cancer.
| Materials and Methods |
|---|
Measurements.
All of the films were initially assessed by a radiologist with training in mammography and density reading (gold standard, R.S-B.). Films were viewed directly using a standard radiology light box. A wax pencil was used to outline the breast area and breast densities. Films, with the wax pencil marks, were digitized on a Lumisys LumiScan 200 radiographic film digitizer (Kodak, Inc.) at a resolution of 200 × 200 µm², and the PD was determined by measuring the total area of the breast and the number of pixels outlined in the dense regions using dedicated computer software. The software is based on the commercially available medical image-processing package MEDx (Version 3.31, Sensor Systems, Sterling, VA). Extensions to this package were written in the open-source scripting language Tcl/Tk (Version 8.3, www.tcltk.com).
Using the gold standard PD measurement, films were stratified into deciles of PD. For every density decile, 10 noncancer and 10 cancer films, when available, were selected to be included in a validation data set, resulting in 60 cancer and 84 noncancer images with PD ranging from 0 to 100%.
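The selection step can be sketched in a few lines of Python. This is a minimal illustration, not the authors' procedure: the record layout (`id`, `pd`, `cancer`), the function name, and the use of random sampling within each stratum are all assumptions, because the text does not say how films were picked within a decile.

```python
import random


def select_validation_set(films, per_stratum=10, seed=0):
    """Pick up to `per_stratum` cancer and noncancer films per PD decile.

    `films` is a list of dicts with the hypothetical keys
    'id', 'pd' (gold standard PD, 0-100) and 'cancer' (bool).
    """
    rng = random.Random(seed)
    strata = {}
    for film in films:
        # A PD of exactly 100% is placed in the top decile.
        decile = min(int(film["pd"] // 10), 9)
        strata.setdefault((decile, film["cancer"]), []).append(film)

    selected = []
    for key in sorted(strata):
        group = strata[key]
        # "When available": take fewer films if the stratum is small.
        k = min(per_stratum, len(group))
        selected.extend(rng.sample(group, k))
    return selected
```

Taking fewer than 10 films where a stratum is sparse is one way to end up with unequal group sizes such as the 60 cancer and 84 noncancer images reported here.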
The wax pencil marks were erased from the validation set and films were redigitized without marks and patient identifiers. The digitized film files were transferred to CD-ROM for review by study readers.
The reading station program randomized the order of all films and consecutively displayed them with a default brightness/contrast setting on a high-resolution radiographic monitor. The reader was prompted to manually trace the breast contour using a polygonal drawing tool (clicking with the computer mouse inserts a polygon vertex at the cursor). After confirming the breast contour outline was correct, the reader proceeded to outline the dense areas of the breast using a "pencil" tool (the mouse cursor acts like the tip of a pencil). The number of dense regions was not limited and included zero for breasts that appeared to have no dense regions at all. PD was calculated as the ratio of the sum of all dense regions (overlaps are not counted twice) to the entire breast area. PD, and all drawn contours, were then stored in a database linked to the reader's study identification number, film identification number, and reading session number. Fig. 1 shows a screenshot of the main program window during an analysis session.
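The PD calculation, including the rule that overlapping dense regions are counted once, can be illustrated as follows. This is a sketch under stated assumptions and not the MEDx/Tcl implementation: outlines are assumed to be already rasterized into sets of `(row, col)` pixel coordinates, dense pixels falling outside the breast outline are assumed to be ignored, and the function names are hypothetical.

```python
def percent_density(breast_pixels, dense_regions):
    """Return PD in percent from rasterized outlines.

    breast_pixels: set of (row, col) pixels inside the breast contour.
    dense_regions: list of sets of (row, col) pixels, one per dense outline.
    """
    if not breast_pixels:
        raise ValueError("breast outline contains no pixels")
    # A set union counts every pixel once, so overlaps are not double counted.
    dense = set().union(*dense_regions)
    # Assumption: dense pixels drawn outside the breast outline are dropped.
    dense &= breast_pixels
    return 100.0 * len(dense) / len(breast_pixels)


def area_cm2(n_pixels, pixel_mm=0.2):
    """Convert a pixel count to cm^2 for square pixels (200 um = 0.2 mm)."""
    return n_pixels * pixel_mm ** 2 / 100.0
```

A breast with no dense regions is passed as an empty list and yields a PD of 0%.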
All of the readers were trained (by R.S-B.) in an hour-long training session in front of a light box (see Fig. 2). Mammography examinations ranging from very low to very high PD, and from uniform to very structured appearance, were presented and discussed. After that, the readers were trained on the PD reading workstation (see Fig. 1). The readers could take as long as they wanted to read each film.
If a study reader's correlation with the gold standard was not ≥ 0.9 after the first reading of the 144 selected films, the 10 cases with the biggest absolute difference from the gold standard were identified. The study reader compared his or her breast and dense tissue outlines with the gold standard outlines for those 10 cases and tried to identify any patterns that could account for the deviations from the gold standard to improve future readings. Scatter and Bland-Altman plots of study reader PD versus gold standard PD were also provided, and regression results were discussed with the study reader. After additional training, the study reader read the same 144 films a second time, again blinded to film identity. If a study reader's correlation with the gold standard was not ≥ 0.9, the study reader was not considered a certified reader for future studies.
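The review of a single read pass can be sketched as below: compute the Pearson correlation with the gold standard, apply the 0.9 criterion, list the cases with the largest absolute deviation, and summarize the differences in the way a Bland-Altman plot would. The function names and the returned dictionary are illustrative assumptions, and the 1.96 × SD limits of agreement are the conventional Bland-Altman choice rather than something stated in the text.

```python
from math import sqrt
from statistics import mean, stdev


def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def review_reader(reader_pd, gold_pd, threshold=0.9, n_worst=10):
    """Summarize one read pass against the gold standard readings."""
    r = pearson_r(reader_pd, gold_pd)
    diffs = [a - b for a, b in zip(reader_pd, gold_pd)]
    # Indices of the films with the largest absolute deviation.
    worst = sorted(range(len(diffs)), key=lambda i: abs(diffs[i]),
                   reverse=True)[:n_worst]
    bias = mean(diffs)
    half_width = 1.96 * stdev(diffs)  # conventional limits of agreement
    return {
        "r": r,
        "certified": r >= threshold,
        "worst_cases": worst,
        "bias": bias,
        "limits_of_agreement": (bias - half_width, bias + half_width),
    }
```

The `worst_cases` indices identify the films whose outlines a reader would compare with the gold standard outlines before a second pass.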
Statistical Analysis.
Statistical analysis was programmed in SAS 8e (The SAS Institute, Cary, NC). Intrareader reproducibility was assessed with ANOVA. Interreader agreement was calculated using linear regression and was expressed as the Pearson product-moment correlation r. It is conceivable that reader reproducibility is dependent on breast density: "extreme" cases might be easier to recognize and analyze than films in the midrange of PD values. Therefore, we categorized the average reading results (averaged over the two read passes) into four quarters (PD < 25%; 25% ≤ PD < 50%; 50% ≤ PD < 75%; and PD ≥ 75%), and we calculated reproducibility separately for each of these four PD categories. For the same reasons as for intrareader reproducibility, we repeated the interreader regression analysis between readers and gold standard by PD category. Overall agreement between readers with similar background was tested with two-factor ANOVA, calculated separately for the reader groups RAD, MD, and NMD.
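For two read passes per film, the ANOVA root mean square error within each PD quarter can be sketched directly from the paired differences. The function names are hypothetical. The sketch relies on the fact that, with exactly two readings per film, the within-film mean square equals the sum of squared differences divided by 2n; it makes no claim about how the SAS analysis was actually set up.

```python
from math import sqrt


def pd_quarter(pd):
    """Map a PD value (percent) to quarters 1-4: <25, 25-<50, 50-<75, >=75."""
    if pd < 25:
        return 1
    if pd < 50:
        return 2
    if pd < 75:
        return 3
    return 4


def rmse_by_quarter(first_read, second_read):
    """Within-film RMSE (percent PD) per quarter of the averaged readings."""
    sums = {}
    for a, b in zip(first_read, second_read):
        q = pd_quarter((a + b) / 2.0)  # category from the two-pass average
        ss, n = sums.get(q, (0.0, 0))
        sums[q] = (ss + (a - b) ** 2, n + 1)
    # Two readings per film: within-film variance of one film is d^2 / 2.
    return {q: sqrt(ss / (2 * n)) for q, (ss, n) in sums.items()}
```

Quarters with no films are simply absent from the result, which mirrors the situation where a category has too few data points to analyze.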
Odds ratios to determine the association between breast density and breast cancer status were calculated in two ways. First, a fixed threshold of PD = 50% was chosen to discriminate between cancer and noncancer cases, and the odds ratio was calculated from the resulting contingency table. Second, to avoid bias introduced by the arbitrary threshold, we performed unconditional logistic regression analysis with PD as the predictor and cancer status as the outcome. To arrive at meaningful SDs, we calibrated reading results to the gold standard as:
[Calibration equation: image not recoverable from the source.]
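The first of the two odds ratio calculations, the fixed 50% threshold, can be illustrated with a small contingency-table function. The function name is hypothetical, and the 95% confidence interval uses the standard log (Woolf) method as an assumption; the text does not state how intervals were obtained. The logistic regression and the calibration step are not reproduced here.

```python
from math import exp, log, sqrt


def odds_ratio_at_threshold(pd_values, cancer_flags, threshold=50.0):
    """Odds ratio for cancer with PD dichotomized at `threshold` percent.

    Returns (odds_ratio, (ci_low, ci_high)).
    """
    a = b = c = d = 0
    for pd, cancer in zip(pd_values, cancer_flags):
        high = pd >= threshold  # "50% or greater" counts as high density
        if high and cancer:
            a += 1      # high density, cancer
        elif high:
            b += 1      # high density, no cancer
        elif cancer:
            c += 1      # low density, cancer
        else:
            d += 1      # low density, no cancer
    if 0 in (a, b, c, d):
        raise ValueError("empty cell: odds ratio is undefined or infinite")
    odds_ratio = (a * d) / (b * c)
    se_log = sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # Woolf standard error
    ci = (exp(log(odds_ratio) - 1.96 * se_log),
          exp(log(odds_ratio) + 1.96 * se_log))
    return odds_ratio, ci
```

Dichotomizing discards information within each half of the PD scale, which is the motivation the text gives for also fitting PD as a continuous predictor.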
| Results |
|---|
≥ 0.9 compared with the gold standard. Table 1
≥ 0.9 on the first read, and who, therefore, did not receive additional training between the two readings. Readers RAD2 and RAD3 were not available to us for a second reading. Reproducibility ranged from RMSE = 6% PD to 11% PD and showed this range for all three of the reader groups. By comparison, the gold standard reader achieved RMSE = 6% PD (r = 0.95) on a subset of 100 duplicate readings. When categorized by PD quarters, we found that the reproducibility range was similar for each quarter. We found RMSE = 8–13% PD for the first quarter (gold standard PD < 25%), RMSE = 8–12% PD for the second, and RMSE = 5–13% PD for the third quarter. We did not have enough data points for meaningful regression analysis of the fourth (gold standard PD > 75%) quarter.
Table 2 shows generalized ANOVA results for the comparison of overall reader-group performance. The RAD group exhibited the highest intraclass correlation, followed by the MD group and the NMDs. The same was true when only validated readers were considered, but because only one NMD was validated, intraclass correlation could not be calculated for this group.
| Discussion |
|---|
In other published studies, inter- and intrareader variability values range from 0.86 to 0.96, similar to the results reported here (13, 14). Jong et al. (13) found an overall correlation of 0.89 between two readers and noted that the type of dense tissue distribution (homogeneous, nodular, linear) had a strong influence. We did not see such an effect in this study. Although we did not evaluate reproducibility by type of dense tissue, we retrospectively categorized the films by their density. We did not observe differences in reproducibility between categories. This might be of particular importance for study populations with a skewed or preselected mammographic density distribution.
A limitation of this study is that not all of the readers were available for a second reading. Therefore, we could not present a complete picture of intra- and interreader variability. Although those readers could have exhibited a performance drop during the second read, three of four readers who did finish a second reading maintained or even improved their performance. In addition, as noted above, the intraclass correlations are similar to those reported by others. Our reproducibility analysis did not consider redigitization. Although we do not expect a large effect from digitization, because routine digitizer quality assurance showed that the device is stable and linearly maps film absorbance to pixel gray-scale value, this step needs to be verified. Lastly, because we attempted to provide a full range of density values across all of the deciles, this most probably inflated the PD variance and, thus, improved the correlation coefficient. However, it is our view that using this approach (stratified PD values in every decile) will allow others to reproduce our results.
Reader Quality Control.
Reader RAD1 in our study had PD readings highly correlated with the gold standard and high odds ratios predicting cancer risk on the first reading. When this reader was retested, he had a markedly lower correlation with the gold standard and slightly different slope and intercept. This suggests that PD readers for research studies may need to be monitored for consistent reading quality. Quantifying reproducibility is highly important because it influences the least significant difference that the breast density measure can detect with confidence. Continuous monitoring may be achieved by including films from the validation set with the study data so that, if necessary, a reader can be retrained, or removed if skills wane over time.
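The link between reproducibility and the least significant difference can be made concrete with a short calculation. This sketch assumes the common convention that the difference between two single readings must exceed z × √2 × (precision error) to be significant at the chosen confidence level; the text does not specify which convention the authors had in mind, and the function name is hypothetical.

```python
from math import sqrt


def least_significant_difference(rmse, z=1.96):
    """Smallest PD difference between two single readings that exceeds
    measurement noise at the confidence level implied by `z`.

    Assumes independent errors with standard deviation `rmse` per reading.
    """
    return z * sqrt(2.0) * rmse
```

Under this assumption, a reader with RMSE = 6% PD could resolve changes of roughly 17% PD between two readings, and a reader with RMSE = 11% PD roughly 30% PD.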
Of note, several readers achieved slightly, albeit nonsignificantly, higher odds ratios than the gold standard reader. This merely reflects that the gold standard itself is subjective but also suggests that the correlation criterion might need to be supplemented with a more objective one such as an odds ratio threshold. Automation would obviously alleviate the subjectivity problem. Efforts in this direction have recently been undertaken by a number of groups (15, 16, 17, 18) but are also based on maximization of the correlation to a human gold standard.
Conclusions.
The goal of this study was to establish whether or not a background in mammography or radiology is necessary to quantify mammographic breast density and to validate readers for future mammographic density studies based on the degree of correlation to a gold standard reader. We found that, although it seems beneficial to have a radiological background, it is not a prerequisite. Of nine study readers, five (all three radiologists, one of two physicians of other disciplines having some radiological background, and one of four nonphysicians) had readings that correlated sufficiently with the gold standard that they could be considered breast density readers for future research studies. All of the readers with breast density readings highly correlated with the gold standard reading achieved odds ratios for predicting breast cancer of similar magnitude (on the order of three with PD dichotomized as less than 50% or 50% or greater), which is comparable with values in the literature. Strict validation criteria must be applied to qualify readers for mammographic breast density quantification. For research studies, this will enhance the chance of accurately assessing breast density and discriminating women at high and low risk of breast cancer.
| Acknowledgments |
|---|
| Footnotes |
|---|
1 Supported in part by a research grant from Synarc, Inc. Parts of this paper were presented as an InfoRad exhibit at the Radiological Society of North America Annual Meeting 2000, Abstract 9320IMA-i, Title "A Mammographic Density Reading Service for Clinical Drug Trials."
2 To whom requests for reprints should be addressed, at University of California-San Francisco, Department of Radiology, 533 Parnassus Avenue, Suite U368J, San Francisco, CA 94143-1250. Phone: (415) 502-6732; Fax: (415) 502-2663; E-mail: john.shepherd{at}oarg.ucsf.edu
3 The abbreviations used are: PD, percentage (breast) density; RMSE, root mean square error; RAD (group), radiologists (with limited background in mammography); MD (group), physicians (with no background in radiology); NMD (group), nonphysicians.
Received 11/12/01; revised 6/3/02; accepted 6/19/02.
| References |
|---|