Incidence rates of non-Hodgkin’s lymphomas (NHLs) have nearly doubled in recent decades. Understanding the reasons behind these trends will require detailed surveillance and epidemiological study of NHL subtypes in large populations, using cancer registry or other multicenter data. However, little is known regarding the reliability of NHL diagnosis and subtype classification in such data, despite implications for the accuracy of incidence statistics and studies. Expert pathological re-review was completed for 1526 NHL patients who were reported to the Greater Bay Area Cancer Registry and who participated in a large population-based case-control study. Agreement of registry diagnosis with expert diagnosis and with International Classification of Diseases for Oncology-2 (Working Formulation) subtype classifications was measured with positive predictive values and κ statistics. Agreement of registry and expert diagnoses was high (98%). Thirty patients were found on review not to have NHL; most of these had leukemia. For subtypes, agreement of registry and expert classification was more moderate (59%). Agreement varied substantially by subtype from 5% to 100% and was 77% for the most common subtype, diffuse large cell lymphoma. Seventy-seven percent of 128 registry-unclassified lymphomas were assigned a subtype on re-review. Our analyses suggest excellent diagnostic reliability but poorer subtype reliability of NHL in cancer registry data, information that is critical to the interpretation of lymphoma time trends. Thus, overall NHL incidence and survival statistics from the early 1990s are probably accurate, but subtype-specific statistics could be substantially biased, especially because of high (15–20%) proportions of unclassified lymphomas.
Non-Hodgkin’s lymphomas (NHLs) are lymphoid tissue malignancies representing the sixth most common type of cancer diagnosed in United States men and women (1). Between 1973 and 2000, incidence rates for NHL in the United States increased nearly 80%, one of the largest increases observed among all cancers (2). The reasons for these increases remain poorly understood because few risk factors other than severe immunodeficiency have been identified for NHL, further underscoring the importance of ongoing surveillance and epidemiological study of this group of diseases.
A major concern for the epidemiological study of NHL involves tumor subtype. At present, at least 20 subtypes of NHL have been defined (3), each distinct with respect to histopathological, immunological, and clinical characteristics, leading to the assumption that they might also differ according to incidence trends and etiology. However, the surveillance and study of NHL subtypes are methodologically challenging. First, even large studies may have low statistical power for the many relatively uncommon subtypes. Second, the coexistence of multiple NHL classification systems (e.g., Kiel, Lukes-Collins, Rappaport, Working Formulation, Revised European-American Lymphoma (REAL), and WHO) over the past few decades complicates the comparison of study results. Third, diagnostic and classification criteria may not be uniform in cancer registry or other population-based settings because NHL may have been diagnosed and classified by numerous independent pathologists, perhaps using diverse methodologies. This problem should be mitigated somewhat with the implementation of the WHO consensus classification, the most recently developed classification scheme that demonstrates superior reliability to prior schemes in clinical settings (4). Regardless, for cancer registry data collected before 2001 that classified NHL according to the Working Formulation, variability in subtype classifications could have resulted in inaccurate incidence and survival statistics, especially time trends such as the marked increases reported recently for follicular, peripheral T-cell, and small cell lymphocytic lymphomas (5, 6).
Interobserver reliability of NHL diagnostic criteria and Working Formulation classifications has been addressed in clinical series (4, 7) but has not been evaluated recently in population-based data. To our knowledge, only one prior analysis compared NHL diagnostic and subtype information reported from a cancer registry in the early 1980s with that obtained from a panel of expert pathologists and found 93% agreement with the diagnosis of NHL, but only 55% agreement with subtype (8). Therefore, to update our understanding of interobserver reliability for NHL, we compared cancer registry NHL subtype diagnoses, which are transcribed directly from the medical record, with diagnoses obtained from a uniform re-review by an expert hematopathologist as part of a large, population-based study of NHL patients diagnosed in the San Francisco Bay Area in the early 1990s. We also estimated the impact of this reliability on NHL subtype incidence statistics.
Materials and Methods
Study subjects were 1610 NHL patients who participated in a large, population-based case-control study described in detail elsewhere (9–17). The 2657 patients initially eligible for this study were aged 21–74 years, lived in one of six San Francisco Bay Area counties when newly diagnosed with NHL in the period 1988 through 1994, and were identified with rapid case ascertainment procedures by the population-based Greater Bay Area Cancer Registry, a participant in the National Cancer Institute’s Surveillance, Epidemiology, and End Results (SEER) program. Pathological re-review was conducted by an expert hematopathologist (R. F. D.) using original diagnostic sections stained with H&E for patients who provided written consent for the re-review of diagnostic slides. No re-review occurred for eligible patients who were deceased or who declined to participate in the interview (n = 1047). Detailed comparisons of participating and nonparticipating patients are published elsewhere (15), but briefly, nonparticipating patients were more likely than participating patients to have more rapidly fatal lymphoma types, especially HIV-related (42% versus 18%) and high-grade lymphomas (22% versus 11%), and more than half of these patients died before they could be interviewed.
Of 1610 consenting patients, pathology review could not be undertaken for 53 (3%) for technical reasons (e.g., pathology reports or diagnostic slides could not be located, or sections were determined to be technically inadequate). In addition, original registry histology could not be retrieved for 35 patients (2%) who were determined by the cancer registry not to be incident NHL cases in the region after study completion. Therefore, 1522 patients with valid registry and re-review information were included in the present analysis.
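The case flow described above can be checked with simple arithmetic. The following is a minimal sketch that uses only the counts stated in the text; the variable names are illustrative and are not taken from the study.

```python
# Counts as stated in the text; variable names are illustrative only.
eligible = 2657             # patients initially eligible for the case-control study
no_rereview = 1047          # deceased or declined to participate in the interview
consenting = eligible - no_rereview

technical_exclusions = 53   # reports or slides not located, or sections inadequate
not_incident_cases = 35     # original registry histology could not be retrieved
analyzed = consenting - technical_exclusions - not_incident_cases

print(consenting, analyzed)  # 1610 1522
```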
In the cancer registry, lymphoma classification is operationalized through the International Classification of Diseases for Oncology (ICD-O) morphological classification system, which is routinely updated to reflect changes in diagnostic practice. The first (ICD-O-1) and second (ICD-O-2) editions (18) generally classified lymphomas according to the Working Formulation and other now-obsolete schemes. ICD-O-1 was used to code cases diagnosed before 1990, and a field trial edition of ICD-O-2 was used for cases diagnosed in 1990 and 1991. The final ICD-O-2 was used to classify diagnoses in the years 1992 through 2000. For the present analysis, all cancer registry and expert diagnoses were classified according to ICD-O-2 (Table 1), although the hematopathologist did not use the ICD-O-2 classifications 9693 (lymphocytic lymphoma, well-differentiated, nodular), 9696 (lymphocytic lymphoma, poorly differentiated, nodular), 9701 (Sezary’s disease), 9702 [peripheral T-cell lymphoma, not otherwise specified (NOS)], 9704 (lymphoepithelioid lymphoma), or 9709 (cutaneous lymphoma). Therefore, we grouped original registry diagnoses using these classifications (n = 33; 2%) into an “obsolete codes” category for comparison with expert diagnoses. In addition, several generally uncommon classifications not originally represented in the case series could not be re-reviewed; these included 9674 (centrocytic lymphoma), 9676 (centroblastic-centrocytic lymphoma, diffuse), 9692 (centroblastic-centrocytic lymphoma, follicular), 9697 (centroblastic lymphoma, follicular), 9703 (T-zone lymphoma), 9705 (peripheral T-cell lymphoma, angioimmunoblastic lymphadenopathy with dysproteinemia), 9706 (peripheral T-cell lymphoma, pleomorphic small cell), 9707 (peripheral T-cell lymphoma, pleomorphic medium and large cell), 9712 (angioendotheliomatosis), 9713 (angiocentric T-cell lymphoma), and 9714 (large cell Ki-1+ lymphoma).
Altogether, these classifications represented less than half of 1% of all eligible NHLs reported to the cancer registry during this time period. For some analyses, we grouped subtypes together into broader categories modified from those used by Groves et al. (5), including NHL, NOS (9590–9595), follicular (9690, 9691, 9695, and 9698), diffuse (9672, 9673, 9675, 9680–9682, and 9711), and high grade (9685–9687).
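The broader groupings listed above can be expressed as a simple lookup from ICD-O-2 morphology code to category. The following is a minimal sketch rather than the procedure actually used in the analysis; the function names `broad_group` and `grouped_agreement` are hypothetical, and the example code pairs are invented for illustration.

```python
from typing import Iterable, Optional, Tuple

# Broad categories and ICD-O-2 codes exactly as listed in the text.
BROAD_GROUPS = {
    "NHL, NOS": set(range(9590, 9596)),                          # 9590-9595
    "follicular": {9690, 9691, 9695, 9698},
    "diffuse": {9672, 9673, 9675, 9680, 9681, 9682, 9711},
    "high grade": {9685, 9686, 9687},
}

def broad_group(icdo2_code: int) -> Optional[str]:
    """Return the broad category for an ICD-O-2 code, or None if it is not grouped."""
    for name, codes in BROAD_GROUPS.items():
        if icdo2_code in codes:
            return name
    return None

def grouped_agreement(pairs: Iterable[Tuple[int, int]]) -> float:
    """Proportion of (registry_code, expert_code) pairs falling in the same broad group.

    Pairs in which the registry code has no broad group are skipped.
    """
    total = agree = 0
    for registry_code, expert_code in pairs:
        registry_group = broad_group(registry_code)
        if registry_group is None:
            continue
        total += 1
        agree += registry_group == broad_group(expert_code)
    return agree / total if total else float("nan")

# Invented example: two within-group disagreements count as agreement after grouping.
example_pairs = [(9690, 9691), (9680, 9681), (9685, 9672)]
print(grouped_agreement(example_pairs))
```

Under such a grouping, disagreements between related codes (for example, two follicular subclassifications) count as agreement at the broader level.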
Agreement between registry and expert diagnoses was quantified using the positive predictive value (19) as a measure of the percentage of all registry diagnoses confirmed by the expert hematopathologist. In addition, for all cases confirmed by expert review as NHL, we compared registry and expert diagnosis by calculating the κ statistic, which represents the extent of agreement beyond that expected by chance (20), and the 95% confidence interval around this statistic. In general, κ statistics ranging from 0.41 to 0.60 have been interpreted to indicate “moderate” agreement, those ranging from 0.61 to 0.80 have been interpreted to indicate “substantial” agreement, and those of ≥0.81 have been interpreted to indicate “near perfect” agreement (21). Calculation of all cross-tabulations and study statistics was performed using SAS version 6.12.
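As an illustration of the two agreement measures described above, the following sketch computes a positive predictive value and Cohen's κ from a square cross-tabulation of registry (rows) by expert (columns) classifications. It is not the SAS code used in the study: the function names are hypothetical, the 2 × 2 table is invented, and the confidence interval uses a simple large-sample standard error that may differ from the asymptotic formula implemented in SAS.

```python
import math

def positive_predictive_value(table, i):
    """Share of cases with registry category i that the expert also placed in i."""
    row_total = sum(table[i])
    return table[i][i] / row_total if row_total else float("nan")

def cohen_kappa(table, z=1.96):
    """Return (kappa, (lower, upper)) for a square registry-by-expert table."""
    k = len(table)
    n = sum(sum(row) for row in table)
    p_obs = sum(table[i][i] for i in range(k)) / n
    row_share = [sum(table[i]) / n for i in range(k)]
    col_share = [sum(table[i][j] for i in range(k)) / n for j in range(k)]
    # Agreement expected by chance from the marginal distributions.
    p_exp = sum(row_share[i] * col_share[i] for i in range(k))
    kappa = (p_obs - p_exp) / (1 - p_exp)
    # Assumption: simple approximate standard error, not the full asymptotic variance.
    se = math.sqrt(p_obs * (1 - p_obs) / (n * (1 - p_exp) ** 2))
    return kappa, (kappa - z * se, kappa + z * se)

# Invented 2 x 2 table for illustration only (rows: registry, columns: expert).
toy_table = [[40, 10],
             [5, 45]]
toy_ppv = positive_predictive_value(toy_table, 0)
toy_kappa, (toy_lower, toy_upper) = cohen_kappa(toy_table)
print(round(toy_ppv, 2), round(toy_kappa, 2))
```

For the invented table, observed agreement is 0.85 and chance-expected agreement is 0.50, giving κ = 0.70, which would fall in the “substantial” range cited above.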
To understand the impact of diagnostic and classification reliability on NHL incidence statistics, we estimated the hypothetical distribution of histological subtypes that would have been expected if expert review had occurred for all cases in the SEER database. We did not have sufficient numbers to assess positive predictive values for age-specific groups needed to recalculate age-adjusted incidence rates. To produce the hypothetical distribution, we obtained case counts for all NHLs by ICD-O-2 histological type for the entire nine-registry SEER database (22) for the same period from which the re-reviewed cases were diagnosed (1988–1994) and multiplied those counts by the positive predictive values described above.
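The extrapolation described above can be sketched as follows. The text states that SEER counts were multiplied by the positive predictive values; this sketch assumes, in addition, that cases not confirmed in their registry subtype are reassigned to other subtypes in the proportions observed on re-review. The function name, subtype labels, counts, and proportions below are all invented for illustration and are not study data.

```python
def expected_expert_counts(registry_counts, reclass_probs):
    """Redistribute registry counts across expert-assigned subtypes.

    registry_counts: {registry_subtype: case count}
    reclass_probs:   {registry_subtype: {expert_subtype: P(expert | registry)}}
    The diagonal entry P(s | s) is the positive predictive value for subtype s.
    """
    expected = {}
    for registry_subtype, count in registry_counts.items():
        for expert_subtype, prob in reclass_probs[registry_subtype].items():
            expected[expert_subtype] = expected.get(expert_subtype, 0.0) + count * prob
    return expected

# Invented inputs for illustration only.
toy_counts = {"unclassified": 100, "diffuse large cell": 200}
toy_probs = {
    "unclassified": {"unclassified": 0.25, "diffuse large cell": 0.75},
    "diffuse large cell": {"diffuse large cell": 1.0},
}
toy_expected = expected_expert_counts(toy_counts, toy_probs)
print(toy_expected)
```

Because each row of proportions sums to 1, the total number of cases is preserved, and only their distribution across subtypes changes.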
Results
Of 1522 re-reviewed cases, 26 (2%) were confirmed as a diagnosis other than NHL. Of these, 9 were chronic lymphocytic leukemias (CLL), 2 were acute leukemias, 2 were Hodgkin’s disease (HD), 1 was anaplastic plasmacytoma, 1 was composite HD and NHL, and 11 were specified as “not NHL.”
Subtype Classification Reliability.
Overall, for the 1506 cases confirmed as NHL, original and expert subtype classification agreed for only 883 (59%). Moderate overall agreement was indicated by the κ statistic of 0.54 (95% confidence interval, 0.50–0.57). However, across subtypes, agreement ranged from 4% to 100% (Table 2). Agreement was 77% for diffuse large cell lymphoma, the most common NHL subtype in this series. Agreement was perfect for monocytoid B-cell lymphomas (100%) and was also high for mycosis fungoides (88%) but was lower for other subtypes including small lymphocytic lymphoma (69%) and immunoblastic lymphoma (44%). The subtypes with the poorest agreement included diffuse small cleaved cell lymphoma (5%) and follicular lymphoma, NOS (4%).
In general, disagreement occurred chiefly among related subtypes. Among high-grade lymphomas, agreement was moderate for diffuse, small noncleaved cell (non-Burkitt’s) lymphoma (47%), immunoblastic lymphoma (44%), lymphoblastic lymphoma (71%), and Burkitt’s lymphoma (46%), which was found on re-review to be the non-Burkitt’s type in 38% of cases. However, when these classifications were grouped together into a single high-grade lymphoma grouping, agreement improved to 74%. Agreement was good for some of the follicular lymphoma subclassifications [namely, small cleaved cell (79%) and large cell (84%)] but less so for mixed small cleaved cell/large cell (64%). However, disagreements generally occurred among these subclassifications, whereas overall agreement for the broader follicular grouping was 83%. Agreement was poor to fair for diffuse subtypes such as diffuse, large cell cleaved (31%) and diffuse, large cell noncleaved (44%) but was substantially better (73%) when diffuse subtypes were grouped together into a single category.
Among the 128 NHL cases originally unclassified, only 23% remained unclassified after re-review. NHL cases that originally were unclassified were disproportionately re-reviewed as large B-cell lymphoma. In addition, 68 cases classified in the cancer registry as specific subtypes were found by the expert to be unclassifiable. These included a disproportionate number of diffuse and high-grade subtypes including immunoblastic, lymphoblastic, Burkitt’s, and diffuse small cell noncleaved lymphoma.
Application of Results to SEER Database.
When these results were extrapolated to the larger SEER database to determine the impact of the expert review on the distribution of subtypes (Table 3), the proportion of unclassified cases was cut in half, from 14% to 7%. Some subtypes increased in representation (small lymphocytic; diffuse lymphocytic; diffuse large cell; and follicular, small cleaved cell), whereas others decreased slightly (diffuse, small cleaved cell and immunoblastic lymphomas). Diffuse large cell lymphoma remained the most common subtype but represented a greater proportion of all patients (32% versus 25%).
Discussion
This analysis found a high degree of interobserver agreement in NHL diagnosis in Greater Bay Area Cancer Registry data because only 2% of registry-designated NHL was found on re-review to be some other condition. This proportion is substantially lower than that reported by Dick et al. for NHL cases in the Iowa and Minnesota cancer registries (8%, including 4% CLL) (8). Whereas a low proportion of registry-reported NHL in this study was determined to be other cancers after expert review, it is not known how many other cancers would have been found on re-review to be NHL. In our previous study of the diagnostic reliability of HD, 8 of 362 cases (2%) originally reported to the cancer registry as HD were later found to be NHL by the same hematopathologist who conducted this re-review (23). A veterans’ hospital tumor registry study found 2 of 62 cases (3%) originally reported as CLL to be small lymphocytic lymphoma on re-review (24), although population-based patterns of CLL misclassification are unknown. Nonetheless, the number of other cancers misclassified as NHL and true NHL cases misclassified as other cancers appears to be low.
Whereas overall diagnostic reliability was shown to be excellent, our data suggest poorer reliability of NHL subtypes. Our findings of 59% agreement between registry and expert subtype classification, as well as substantial variation in agreement by specific subtype (5–100%), concur with those reported by Dick et al. (8), who found 55% overall agreement and subtype variation of 14–100% in cases diagnosed at least 10 years previously. Subtype-specific agreements observed in this study also were generally similar to those reported by Dick et al., with agreement exceeding 75% for small lymphocytic lymphoma and most follicular subclassifications (with the exception of follicular, NOS), and poor agreement for diffuse small noncleaved cell (Burkitt’s) lymphoma (8). However, we observed substantially better agreement for diffuse large cell lymphoma (77%) than that reported in the Midwestern data (47%; Ref. 8). We also observed that agreement for broader groupings of subtypes (e.g., follicular, diffuse) was generally better than that for component subtypes.
These results suggest that incidence and survival rates for NHL produced with data from the SEER program or other cancer registries collected prior to 2001 probably are not substantially impacted by diagnostic misclassification with leukemias, HD, or other cancers but that rates for certain NHL subtypes could be moderately biased. Rates for broader groupings of subtypes may be less biased than single subtypes. Although we did not have the statistical power to estimate age-specific agreement that would be needed to recalculate incidence rates properly, our estimates of the revised distribution of lymphoma subtypes suggest that, without the benefit of uniform review, rates of diffuse large cell lymphoma may be underestimated, whereas rates of unclassified lymphomas may be overestimated. These findings also illustrate the importance of including uniform re-review in population-based epidemiological studies of NHL to assign subtype, although not necessarily to exclude invalid diagnoses of NHL.
Our results have identified several areas for improvement in the registration of lymphomas. Among NHLs reported to the cancer registry as unclassifiable, only 23% could not be assigned a subtype by our expert hematopathologist. Fewer than 5% of follicular lymphoma NOS cases could not be subclassified by the expert, suggesting that nearly all follicular lymphomas in the cancer registry can be assigned a subclassification. Ambiguous terminology on pathology reports can result in cancer registrars interpreting and classifying NHL subtypes nonuniformly (25), but otherwise, the reasons why NHLs are registered as unclassified and whether these problems are remediable remain unclear. Because unclassified lymphomas represented nearly one-fifth of all NHLs reported to the SEER program throughout the 1990s, the misclassification of classifiable lymphomas as NOS has biased subtype distributions, as shown here, and is likely to have biased subtype-specific incidence rates, trends, and survival statistics published previously (2, 5, 6, 26, 27).
The generalizability of our results could be limited by several factors. First, because slides for pathological re-review were obtained for patients who participated in a case-control study as opposed to a complete population-based series, our estimate of overall agreement for NHL subtypes may be biased by the nonparticipation of patients with concurrent HIV infection or otherwise poor survival (15), who were more likely to have had high-grade or immunoblastic subtypes that showed generally poorer agreement than other subtypes. However, it is uncertain how nonparticipation might have biased subtype-specific estimates of diagnostic agreement and positive predictive value. Second, because this study was conducted in the San Francisco Bay Area during the crest of the HIV epidemic, its distribution of NHL subtypes probably differs from that of other regions. Whereas the distribution of subtypes itself should not have materially impacted our findings, the excess of HIV-associated NHL subtypes in our region may have influenced physicians’ diagnostic or classification practices. Third, the location in this region of Stanford University Medical Center, a renowned lymphoma referral center, could have resulted in higher agreement, although diagnostic and classification agreement did not differ between HD patients initially diagnosed at Stanford University Medical Center and those diagnosed elsewhere (23). Fourth, different rates of agreement may have been obtained from a panel review, an approach used in other reliability studies (4).
Because this comparison used ICD-O-2 classifications based on the Working Formulation, it cannot address the reliability in population-based data using the new ICD-O-3 classifications now being used by cancer registries to classify lymphomas diagnosed in the year 2001 and beyond. The ICD-O-3 classification is based heavily on the WHO system, which relies substantially on immunohistochemical characteristics, and has been shown to have interobserver reliability exceeding 85%, a major improvement over the Working Formulation and other systems (28) that depend more on microscopic characteristics. The better reliability of the WHO classification in the clinical setting should translate to improved reliability in cancer registry data. Nonetheless, reliability of ICD-O-2 subtype classifications will remain relevant in analyses of time trends and in population-based series including patients diagnosed before 2001.
The lack of uniformity in NHL diagnostic and subtype classification in cancer registry and other multicenter data has been unsettling for decades, particularly before the adoption of a consensus classification system. The implementation of the reliable consensus WHO system marks a hopeful moment in our quest to learn more about the etiologies of specific NHL subtypes, especially subtypes shown to be increasing in incidence. Therefore, interobserver reliability will continue to be important to the interpretation of secular trends in the incidence of NHL and specific subtypes and should be reassessed periodically.
We thank Jennifer Kristianson and Trisha Harasty for their contributions.
Grant support: Rapid Response Surveillance Study program of the National Cancer Institute Surveillance, Epidemiology, and End Results program (contract N01-CN-65107). National Cancer Institute Grants CA45614, CA66529, and CA89745 to Dr. Holly supported the primary work on the case-control study. Cancer incidence data were collected by the Northern California Cancer Center under Contract N01-CN-65107 with the National Cancer Institute, NIH and with the support of the California Cancer Registry, a project of the Cancer Surveillance Section, California Department of Health Services, under subcontract 1006128 with the Public Health Institute.
Requests for reprints: Christina A. Clarke, Northern California Cancer Center, 32960 Alvarado-Niles Road, #600, Union City, California 94587. Phone: (510) 429-2500; Fax: (510) 991-4405.
- Received July 28, 2003.
- Revision received September 2, 2003.
- Accepted September 10, 2003.