Background: Surveillance for hepatocellular carcinoma (HCC) is recommended in patients with cirrhosis, but the effectiveness of a surveillance program in clinical practice has yet to be established.
Aims: To evaluate the effectiveness of a surveillance program with ultrasound and alpha-fetoprotein (AFP) to detect early HCCs.
Methods: Four hundred and forty-six patients with Child A/B cirrhosis were prospectively enrolled between January 2004 and September 2006 and followed until July 2010. HCC surveillance using ultrasound and AFP was conducted per the treating hepatologist, although the standard was every 6 to 12 months. HCC was diagnosed using American Association for the Study of Liver Diseases (AASLD) guidelines and early HCC defined by Barcelona Clinic Liver Cancer (BCLC) staging. Performance characteristics were determined for surveillance using AFP, ultrasound, or the combination.
Results: After a median follow-up of 3.5 years, 41 patients developed HCCs, of whom 30 (73.2%) had early HCCs. The annual incidence of HCC was 2.8%, with cumulative 3- and 5-year incidence rates of 5.7% and 9.1%, respectively. Surveillance ultrasound and AFP had sensitivities of 44% and 66% and specificities of 92% and 91%, respectively, for the detection of HCCs. Sensitivity significantly improved to 90%, with minimal loss in specificity (83%) when these tests were used in combination.
Conclusions: When used as a surveillance program in a real-world clinical setting, combination of ultrasound and AFP is the most effective strategy to detect HCC at an early stage.
Impact: Our results differ from the guidelines of the AASLD. Cancer Epidemiol Biomarkers Prev; 21(5); 793–9. ©2012 AACR.
Hepatocellular carcinoma (HCC) is the third leading cause of cancer-related death worldwide and has an increasing incidence in the United States (1). Its incidence is expected to continue increasing over the next 20 years due to the current epidemic of advanced fatty liver disease and hepatitis C virus (HCV) cases (1). The prognosis for patients with HCC largely depends on tumor stage at the time of diagnosis. Patients with early HCCs, defined as one nodule less than 5 cm or 3 nodules each less than 3 cm in diameter, can achieve 5-year survival rates near 70% with surgical resection or liver transplantation (2, 3). These survival rates are in contrast to an average survival of less than 1 year for patients with advanced HCCs (4).
Surveillance using ultrasound with or without alpha-fetoprotein (AFP) at 6- to 12-month intervals aims to detect HCCs at an early stage, when they are amenable to curative therapy (5), and is recommended in high-risk populations (6). A recent meta-analysis of prospective cohort studies found that HCC surveillance using a combination of ultrasound and AFP was highly efficacious, with a pooled sensitivity of 69% to find HCCs at an early stage (7). However, its effectiveness in clinical practice may be impacted by several factors, including low utilization rates among at-risk patients (8, 9).
When implemented in clinical practice, HCC surveillance is a complex process requiring multiple components: (i) providers identify appropriate at-risk patients, (ii) providers refer patients for surveillance, (iii) patients understand and accept the tests, (iv) the health care system schedules the tests, and (v) patients comply with surveillance recommendations (10). The benefits of surveillance tests can often be reduced because of patient-level (e.g., socioeconomic status and insurance), physician-level (e.g., knowledge of guidelines), and system-level factors (e.g., availability of surveillance tests; ref. 11). Given this potential discrepancy between an intervention's efficacy (the effect under carefully controlled conditions) and effectiveness (the effect when implemented in real-world settings), there has been increasing emphasis on comparative effectiveness research to improve delivery of care (9, 12). Accordingly, the NIH recently included the evaluation of real-world outcomes of health care interventions in liver disease as a priority area for future research.
Although the most recent American Association for the Study of Liver Diseases (AASLD) guidelines recommend using ultrasound alone for HCC surveillance, the optimal surveillance method (ultrasound, AFP, or combination) in clinical practice has not been determined (7, 13, 14). A significant amount of data exists supporting the use of AFP in HCC surveillance (13). Furthermore, we hypothesized that the gap between efficacy and effectiveness might be smaller for AFP than ultrasound because of the ease of obtaining a blood test. Therefore, the aim of our study was to determine the effectiveness of a surveillance strategy with ultrasound and AFP to detect HCCs at an early stage in a real-world clinical setting.
Between January 2004 and September 2006, consecutive patients with cirrhosis were prospectively identified and entered into a surveillance program using ultrasound and AFP. The diagnosis of cirrhosis was based on histology or imaging showing a cirrhotic-appearing liver with associated signs of portal hypertension including splenomegaly, varices, or thrombocytopenia. Patients were enrolled from the University of Michigan (Ann Arbor, MI) General Hepatology or Liver Transplant outpatient clinics if they had Child-Pugh class A or B cirrhosis and absence of known HCC at the time of initial evaluation. Absence of HCC was determined by imaging lacking any suspicious appearing masses within 6 months of enrollment. Patients with an AFP level greater than 20 ng/mL at enrollment were only included if computed tomography (CT) or MRI confirmed the absence of any suspicious masses within 3 months of enrollment. Other exclusion criteria included clinical evidence of significant hepatic decompensation (refractory ascites, grade III–IV encephalopathy, active variceal bleeding, or hepatorenal syndrome), co-morbid medical conditions with a life expectancy of less than 1 year, prior solid organ transplant, and a known extrahepatic primary tumor. This study protocol was approved by the Institutional Review Board at the University of Michigan, and informed consent was obtained in writing from each patient.
The following demographic and clinical data were collected at enrollment: age, gender, race, weight, height, lifetime alcohol use, and lifetime tobacco use. Data about their liver disease included the underlying etiology, degree of ascites, presence of encephalopathy, and presence of esophageal or gastric varices. Laboratory data of interest at the time of enrollment included complete blood count (CBC), creatinine, aspartate aminotransferase (AST), alanine aminotransferase (ALT), alkaline phosphatase, bilirubin, albumin, international normalized ratio (INR), and AFP.
Patients were classified according to the etiology of liver disease, including HCV (presence of HCV antibody or RNA in serum), hepatitis B (presence of hepatitis B surface antigen in serum), alcohol-related liver disease (history of alcohol intake >40 g/d for at least 10 years), others (including hereditary hemochromatosis, primary sclerosing cholangitis, primary biliary cirrhosis, and autoimmune hepatitis), and cryptogenic cirrhosis (negative work-up for all of the above etiologies).
Follow-up and detection of HCC
Patients underwent evaluation every 6 months by physical examination, routine biochemical tests (including CBC, creatinine, albumin, AST, ALT, alkaline phosphatase, bilirubin, and INR), ultrasound, and AFP. Although all enrolled patients were prospectively followed, patients were managed as deemed appropriate by their hepatologist and not a strict study protocol. Importantly, patients were not reminded by study personnel to have screening done. Thus, while the accepted standard was ultrasound and AFP every 6 to 12 months, this did not happen in every patient for various reasons as described in the introduction. Patients were categorized as receiving consistent surveillance (ultrasound with or without AFP done at least annually), inconsistent surveillance (ultrasound or AFP done at intervals of more than 1 year but less than 2 years), or no surveillance (no surveillance test for more than 2 years). If an AFP level was elevated or mass lesion was seen on ultrasound, the usual practice was to conduct triple-phase CT or MRI to evaluate the presence of HCCs as recommended by AASLD guidelines. For study purposes, patients were followed until the time of HCC diagnosis, liver transplantation, death, or until the study was terminated on July 31, 2010. HCC cases diagnosed within the first 6 months of enrollment (prevalent cases) were excluded. Patients lost to follow-up were censored at the time of their last clinic visit. The Social Security Death File and the State of Michigan Death Records were used to ascertain dates of death.
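The three surveillance categories defined above can be written as a simple rule. The following is a minimal sketch, assuming each patient is summarized by the longest interval (in years) between surveillance tests; the function name, this single-number summary, and the treatment of intervals of exactly 1 or 2 years are assumptions, since the study does not specify them.

```python
def classify_surveillance(longest_gap_years: float) -> str:
    """Map a patient's longest gap between surveillance tests to a category.

    Hypothetical helper: the boundary handling (<= 1 year, <= 2 years) is an
    assumption; the study text does not say how exact boundaries were treated.
    """
    if longest_gap_years < 0:
        raise ValueError("gap must be non-negative")
    if longest_gap_years <= 1.0:
        return "consistent"      # tested at least annually
    if longest_gap_years <= 2.0:
        return "inconsistent"    # interval between 1 and 2 years
    return "none"                # no surveillance test for more than 2 years


for gap in (0.5, 1.5, 2.5):  # illustrative values only, not study data
    print(gap, classify_surveillance(gap))
```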
HCC was diagnosed using AASLD guidelines, and the Barcelona Clinic Liver Cancer (BCLC) system was used for tumor staging (6). For tumors greater than 2 cm in size, the diagnosis was made by the presence of a typical vascular pattern on dynamic imaging (arterial enhancement and washout on delayed images) or an AFP level greater than 200 ng/mL. For tumors with a maximum diameter of 1 to 2 cm, the diagnosis was made by the presence of a typical vascular pattern on 2 dynamic imaging studies or histology. All cases of HCCs were adjudicated by 2 authors (A.G. Singal and J.A. Marrero) to confirm that they met diagnostic criteria and to determine tumor stage at the time of diagnosis.
The cumulative probability of HCC development was determined by competing risk analysis, with transplantation and death being considered as competing outcomes. Patients who were lost to follow-up were right censored. We assessed the performance characteristics of AFP and of ultrasound for the detection of HCCs. For each test, sensitivity and specificity were calculated on a per-patient basis. Patients with an AFP level greater than 20 ng/mL or mass lesion on ultrasound without subsequent HCC confirmed on triple-phase CT or MRI were recorded as having a “false positive” test. Patients who were alive at the end of follow-up without developing HCC or undergoing liver transplantation were followed for at least an additional 6 months to confirm the absence of HCC. Univariate regression analysis using Mann–Whitney rank-sum and χ2 tests was conducted to identify factors associated with ultrasound's and AFP's sensitivity and specificity for the detection of HCCs. Data analysis was conducted using Stata 10.
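To illustrate what a competing risk estimate of cumulative incidence involves, the sketch below implements the standard nonparametric (Aalen-Johansen type) cumulative incidence function in plain Python. It is not the authors' Stata code; the function name, the event coding (0 = censored, 1 = HCC, 2 = competing event such as transplantation or death), and the toy data are all assumptions made for illustration.

```python
def cumulative_incidence(times, events, cause=1):
    """Nonparametric cumulative incidence for one cause with competing risks.

    events: 0 = censored, any other integer = cause of the observed event.
    Returns a list of (time, cumulative incidence) at each event time.
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    event_free = 1.0   # probability of no event of any cause just before t
    cif = 0.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        n_here = d_any = d_cause = 0
        while i < len(data) and data[i][0] == t:   # group ties at time t
            n_here += 1
            if data[i][1] != 0:
                d_any += 1
            if data[i][1] == cause:
                d_cause += 1
            i += 1
        if d_any:
            cif += event_free * d_cause / n_at_risk
            event_free *= 1 - d_any / n_at_risk
            curve.append((t, cif))
        n_at_risk -= n_here
    return curve


# Toy data (hypothetical, not from the study): follow-up in years and codes.
toy_times = [1, 2, 3, 4]
toy_events = [1, 2, 0, 1]
toy_curve = cumulative_incidence(toy_times, toy_events, cause=1)
print(toy_curve)
```

Unlike one minus a Kaplan-Meier estimate that censors competing events, this estimator does not treat transplanted or deceased patients as still at risk of HCC.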
Between January 2004 and September 2006, 446 patients with cirrhosis were identified and prospectively followed. Four patients were discovered to have prevalent tumors within 6 months of enrollment and were excluded. Baseline characteristics of the remaining 442 patients are shown in Table 1. The median age of patients was 52.8 years (range, 23.6–82.4 years). More than 90% of the patients were Caucasian and 58.6% were men. The most common etiologies of cirrhosis were HCV (47.3%), cryptogenic (19.2%), and alcohol-induced liver disease (14.5%). A total of 42.9% of patients were Child Pugh class A and 52.5% were Child Pugh class B. Median Child Pugh and MELD (Model for End-Stage Liver Disease) scores at enrollment were 7 and 9, respectively. Median baseline AFP level was 5.9 ng/mL in patients who developed HCC, which was significantly higher than the median baseline AFP of 3.7 ng/mL in patients who did not develop HCC during follow-up (P < 0.01).
The median follow-up of the cohort was 3.5 years (range, 0.6–6.6 years). Follow-up was conducted for at least 1 year in 392 (88.7%) patients, whereas 50 patients were followed for less than 1 year. Of the 442 patients in the final cohort, 69 (15.6%) were lost to follow-up before the study was terminated on July 31, 2010. During the 1,454 patient-years of follow-up, 1,555 AFP levels and 1,238 ultrasounds were conducted. Consistent surveillance was conducted in 271 (61.3%) patients, whereas 107 (24.2%) patients received inconsistent surveillance and 64 (14.5%) patients received no surveillance. The consistency of surveillance was similar among those lost to follow-up (P = 0.70). Of the patients lost to follow-up, 45 (65.2%) had consistent surveillance, 14 (20.3%) received inconsistent surveillance, and 10 (14.5%) patients received no surveillance. The percentage of tumors diagnosed at an early stage was not significantly different between patients who received consistent surveillance and those who received inconsistent surveillance (75% vs. 60%, P = 0.48), although we may have been underpowered to detect a difference.
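The surveillance percentages and the comparison with patients lost to follow-up can be rechecked from the counts above. The sketch below assumes that the reported P value comes from a Pearson χ2 test on the 2 × 3 table of lost versus retained patients by surveillance category; the text does not name the test used for this comparison, so that choice, and all variable names, are assumptions.

```python
import math

# Counts reported in the text: consistent, inconsistent, no surveillance.
overall = [271, 107, 64]   # all 442 patients
lost = [45, 14, 10]        # the 69 patients lost to follow-up
retained = [o - l for o, l in zip(overall, lost)]  # derived, not reported


def pearson_chi2(table):
    """Pearson chi-square statistic for a two-dimensional contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    return chi2


chi2 = pearson_chi2([lost, retained])
# With 2 degrees of freedom the chi-square upper tail is exactly exp(-x / 2).
p_value = math.exp(-chi2 / 2)

overall_pct = [round(100 * c / sum(overall), 1) for c in overall]
lost_pct = [round(100 * c / sum(lost), 1) for c in lost]
print(overall_pct, lost_pct, round(p_value, 2))
```

Under these assumptions the recomputed P value rounds to the reported 0.70.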
Incidence of HCC
Over the 1,454 person-year follow-up period, 41 patients developed HCC for an annual incidence of 2.8%. The cumulative 3- and 5-year probabilities of HCC development were 5.7% and 9.1%, respectively, based on the competing risk model (Fig. 1). The time from study enrollment to development of HCC ranged from 0.5 to 5.9 years. The diagnosis of HCC was made by imaging showing an arterially enhancing lesion with delayed washout in 33 patients, histologic confirmation in 6 patients, and as an incidental finding at the time of transplantation in 2 cases. Of the 41 patients who developed HCCs, 4 tumors were classified as very early (Barcelona stage 0) and 19 were classified as early-stage (Barcelona stage A). Seven patients had intermediate-stage (BCLC B) tumors and 3 had advanced-stage (BCLC C) tumors. Eight patients had BCLC stage D tumors related to the presence of Child C cirrhosis at the time of diagnosis (Table 2).
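The annual incidence quoted above is the number of incident cases divided by the person-years of follow-up. A short check, using only figures from the text (the variable names are illustrative):

```python
hcc_cases = 41
person_years = 1454

annual_incidence_pct = 100 * hcc_cases / person_years
print(f"{annual_incidence_pct:.1f}% per year")

# BCLC stage counts as listed in the text; they should account for all cases.
stage_counts = {"0": 4, "A": 19, "B": 7, "C": 3, "D": 8}
total_staged = sum(stage_counts.values())
print(total_staged)
```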
Effectiveness of ultrasound and AFP for HCC surveillance
The method of HCC detection during surveillance is recorded in Table 2. The per-patient sensitivity and specificity of ultrasound for the detection of HCC were 43.9% (18 of 41) and 91.5% (367 of 401), respectively (Table 3). When excluding the 10 patients without an ultrasound within 6 months of HCC diagnosis, the sensitivity of ultrasound was 58.1% (18 of 31). The positive and negative likelihood ratios of ultrasound were 5.2 and 0.61, respectively. False-positive ultrasounds led to 48 cross-sectional diagnostic imaging studies among 34 patients: 7 CT scans and 41 MRIs. The per-patient sensitivity and specificity of AFP were 65.9% (27 of 41) and 90.5% (363 of 401), respectively, for the detection of HCCs. The positive and negative likelihood ratios of AFP were 7.0 and 0.38, respectively. False-positive AFP tests led to 42 cross-sectional diagnostic imaging studies among 36 patients: 3 CT scans and 39 MRIs. Using ultrasound and AFP in combination increased the sensitivity of surveillance to 90.2% (37 of 41) with a specificity of 83.3% (334 of 401) for detecting HCCs. The sensitivity of the tests in combination was significantly higher than that of ultrasound alone (P < 0.001) and AFP alone (P = 0.02). The positive and negative likelihood ratios for ultrasound and AFP in combination were 5.4 and 0.12, respectively.
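The performance figures above follow from per-patient counts: sensitivity is the fraction of the 41 HCC patients with a positive test, specificity is the fraction of the 401 patients without HCC who had no positive test, and the likelihood ratios are derived from those two values. The sketch below recomputes them; the function and variable names are illustrative, and the counts are taken from the text.

```python
def diagnostic_metrics(true_pos, n_disease, true_neg, n_no_disease):
    """Return sensitivity, specificity, and positive/negative likelihood ratios."""
    sensitivity = true_pos / n_disease
    specificity = true_neg / n_no_disease
    lr_positive = sensitivity / (1 - specificity)
    lr_negative = (1 - sensitivity) / specificity
    return sensitivity, specificity, lr_positive, lr_negative


# (detected HCCs, HCC patients, true negatives, patients without HCC)
reported_counts = {
    "ultrasound": (18, 41, 367, 401),
    "AFP": (27, 41, 363, 401),
    "combination": (37, 41, 334, 401),
}

results = {name: diagnostic_metrics(*c) for name, c in reported_counts.items()}
for name, (sens, spec, lr_pos, lr_neg) in results.items():
    print(f"{name}: sens={sens:.3f} spec={spec:.3f} "
          f"LR+={lr_pos:.2f} LR-={lr_neg:.2f}")
```

Recomputed this way, the sensitivities and specificities match the reported percentages, and the likelihood ratios agree with the reported values to within about 0.1.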
The sensitivity of ultrasound for detecting HCC was significantly associated with race (P = 0.04) and baseline MELD score (P = 0.03). Whereas 18 (50%) of the 36 Caucasian patients with HCCs had their tumors detected on surveillance ultrasound, all 5 non-Caucasian patients had their tumors missed by surveillance ultrasound. Patients with HCCs detected on ultrasound also had higher median MELD scores than those whose tumor was missed by surveillance ultrasound (11.5 vs. 9.0). The sensitivity of ultrasound was 60% in patients with a MELD score greater than 10, compared with only 18.8% in those with lower MELD scores. We did not identify any factors associated with the sensitivity of AFP for detecting HCCs, although this could have been because of limited statistical power.
The specificities of ultrasound and AFP were both significantly associated with underlying hepatitis C liver disease. Whereas HCV etiology was associated with a higher specificity for surveillance ultrasound (94.6% vs. 89.0%, P = 0.04), it negatively impacted the specificity of AFP (83.7% vs. 97.2%, P < 0.001). The specificity of AFP was also associated with Caucasian race, with a specificity of 92.6% in Caucasians, compared with only 70.8% in non-Caucasians (P < 0.001).
Our study is the first to evaluate the effectiveness of a surveillance program using ultrasound and AFP every 6 to 12 months among patients with cirrhosis in a real-world clinical setting. Ultrasound and AFP both had sensitivities near or below 65% for detecting HCC in a real-world setting, although this was increased to 90% when used in combination. The sensitivity of the tests in combination was significantly higher than that of ultrasound alone (P < 0.001) and AFP alone (P = 0.02), with a minimal loss in specificity. Had all patients undergone surveillance using ultrasound alone, this would have led to 13 (32%) diagnoses of HCC at early stage with 48 unnecessary CT or MRI scans due to false-positive results, whereas combination surveillance detected 26 (63%) HCCs at an early stage, with 90 unnecessary CT or MRI scans.
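The trade-off described above can be laid out numerically. The sketch below rechecks the early-stage percentages and the total number of unnecessary scans from the counts in the text. The final quantity, additional scans per additional early-stage diagnosis, is a derived illustration that the authors do not report, and the variable names are assumptions.

```python
total_hcc = 41

early_ultrasound_alone = 13   # early-stage HCCs found by ultrasound alone
early_combination = 26        # early-stage HCCs found by ultrasound plus AFP

scans_from_ultrasound = 48    # CT/MRI studies after false-positive ultrasounds
scans_from_afp = 42           # CT/MRI studies after false-positive AFP tests
scans_combination = scans_from_ultrasound + scans_from_afp

pct_early_ultrasound = round(100 * early_ultrasound_alone / total_hcc)
pct_early_combination = round(100 * early_combination / total_hcc)

# Derived illustration only (not reported in the study): extra unnecessary
# scans incurred per additional early-stage HCC detected by adding AFP.
extra_scans_per_extra_early = (
    (scans_combination - scans_from_ultrasound)
    / (early_combination - early_ultrasound_alone)
)

print(pct_early_ultrasound, pct_early_combination, scans_combination)
print(f"{extra_scans_per_extra_early:.1f}")
```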
A recent meta-analysis of prospective cohort studies found that the pooled sensitivity of ultrasound to find early-stage HCCs was 63% when used alone and 69% when used in combination with AFP (7). In our cohort, the sensitivity of ultrasound for early-stage tumors was only 32%, which was significantly increased to 63% (P = 0.008) when used in combination with AFP. Only 8 patients had a positive ultrasound and elevated AFP before HCC diagnosis, with the majority only having one positive surveillance study. These results highlight the large discrepancy between the effectiveness of surveillance ultrasound and its reported efficacy in previously published prospective studies. Thus, although AFP may be of minimal benefit in prospective clinical trials, it appears to provide a greater benefit among patients in real-world clinical settings. A recent cost-effectiveness analysis found that combination of ultrasound and AFP was the preferred strategy when the sensitivity of ultrasound fell below 65% (15). In contrast to current guideline recommendations (14), these results suggest that AFP should continue to be used in combination with ultrasound during HCC surveillance.
Several studies have suggested that the effectiveness of HCC surveillance may be impacted by low utilization rates among at-risk patients (8, 9, 16). Consistent surveillance was conducted in 60% of patients in our study, which is substantially higher than the 19% pooled surveillance rate from a recent meta-analysis (17). Despite these high surveillance utilization rates, 10 patients with HCCs did not have an ultrasound within 6 months of diagnosis. Although underutilization was a factor in determining the effectiveness of ultrasound, the sensitivity of ultrasound for HCC was still only 58% (18 of 31) when excluding the 10 patients without an ultrasound within 6 months of HCC diagnosis. Surveillance using a combination of ultrasound and AFP still had a significantly higher sensitivity for HCCs (P = 0.002).
One reason for the apparent gap between efficacy and effectiveness of ultrasound may be related to operator quality. In clinical trials, ultrasounds are often conducted by physicians or experienced ultrasonographers using standardized imaging protocols, but in real practice, these examinations are usually conducted by radiology technicians with limited medical knowledge (18). In addition, patients often obtain their ultrasounds in local community centers instead of at a single centralized tertiary care center, introducing more variability in operator experience and technique. Alternatively, this difference in sensitivity could also be related to differences in patient characteristics, as is seen with breast density for mammography (19). For HCC surveillance, the ability of ultrasound to accurately visualize the liver in patients with morbid obesity or a very nodular liver may be impaired (20). Upon exploratory regression analysis, we found that the sensitivity of ultrasound was associated with Caucasian race and higher MELD scores. Although we did not find any association with obesity and Child Pugh score, this may have been because of limited statistical power. Further research is necessary to better understand the impact of operator dependency and patient characteristics on the sensitivity of ultrasound to help improve its performance in detecting early-stage HCCs.
It is important to note that our study had several limitations. Our study was conducted in a single tertiary care center and may not be generalizable to other practice settings. In addition, the performance characteristics of surveillance ultrasound likely vary by operator experience and center. Another limitation of our study is the fact that approximately 16% of the patients were lost to follow-up, although the median follow-up for these patients was 2.8 years and their survival status was verified through the Social Security Death File. Furthermore, these patients had less advanced cirrhosis (lower Child Pugh class and MELD scores) and were less likely to develop hepatic decompensation, HCC, or death. Overall, we believe that the limitations of this study are outweighed by its notable strengths including its prospective enrollment, its large sample size, and its diverse population with both viral and nonviral liver disease. Most importantly, our study is one of the first to describe the real-world effectiveness of surveillance in a cohort of American patients with cirrhosis.
In conclusion, there is a large gap between the efficacy and effectiveness of ultrasound and AFP for HCC surveillance among patients with cirrhosis. Ultrasound and AFP are both suboptimal surveillance tools when used alone and should be used in combination to help maximize sensitivity for early-stage HCCs. Overall, an HCC surveillance program in patients with cirrhosis can be effective, detecting more than 70% of all tumors at an early stage.
Disclosure of Potential Conflicts of Interest
No potential conflicts of interest were disclosed.
Conception and design: R.J. Fontana, A.S. Lok, J.A. Marrero.
Development of methodology: J.A. Marrero.
Acquisition of data (provided animals, acquired and managed patients, provided facilities, etc.): H.S. Conjeevaram, F. Askari, G.L. Su, J.A. Marrero.
Analysis and interpretation of data (e.g., statistical analysis, biostatistics, computational analysis): A.G. Singal, M.L. Volk, S. Fu, F. Askari, J.A. Marrero.
Writing, review, and/or revision of the manuscript: A.G. Singal, H.S. Conjeevaram, M.L. Volk, R.J. Fontana, F. Askari, G.L. Su, A.S. Lok, J.A. Marrero.
Administrative, technical, or material support (i.e., reporting or organizing data, constructing databases): A.G. Singal, J.A. Marrero.
Study supervision: A.S. Lok, J.A. Marrero.
This project was supported, in part, by grants DK 064909, DK077707, and KL2 RR024983-05.
The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked advertisement in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.
- Received October 21, 2011.
- Revision received February 9, 2012.
- Accepted February 23, 2012.
- ©2012 American Association for Cancer Research.