Background: Hepatocellular carcinoma is a common complication of chronic liver disease (CLD), and is conventionally diagnosed by radiological means. We aimed to build a statistical model that could determine the risk of hepatocellular carcinoma in individual patients with CLD using objective measures, particularly serological tumor markers.
Methods: A total of 670 patients with either CLD alone or hepatocellular carcinoma were recruited from a single UK center into a case–control study. Sera were collected prospectively and specifically for this study. Logistic regression analysis was used to determine independent factors associated with hepatocellular carcinoma, and a model was built and assessed in terms of sensitivity, specificity, and proportion of correct diagnoses.
Results: The final model involving gender, age, AFP-L3, α fetoprotein (AFP), and des-carboxy-prothrombin (“GALAD”) was developed in a “discovery” data set and validated in independent data sets both from the same institution and from an external institution. When optimized for sensitivity and specificity, the model gave values of more than 0.88 irrespective of the disease stage.
Conclusions: The presence of hepatocellular carcinoma can be detected in patients with CLD on the basis of a model involving objective clinical and serological factors. It is now necessary to test the model's performance in a prospective manner and in a routine clinical practice setting, to determine if it may replace or, more likely, enhance current radiological approaches.
Impact: Our data provide evidence that an entirely objective serum biomarker–based model may facilitate the detection and diagnosis of hepatocellular carcinoma and form the basis for a prospective study comparing this approach with the standard radiological approaches. Cancer Epidemiol Biomarkers Prev; 23(1); 144–53. ©2013 AACR.
Hepatocellular carcinoma was, until recently, diagnosed on the basis of histologic examination of tumor tissue but the diagnosis can now be established with a high degree of specificity on the basis of characteristic radiological features once tumors are greater than 1 cm in diameter (1–3). Furthermore, such an approach obviates both the risk of bleeding and tumor seeding along the biopsy tract (4, 5).
Surveillance, which is well accepted to be the key to effective delivery of potentially curative treatment (1, 2), involves ultrasound examination (USS; refs. 1, 2, 6) followed, where a suspicious lesion is detected, by confirmatory tests, including conventional computed tomography (CT) or MRI scanning with or without biopsy. Estimation of serum α-fetoprotein (AFP) has also been used for diagnosis, with grossly elevated levels being highly specific for hepatocellular carcinoma (7), but as the importance of early diagnosis has become apparent, the limited sensitivity of AFP for smaller tumors has reduced its value (1, 2, 8, 9). Other serological diagnostic tests include des-carboxy-prothrombin (DCP; an abnormal prothrombin molecule derived from an acquired defect in the posttranslational carboxylation of the prothrombin precursor; refs. 10, 11) and AFP-L3 (an isoform of AFP characterized by the presence of an α1-6-linked fucose residue on the AFP carbohydrate side chain; refs. 12, 13).
The limitations of radiological diagnosis of hepatocellular carcinoma are, however, increasingly recognized, both for the diagnosis of new cases and in the screening setting (14). The number of non-hepatocellular carcinoma lesions and pseudolesions, which may collectively be more common than hepatocellular carcinoma lesions in the cirrhotic liver, and the need for considerable expertise in liver imaging have both been noted (14). The limitations of ultrasound for screening are also becoming apparent. Its sensitivity is limited, usually quoted as between 65% and 80% and rather lower in early disease, in which appearances are not specific, and its performance characteristics have not been well defined in the nodular cirrhotic liver undergoing surveillance. Furthermore, it is subjective and dependent on operator experience and the available equipment (15–20). It is also increasingly recognized that, although screening by USS may be effective in specialized centers, this does not necessarily translate into an effective screening system in the wider community (21). Increasing levels of obesity in the West further limit the sensitivity of USS (22).
For all these reasons, we consider here the possibility of establishing the diagnosis of hepatocellular carcinoma in the clinical and, potentially, the screening setting using entirely objective measurements, mainly the three serological tests AFP, DCP, and AFP-L3, combined in a statistical model. Aware that any statistical model would need to perform well in early disease, we collected the clinical material so that patients could be rigorously classified by disease stage.
Materials and Methods
This case–control study involved 670 patients: 331 with hepatocellular carcinoma and a control group of 339 patients with chronic liver disease (CLD) alone. The patients with hepatocellular carcinoma were recruited at the Queen Elizabeth Hospital (Birmingham, UK) from among patients who were approached and consented to the study between 2007 and 2012 (Table 1). For all patients, the diagnosis was established by histologic examination of tumor tissue (23%) or by characteristic radiological features according to international guidelines (1, 2). Samples were taken at the time of first referral for treatment or further investigation. The median period between sample acquisition and formal diagnosis on the basis of CT scan or histology was 1.7 months. No treatment was administered between sample acquisition and formal diagnosis of hepatocellular carcinoma. CLD control samples (n = 339) were obtained from patients attending outpatient clinics for CLD in the same institution, who were classified as having hepatitis B virus (HBV)-related, hepatitis C virus (HCV)-related, alcohol-related, or “other” or “no” underlying CLD. The “other” group comprised patients with hemochromatosis, primary biliary cirrhosis, nonalcoholic steatohepatitis, or cryptogenic cirrhosis. The diagnosis of CLD was made on the basis of liver biopsy and/or typical clinical and imaging features. None of the CLD control group had evidence of hepatocellular carcinoma at the time the relevant serum sample was taken or within a minimum follow-up period of 6 months (Table 1), but three of them developed hepatocellular carcinoma between 6 and 12 months. For the purpose of analysis, these three patients remained assigned to the CLD group. An age- and sex-matched control group of 92 subjects without any evidence of liver disease was recruited from patients with upper gastrointestinal symptoms who had no clinically significant abnormal findings at endoscopy.
All patients gave informed consent for donating blood, and the study procedure was approved by the South Birmingham Research Ethics Committee or the Newcastle and North Tyneside Ethics Committee. A standard operating procedure was applied to all blood collection. The Birmingham serum samples for the discovery and internal validation sets were collected prospectively, specifically for this research project, and according to the REMARK guidelines (23, 24). The “discovery” set comprised 218 patients with hepatocellular carcinoma seen between April 2007 and January 2011. This sample size was based on the calculation that approximately 200 subjects per group would be sufficient, using a two-sided test, to reject the null hypothesis of AUROC (area under the receiver operating characteristic curve) = 0.75 in favor of AUROC = 0.85 with 90% power at a significance level of 0.05. On the basis of Harrell's rule of thumb, this sample size is also sufficient to allow the fitting of up to 20 candidate variables within a logistic discrimination model. The validation set comprised 113 patients with hepatocellular carcinoma seen between February 2011 and March 2012. The external validation set came from a previously reported study (25) designed to assess surveillance biomarkers in fatty liver disease (alcoholic and nonalcoholic). In this set, all markers were remeasured on stored sera (Table 1) that had been collected specifically and prospectively for biomarker assessment.
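The sample-size reasoning above can be checked with a short calculation. The text does not state which method was used, so the sketch below is an assumption: it uses the Hanley–McNeil variance of an empirical AUROC and a normal-approximation z-test of AUROC = 0.75 against a true AUROC of 0.85, with equal numbers of cases and controls. It also encodes the common 10-events-per-variable form of Harrell's rule of thumb. The function names are illustrative.

```python
from math import sqrt
from statistics import NormalDist


def hanley_mcneil_var(auc: float, n_cases: int, n_controls: int) -> float:
    """Variance of an empirical AUROC (Hanley & McNeil formula)."""
    q1 = auc / (2 - auc)
    q2 = 2 * auc ** 2 / (1 + auc)
    return (auc * (1 - auc)
            + (n_cases - 1) * (q1 - auc ** 2)
            + (n_controls - 1) * (q2 - auc ** 2)) / (n_cases * n_controls)


def auroc_power(n_per_group: int, auc_null: float = 0.75,
                auc_alt: float = 0.85, alpha: float = 0.05) -> float:
    """Approximate power of a two-sided z-test of H0: AUROC = auc_null when
    the true AUROC is auc_alt. Only the upper rejection region is counted;
    the lower one contributes negligibly when auc_alt > auc_null."""
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)
    se0 = sqrt(hanley_mcneil_var(auc_null, n_per_group, n_per_group))
    se1 = sqrt(hanley_mcneil_var(auc_alt, n_per_group, n_per_group))
    # H0 is rejected when the observed AUROC exceeds this threshold
    threshold = auc_null + z_crit * se0
    return 1 - z.cdf((threshold - auc_alt) / se1)


def max_candidate_variables(n_events: int, events_per_variable: int = 10) -> int:
    """Rule of thumb: roughly 10 events per candidate variable."""
    return n_events // events_per_variable


print(f"power with 200 per group: {auroc_power(200):.3f}")
print("candidate variables for 200 events:", max_candidate_variables(200))
```

Under these assumptions, 200 per group gives power well above 90%, consistent with the statement that approximately 200 per group is sufficient; a different variance estimator or test design would give a different figure.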
Patients were classified as having “early” or “late” disease on the basis of three staging systems, namely tumor–node–metastasis (TNM) 6, Barcelona Clinic Liver Cancer (BCLC; refs. 26–30), and the Milan criteria (31), as well as on tumor size and on an “operational” basis. Stages I and II of TNM 6 and BCLC stages 0 and A were classified as early disease. As an additional measure of disease stage, a tumor size of 5 cm or less was considered early, whereas a size of more than 5 cm was considered late, irrespective of tumor number. Patients within and outside the Milan criteria were categorized as early and late, respectively. Early and late disease were “operationally” classified according to whether or not an experienced multidisciplinary team recommended potentially curative treatment. Patients who were listed for transplantation but received transarterial chemoembolization (TACE) as initial treatment, as a “bridge” to transplantation, were classified as having early disease.
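The early/late rules above can be expressed as a small classification function. This is a sketch only: the record fields (`tnm6_stage`, `bclc_stage`, and so on) are hypothetical names, Milan status is taken as an already-assessed input, and TNM substages are ignored for simplicity.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class TumorRecord:
    # Hypothetical field names mirroring the criteria described in the text
    tnm6_stage: str              # "I", "II", "III" or "IV"
    bclc_stage: str              # "0", "A", "B", "C" or "D"
    max_diameter_cm: float       # diameter of the largest tumor
    within_milan: bool           # assessed separately against the Milan criteria
    mdt_curative: bool           # MDT recommended potentially curative treatment
    bridging_tace: bool = False  # listed for transplant, TACE given as a bridge


def early_by_definition(t: TumorRecord) -> Dict[str, bool]:
    """True means 'early' and False means 'late' under each definition."""
    return {
        "TNM6": t.tnm6_stage in {"I", "II"},
        "BCLC": t.bclc_stage in {"0", "A"},
        # <= 5 cm is early, > 5 cm is late, irrespective of tumor number
        "size": t.max_diameter_cm <= 5.0,
        "Milan": t.within_milan,
        # TACE as a bridge to transplantation still counts as early disease
        "operational": t.mdt_curative or t.bridging_tace,
    }


example = TumorRecord("II", "B", 4.0, within_milan=True,
                      mdt_curative=False, bridging_tace=True)
print(early_by_definition(example))
```

A single patient can therefore be early by one definition and late by another, which is why the proportion classified as early depends so strongly on the definition used.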
Routine liver and renal function tests (LFT and RFT) were measured on an automated analytical platform (the Roche Cobas 8000 Modular system) and the severity of the liver disease was defined according to the Child–Pugh score (32, 33). The hepatitis B surface antigen (HBsAg) and anti-HCV antibodies were measured using the e602 module (employing electrochemiluminescence technology) on the Roche Cobas 8000 system.
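The Child–Pugh score cited above is the sum of points for five items. The sketch below follows one common convention, with SI units (bilirubin in µmol/L, albumin in g/L) and INR for coagulation; the exact boundary conventions are an assumption, because published versions differ slightly (some use prothrombin time prolongation instead of INR) and the text does not say which was applied.

```python
from typing import Tuple


def child_pugh(bilirubin_umol_l: float, albumin_g_l: float, inr: float,
               ascites: int, encephalopathy_grade: int) -> Tuple[int, str]:
    """Return (score, class) for one common Child-Pugh convention.

    ascites: 0 = none, 1 = mild, 2 = moderate or severe
    encephalopathy_grade: 0 = none, 1-2 = grade I-II, 3-4 = grade III-IV
    """
    # Bilirubin (umol/L): <34 -> 1 point, 34-50 -> 2, >50 -> 3
    bili = 1 if bilirubin_umol_l < 34 else 2 if bilirubin_umol_l <= 50 else 3
    # Albumin (g/L): >35 -> 1 point, 28-35 -> 2, <28 -> 3
    alb = 1 if albumin_g_l > 35 else 2 if albumin_g_l >= 28 else 3
    # INR: <1.7 -> 1 point, 1.7-2.3 -> 2, >2.3 -> 3
    coag = 1 if inr < 1.7 else 2 if inr <= 2.3 else 3
    # Ascites: none -> 1 point, mild -> 2, moderate/severe -> 3
    asc = 1 + min(max(ascites, 0), 2)
    # Encephalopathy: none -> 1 point, grade I-II -> 2, grade III-IV -> 3
    enc = 1 if encephalopathy_grade <= 0 else 2 if encephalopathy_grade <= 2 else 3

    score = bili + alb + coag + asc + enc
    # Class A: 5-6 points, B: 7-9, C: 10-15
    cls = "A" if score <= 6 else "B" if score <= 9 else "C"
    return score, cls


print(child_pugh(40, 30, 1.2, ascites=0, encephalopathy_grade=0))
```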
Assays of AFP, AFP-L3%, and DCP
AFP, AFP-L3%, and DCP were all measured in the same serum sample. The measurements of all three biomarkers were undertaken using a microchip capillary electrophoresis and liquid-phase binding assay on a μTASWako i30 auto analyzer (Wako Pure Chemical Industries Ltd.; ref. 34). Analytical sensitivity of μTASWako i30 is 0.3 ng/mL AFP and 0.1 ng/mL DCP, and the percentage of AFP-L3 can be measured when AFP-L3 is more than 0.3 ng/mL (34). All aspects of the test system performance have been reported (34). The assays were undertaken in a commercial laboratory with extensive experience of the μTASWako i30 auto analyzer; the operators had no knowledge of the diagnosis associated with the patient sample. There were no adverse events attributable to the biomarker tests.
Continuous measurements are presented as medians (ranges) and categorical measurements as frequencies. Odds ratios (ORs) were calculated using logistic regression in univariate and multivariate analyses to assess the strength of association with hepatocellular carcinoma. Age, sex, albumin, bilirubin, AFP, DCP, and AFP-L3 were considered for inclusion in the multivariable models. Complete data for the GALAD score were available for more than 95% of cases; patients with missing data were excluded from the statistical analysis. Factors such as symptoms and performance status were excluded to limit subjectivity in the model. AFP and DCP were log-transformed because of their extreme skewness. Logistic regression analyses were based on a complete-case analysis using a parsimonious forward–backward stepwise approach, retaining variables that were significant at the 1% level and that also increased the AUROC. Fractional polynomials (35) were also used to investigate whether a more sophisticated transformation than the log transformation could improve prediction.
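The data-preparation steps above (complete-case analysis and log transformation of AFP and DCP) might look like the following sketch. Several details are assumptions not stated in the text: the field names, the use of base-10 logarithms, and flooring values at the analytical sensitivities quoted for the μTASWako i30 so that the logarithm is always defined. The stepwise selection itself is not shown.

```python
import math
from typing import Dict, List, Optional

Record = Dict[str, Optional[float]]

# Candidate variables named in the text (field names are illustrative)
CANDIDATES = ("age", "sex", "albumin", "bilirubin", "afp", "dcp", "afp_l3")

# Assumed floors: the analytical sensitivities quoted for the uTASWako i30
AFP_FLOOR = 0.3
DCP_FLOOR = 0.1


def complete_cases(records: List[Record], fields=CANDIDATES) -> List[Record]:
    """Complete-case analysis: drop any record with a missing candidate value."""
    return [r for r in records if all(r.get(f) is not None for f in fields)]


def add_log_markers(record: Record) -> Record:
    """Return a copy with log10-transformed AFP and DCP added."""
    out = dict(record)
    out["log_afp"] = math.log10(max(record["afp"], AFP_FLOOR))
    out["log_dcp"] = math.log10(max(record["dcp"], DCP_FLOOR))
    return out


def prepare(records: List[Record]) -> List[Record]:
    """Complete cases first, then the log transformation."""
    return [add_log_markers(r) for r in complete_cases(records)]


# Hypothetical records; the second has a missing albumin and is dropped
patients = [
    {"age": 60, "sex": 1, "albumin": 38, "bilirubin": 15,
     "afp": 100.0, "dcp": 1000.0, "afp_l3": 12.0},
    {"age": 55, "sex": 0, "albumin": None, "bilirubin": 20,
     "afp": 5.0, "dcp": 20.0, "afp_l3": 3.0},
]
print(prepare(patients))
```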
Model accuracy is presented as sensitivity, specificity, proportion of false positives and false negatives, and overall percentage of correct predictions. Having developed the model as described above, we used three cutoff points for classifying patients into the hepatocellular carcinoma group: the first maximized sensitivity while maintaining a prespecified specificity, the second maximized specificity while maintaining a prespecified sensitivity, and the third maximized the sum of sensitivity and specificity. Patients can then be classified by the model as predicted to have hepatocellular carcinoma or not, and this prediction can be compared directly against the true diagnosis.
Model validation was carried out on independent data sets in which, again, patients had been diagnosed as having hepatocellular carcinoma or CLD. The predictive score for each patient, based on the fitted models, was used to classify patients as having hepatocellular carcinoma or not, and this classification was compared directly against the true diagnosis.
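The three cutoff rules and the subsequent validation step can be sketched as follows. Candidate cutoffs are taken to be the observed scores, a score at or above the cutoff is counted as predicting hepatocellular carcinoma, and the 80% prespecified sensitivity is an assumed default (80% specificity is the value quoted in the Results). The scores and labels are made-up values.

```python
from typing import Dict, List, Sequence, Tuple


def summarize(scores: Sequence[float], labels: Sequence[int],
              cutoff: float) -> Dict[str, float]:
    """Classify score >= cutoff as HCC and compare with the true
    diagnosis (label 1 = HCC, 0 = CLD)."""
    tp = fp = tn = fn = 0
    for score, label in zip(scores, labels):
        predicted_hcc = score >= cutoff
        if predicted_hcc and label == 1:
            tp += 1
        elif predicted_hcc:
            fp += 1
        elif label == 1:
            fn += 1
        else:
            tn += 1
    return {
        "tp": tp, "fp": fp, "tn": tn, "fn": fn,
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "proportion_correct": (tp + tn) / (tp + fp + tn + fn),
    }


def choose_cutoffs(scores: Sequence[float], labels: Sequence[int],
                   min_spec: float = 0.80, min_sens: float = 0.80) -> Dict[str, float]:
    """Apply the three cutoff rules over every observed score (discovery set).
    Raises ValueError if no cutoff satisfies a prespecified constraint."""
    rows: List[Tuple[float, Dict[str, float]]] = [
        (c, summarize(scores, labels, c)) for c in sorted(set(scores))
    ]

    def sens(row):
        return row[1]["sensitivity"]

    def spec(row):
        return row[1]["specificity"]

    # 1: maximum sensitivity while specificity >= min_spec
    best_sens = max((r for r in rows if spec(r) >= min_spec),
                    key=lambda r: (sens(r), spec(r)))
    # 2: maximum specificity while sensitivity >= min_sens
    best_spec = max((r for r in rows if sens(r) >= min_sens),
                    key=lambda r: (spec(r), sens(r)))
    # 3: maximum of sensitivity + specificity
    best_sum = max(rows, key=lambda r: sens(r) + spec(r))
    return {"max_sens": best_sens[0], "max_spec": best_spec[0],
            "max_sum": best_sum[0]}


# Hypothetical discovery-set scores; label 1 = HCC, 0 = CLD
discovery_scores = [-3.0, -2.0, -1.0, 0.0, 1.0, 2.0, 3.0, 4.0]
discovery_labels = [0, 0, 0, 1, 0, 1, 1, 1]
cutoffs = choose_cutoffs(discovery_scores, discovery_labels)

# The cutoff fixed in the discovery set is applied unchanged to validation data
validation = summarize([-2.5, 2.2, 1.5, 3.5], [0, 0, 1, 1], cutoffs["max_sens"])
print(cutoffs, validation)
```

Fixing the cutoff in the discovery set and only then applying it to the validation data is what keeps the validation figures honest; re-optimizing the cutoff on the validation set would overstate performance.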
Of the 331 patients with hepatocellular carcinoma, 283 (85.5%) had clear evidence of associated CLD, 37 (11.2%) seemed to have no underlying benign liver disease, and in 11 (3.3%) the presence or absence of underlying CLD could not be ascertained with certainty. The corresponding figures for the CLD group were 96%, 2.6%, and 1.8%, respectively. Among all data considered in the derived statistical models, data completeness was greater than 98%.
The median values for log(AFP), log(DCP), and AFP-L3 were significantly higher in the patients with hepatocellular carcinoma than in those with CLD, and both groups had median values higher than those for healthy control subjects (Fig. 1). All three biomarkers showed considerable discriminatory ability for distinguishing between hepatocellular carcinoma and CLD (AUROCs: log(AFP) 0.88, AFP-L3 0.84, and log(DCP) 0.90; Fig. 2A).
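The single-marker AUROCs above can be understood through the Mann–Whitney form of the statistic: the probability that a randomly chosen patient with hepatocellular carcinoma has a higher value than a randomly chosen CLD patient, with ties counted as one half. Because this depends only on the ordering of values, a log transformation does not change a single marker's AUROC; it matters only once markers are combined in a regression model. The sketch below uses made-up values.

```python
import math
from typing import Sequence


def auroc(cases: Sequence[float], controls: Sequence[float]) -> float:
    """Mann-Whitney estimate of the AUROC: the probability that a random case
    exceeds a random control, ties counted as 1/2. O(n*m) pairwise loop."""
    wins = 0.0
    for x in cases:
        for y in controls:
            if x > y:
                wins += 1.0
            elif x == y:
                wins += 0.5
    return wins / (len(cases) * len(controls))


# Made-up AFP values (ng/mL) for illustration only
hcc_afp = [12.0, 250.0, 8.0, 4000.0]
cld_afp = [3.0, 8.0, 15.0, 2.5]
print(auroc(hcc_afp, cld_afp))
# A monotone transform such as log10 leaves the AUROC unchanged
print(auroc([math.log10(v) for v in hcc_afp], [math.log10(v) for v in cld_afp]))
```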
The optimal model (model 1), built on the discovery data set, included log(DCP), log(AFP), and AFP-L3, as well as age and sex, and had an AUROC of 0.97. The Child–Pugh score was included in the logistic regression analysis but was not a significant factor in the models, and model utility was maintained irrespective of Child–Pugh class. Application of fractional polynomials led to a model incorporating AFP-L3, DCP^(−0.5), and AFP^(0.5) (model 2).
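A first-degree fractional polynomial replaces a covariate x by x^p for a power p chosen from a small conventional set, with p = 0 denoting log(x). In practice the power is selected by comparing model deviance (the Royston–Altman approach); the sketch below shows only the transformation, producing the marker terms in the form reported for model 2. The function names are hypothetical and the fitted coefficients are not reproduced here.

```python
import math
from typing import Dict

# Conventional fractional polynomial power set; 0 denotes log(x)
FP_POWERS = (-2, -1, -0.5, 0, 0.5, 1, 2, 3)


def fp_term(x: float, power: float) -> float:
    """First-degree fractional polynomial term for a positive covariate."""
    if x <= 0:
        raise ValueError("fractional polynomials need a positive covariate")
    return math.log(x) if power == 0 else x ** power


def model2_marker_terms(afp: float, dcp: float, afp_l3: float) -> Dict[str, float]:
    """Marker terms in the form reported for model 2 (coefficients not shown)."""
    return {
        "afp_l3": afp_l3,                  # entered untransformed
        "dcp^-0.5": fp_term(dcp, -0.5),
        "afp^0.5": fp_term(afp, 0.5),
    }


print(model2_marker_terms(afp=16.0, dcp=4.0, afp_l3=10.0))
```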
Table 2 shows the estimated coefficients (SE) and OR (95% confidence intervals, CI) from the univariate analyses as well as from the multivariate analysis using the discovery set data based on model 1. Table 3 shows the true positives/negatives, false positives/negatives, sensitivity, specificity, and proportion correctly classified when the multivariate model is used on the discovery data set (discovery) and on the validation data set (internal validation). For the validation set, AUROC = 0.98. Also shown are subsets of the results for the fractional polynomial (model 2).
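Each OR in Table 2 is the exponentiated logistic regression coefficient, OR = exp(β). The sketch below assumes the usual Wald interval, exp(β ± z·SE); the text does not state how the confidence intervals were computed. For a log-transformed marker, the OR refers to a one-unit change in the logged value (a 10-fold change if base-10 logarithms were used). The coefficient and SE in the usage line are hypothetical, not values from Table 2.

```python
import math
from statistics import NormalDist
from typing import Tuple


def odds_ratio_ci(beta: float, se: float,
                  level: float = 0.95) -> Tuple[float, float, float]:
    """Odds ratio exp(beta) and Wald interval exp(beta -/+ z * SE)."""
    z = NormalDist().inv_cdf(0.5 + level / 2)   # about 1.96 for 95%
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)


# Hypothetical coefficient and SE, not values from Table 2
or_, lower, upper = odds_ratio_ci(beta=1.2, se=0.3)
print(f"OR {or_:.2f} (95% CI {lower:.2f}-{upper:.2f})")
```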
Because the validation results were extremely supportive of the model, the discovery and validation data sets (subsequently referred to as “Birmingham data”) were combined and a new model was fitted (model 3).
For this model, AUROC = 0.97; parameter estimates (SE) and OR (95% CI) for the model variables are shown in Table 2. When the model was used on the Newcastle data set (external validation), AUROC = 0.95. A fractional polynomial model was also fitted (model 4).
Given the homogeneity of the Newcastle and Birmingham data, a final model based on all these data was found:
(model 5), where Pr(HCC) = exp(Z)/(1 + exp(Z)) is the probability of hepatocellular carcinoma in a patient.
Table 3 shows the performance of models 1 and 2 on the Birmingham discovery and internal validation sets, models 3 and 4 on all the Birmingham data and the external validation set, and model 5 on the combined Birmingham and Newcastle data. Comparative figures are presented for models with cutoffs optimized in the discovery set for sensitivity, for specificity, or for both, and then applied in the validation sets. For example, using model 1, maximum sensitivity for a specificity of 80% was achieved at a cutoff point of −1.58, giving 97% sensitivity in the discovery set and 98% sensitivity/74% specificity in the internal validation set. The corresponding cutoff point for the whole of the Birmingham data was −1.55, giving 97% sensitivity in the Birmingham data and 98% sensitivity/62% specificity in the external validation set. The application of fractional polynomials did not improve the results.
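The models report a linear predictor Z, and the probability of hepatocellular carcinoma follows from Pr(HCC) = exp(Z)/(1 + exp(Z)). The fitted coefficients of model 5 are not reproduced in this text, so the sketch below takes them as inputs rather than assuming any values. It also assumes, since the quoted values are negative, that cutoff points such as −1.58 are on the Z scale; under that assumption −1.58 corresponds to a probability of roughly 0.17.

```python
import math
from typing import Dict


def linear_predictor(intercept: float, coefficients: Dict[str, float],
                     covariates: Dict[str, float]) -> float:
    """Z = intercept + sum of beta_k * x_k over the model's covariates.
    Coefficients must come from the fitted model; none are assumed here."""
    return intercept + sum(beta * covariates[name]
                           for name, beta in coefficients.items())


def probability_hcc(z: float) -> float:
    """Pr(HCC) = exp(Z) / (1 + exp(Z)), computed without overflow."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def logit(p: float) -> float:
    """Inverse of probability_hcc: Z = log(p / (1 - p))."""
    return math.log(p / (1.0 - p))


# Assuming the cutoff quoted for model 1 is on the Z scale
print(f"{probability_hcc(-1.58):.3f}")
```

Reporting the probability rather than Z may be easier for clinicians to interpret, while the logit function allows a probability threshold chosen at the bedside to be mapped back onto the model's score.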
The overall percentage of patients classified as having early disease varied widely depending on the definition applied (Table 4): 10% using BCLC, 52% using TNM 6, 51% based on tumor size, and 20% based on the multidisciplinary team's recommendation of potentially curative therapy. Table 4 shows how model 5 performs in the early and late groups according to the different classifications. Figure 2B and C show ROC curves for model 3 in the early and late groups (defined in terms of TNM 6) compared with controls.
We have developed a model, now referred to as “GALAD,” that generates the probability that an individual patient with CLD has hepatocellular carcinoma. The physician can then decide at what level of likelihood further investigation, in the form of CT or MRI scanning, should be instituted. Model performance was only slightly poorer for early than for late hepatocellular carcinoma, however these terms were defined.
AFP used on its own has been the most widely used biomarker for hepatocellular carcinoma, but fluctuating, albeit low, levels in patients with CLD (without hepatocellular carcinoma) and low sensitivity in patients with early hepatocellular carcinoma have led to the recommendation that it should not be used in the screening setting, although this view remains contentious (1, 2, 8, 20, 36–38). The development of the AFP-L3 assay has increased the sensitivity and specificity of AFP testing (39) because AFP-L3 retains significant discriminatory ability even at low levels of total AFP (13, 40–43). The recent development of a highly sensitive AFP-L3 assay on an automated microfluidic platform has made the isoform even more sensitive and specific (13, 34). Because AFP, high-sensitivity AFP-L3, and DCP can now be measured on a single platform (44), the risk figure could be reported routinely. AFP-L3 and DCP have been approved by the Food and Drug Administration (FDA) in the United States and the European Medicines Agency (EMEA) in Europe for the diagnosis of hepatocellular carcinoma. In Japan, all three markers are approved by the Ministry of Health, Labour and Welfare (Kōrōshō).
These markers have been investigated in Western populations using a study design similar to that applied here (45). Using receiver operating characteristic (ROC) analysis, Durazo and colleagues (45) determined optimal cutoff points for sensitivity, specificity, and positive predictive value, and concluded that DCP performed best and that combining the markers added no predictive value for differentiating patients with hepatocellular carcinoma from those without. In contrast, Carr and colleagues (46) considered the combination of all three markers to be superior to any individual marker although, unlike in the present study, formal statistical models were not proposed (47). The study most closely aligned to ours is that reported by Marrero and colleagues (47), in that a large number of Western patients were included, the disease etiologies were broadly similar, and account was taken of disease stage. Because overall test performance is affected by the method of disease staging (discussed below), and Marrero and colleagues used cutoff points for individual markers rather than, as in our study, an overall model, direct comparison of results is not possible.
The three markers have also been studied in patients with cirrhosis or CLD who were followed up and in whom hepatocellular carcinoma developed, although in neither study was biomarker validation the primary objective (45, 48). Two of the markers were investigated in patients in the HALT-C trial using a nested case–control design. This involved 37 patients with hepatocellular carcinoma and 79 control subjects, all with HCV infection, in the setting of a prospective study. The authors concluded that at the time of diagnosis, the combination provided a sensitivity of 91% and a specificity of 71% (at defined cutoff points; refs. 45, 48). In a second large prospective study, Kumada and colleagues (49) reported a series in which 623 patients with HCV-related CLD were prospectively followed up, and showed clearly that increased levels of AFP and AFP-L3 were closely associated with an increased incidence of hepatocellular carcinoma. On this basis, they suggested that patients with AFP levels ≥10 ng/mL or AFP-L3 ≥5% should receive intensive imaging at 3- to 6-month intervals. Our study included a third biomarker, AFP-L3, and comprised a discovery set and two validation sets. The external validation set was not ideal because the cohort was derived from patients with fatty liver disease (alcoholic and nonalcoholic) and, as such, had a different spectrum of etiologies from that on which the model was developed and validated. Ultimately, the model will require validation on larger external data sets with differing etiologies.
All our patients had a minimum follow-up of 6 months to exclude occult hepatocellular carcinoma in the CLD cohort. We did not restrict the study to patients with cirrhosis because the risk of hepatocellular carcinoma is associated with CLD generally rather than with cirrhosis alone, and we aimed to set our study as close to a real clinical situation as possible (36).
Precise estimation of sensitivity and specificity demands a rigid “gold standard,” and no definitive diagnosis of hepatocellular carcinoma, or of CLD without hepatocellular carcinoma, is available. Thus, although current guidelines give figures of 85% and >95% for sensitivity and specificity, respectively, radiological diagnosis by CT or MRI is still recognized to be “not infallible” (1, 2), especially in small and hypovascular tumors (50). Similarly, it is conceivable that a serological test might detect hepatocellular carcinoma in a cirrhotic liver before it is detectable on MRI, in which case the positive serological test would be regarded as “false,” decreasing its apparent specificity. It would be surprising, therefore, if any new (in this case serologically based) test could achieve 100% sensitivity and specificity; indeed, had the three patients in the control group who developed hepatocellular carcinoma between 6 and 12 months into the study been reclassified into the hepatocellular carcinoma group, the results of our study would have been even better.
The role of surveillance in the early detection of hepatocellular carcinoma is widely accepted, and it has been estimated that approximately 70% of patients whose disease is detected when there is a single lesion <5 cm, or three tumors each less than 5 cm, can receive potentially curative therapy (1, 2, 51). The optimal approach to surveillance remains contentious, with some arguing that ultrasound alone should be the primary procedure and others that ultrasound should be combined with AFP estimation (6, 52, 53).
Our study does not determine how such a model will perform in a prospective screening setting or in routine practice outside specialist units, but we believe that these results are sufficiently encouraging to warrant a prospective study of the model run in parallel with conventional surveillance by USS, with the aim of supplanting or, more likely, enhancing USS as a screening procedure for hepatocellular carcinoma.
Disclosure of Potential Conflicts of Interest
No potential conflicts of interest were disclosed.
Conception and design: P.J. Johnson, D. Palmer, S. Hussain, H. Reeves, S. Satomura
Development of methodology: P.J. Johnson, S.J. Pirrie, T.F. Cox, M. Teng, S. Hussain, S. Satomura
Acquisition of data (provided animals, acquired and managed patients, provided facilities, etc.): P.J. Johnson, S. Berhane, M. Teng, D. Palmer, J. Morse, G. Patman, S. Hussain, H. Reeves
Analysis and interpretation of data (e.g., statistical analysis, biostatistics, computational analysis): P.J. Johnson, S.J. Pirrie, T.F. Cox, S. Berhane, M. Teng, D. Palmer, S. Hussain, J. H. Reeves, S. Satomura
Writing, review, and/or revision of the manuscript: P.J. Johnson, S.J. Pirrie, T.F. Cox, D. Palmer, S. Hussain, H. Reeves, S. Satomura
Administrative, technical, or material support (i.e., reporting or organizing data, constructing databases): P.J. Johnson, S.J. Pirrie, M. Teng, D. Palmer, J. Morse, D. Hull, C. Kagebayashi, J. Graham, H. Reeves, S. Satomura
Study supervision: P.J. Johnson, H. Reeves, S. Satomura
The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked advertisement in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.
The authors thank the Experimental Cancer Medicine Centre and Biomedical Research Unit, University of Birmingham, Birmingham, UK. The authors also thank colleagues in the Liver Unit, Cancer Centre and Chemical Pathology at University Hospitals Birmingham NHS Foundation Trust for their help in management of patients involved in this study.
- Received August 28, 2013.
- Revision received October 21, 2013.
- Accepted October 25, 2013.
- ©2013 American Association for Cancer Research.