
Minireview
Cancer Biomarkers Research Group, Division of Cancer Prevention, National Cancer Institute, NIH, Bethesda, Maryland
Requests for reprints: Sudhir Srivastava, Cancer Biomarkers Research Group, Division of Cancer Prevention, National Cancer Institute, NIH, Executive Plaza North, Room 3142, 6130 Executive Boulevard, MSC 7346, Bethesda, MD 20892-7346. Phone: 301-496-3983; Fax: 301-402-8990. E-mail: ss1a{at}nih.gov
| Abstract |
|---|
| Introduction |
|---|
Cancer research has reached a strategic inflection point, enabling researchers to generate a wealth of critical data. The current challenge is to understand how these data and related technologies can be applied to clinical use. Cancer biomarkers must therefore be critically evaluated for their clinical applications. Several recent papers have described various aspects of the biomarker development process (2-4). For the purpose of this article, "validation" refers to the confirmation of accuracy, reproducibility, and precision or effectiveness of biomarkers in detecting the intended end points [preneoplastic lesions, incidence, etc. (5)]. However, approaches for validating biomarkers have yet to be clearly defined. To clarify the ambiguities related to biomarker validation and the associated statistical considerations, the National Cancer Institute (NCI), in collaboration with the FDA, conducted a workshop in July 2004 entitled "Research Strategies, Study Designs, and Statistical Approaches to Biomarker Validation for Cancer Diagnosis and Detection."
Experts from the statistical, epidemiologic, and clinical communities deliberated on current statistical designs and discussed approaches to biomarker validation for cancer diagnosis and detection in a 2-day workshop. This article summarizes the discussions, critically evaluates the existing approaches, and provides the recommendations of the participants.
| Performance Metrics of Biomarkers |
|---|
When validating early detection biomarkers as triggers of early intervention or for the evaluation of cancer screening, it is necessary to estimate benefits and adverse events (e.g., unnecessary biopsies) in a particular study or screening program (10). In assessing the efficacy of a marker, disease-specific mortality, rather than overall mortality, should be used as the end point. The most commonly used epidemiologic approaches to measure performance characteristics are observational studies and case-control studies. Observational studies suffer from self-selection bias; that is, subjects who receive screening have different cancer risks from those who are not tested. Case-control studies and mathematical models that combine variables from different sources increase the potential for self-selection bias, whereas periodic screening evaluation and a paired availability design decrease this potential (10-14). These approaches are further complicated by high-throughput, high-dimensional genomic and proteomic data, a potential source of candidate biomarkers. Such data have shifted the original concept of "one marker-one disease" to "multiple markers-one disease." A new approach is therefore warranted, one that incorporates multivariable correlations in a statistical model while guarding against chance findings, bias, and overfitting.
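To make the benefit/harm trade-off concrete, the Python sketch below computes the expected number of cancers detected and of unnecessary biopsies (false-positive workups) in a hypothetical screening program. The helper name `expected_yield` and every input value are illustrative assumptions, not figures from the workshop, and the calculation deliberately stops short of mortality, which remains the end point that ultimately matters.

```python
from dataclasses import dataclass


@dataclass
class ScreeningYield:
    """Expected outcomes in a screened population (hypothetical helper)."""
    cancers_detected: float
    cancers_missed: float
    unnecessary_biopsies: float  # false positives referred for workup
    ppv: float                   # fraction of positive tests that are cancers


def expected_yield(sensitivity: float, specificity: float,
                   prevalence: float, n_screened: int) -> ScreeningYield:
    """Expected benefits (true positives) and harms (false-positive workups)."""
    n_cases = prevalence * n_screened
    n_noncases = n_screened - n_cases
    tp = sensitivity * n_cases
    fn = n_cases - tp
    fp = (1.0 - specificity) * n_noncases
    ppv = tp / (tp + fp) if (tp + fp) > 0 else float("nan")
    return ScreeningYield(tp, fn, fp, ppv)


if __name__ == "__main__":
    # Made-up operating characteristics, for illustration only.
    y = expected_yield(sensitivity=0.80, specificity=0.95,
                       prevalence=0.01, n_screened=10_000)
    print(f"detected={y.cancers_detected:.0f} missed={y.cancers_missed:.0f} "
          f"unnecessary biopsies={y.unnecessary_biopsies:.0f} PPV={y.ppv:.3f}")
```

With these made-up inputs the sketch reports 80 cancers detected against about 495 unnecessary biopsies (PPV of roughly 0.14): at low prevalence, even 95% specificity leaves most positive tests false.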
When investigating the performance of a classification rule based on multiple markers, overfitting can be avoided by selecting the rule in a training sample and estimating its performance in a separate test sample. Because high-dimensional data are noisy, the top few features that differ between cases and controls should be identified up front, and classification rules based on combinations of these features should then be investigated. For example, one could compare the top 20 individually ranked features (genomic and/or proteomic), reestimated in another application, with the top 20 features that perform best jointly.
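The training/test separation described above can be sketched in plain Python. This is a minimal illustration, assuming a simple t-like ranking score, a nearest-centroid classification rule, and simulated Gaussian data; all function names are hypothetical. The key point is that both feature ranking and rule fitting touch only the training sample, so the test-sample accuracy is not inflated by overfitting.

```python
import random
import statistics
from typing import Dict, List, Sequence, Tuple

Matrix = List[List[float]]


def rank_features(X: Matrix, y: Sequence[int], k: int) -> List[int]:
    """Indices of the k features with the largest |case mean - control mean|
    divided by that feature's overall SD (a simple t-like score)."""
    scored = []
    for j in range(len(X[0])):
        col = [row[j] for row in X]
        cases = [v for v, lab in zip(col, y) if lab == 1]
        ctrls = [v for v, lab in zip(col, y) if lab == 0]
        sd = statistics.pstdev(col) or 1e-12  # guard against constant features
        scored.append((abs(statistics.fmean(cases) - statistics.fmean(ctrls)) / sd, j))
    scored.sort(reverse=True)
    return [j for _, j in scored[:k]]


def fit_centroids(X: Matrix, y: Sequence[int], feats: List[int]) -> Dict[int, List[float]]:
    """Class-wise means of the selected features (nearest-centroid rule)."""
    return {lab: [statistics.fmean([r[j] for r, lbl in zip(X, y) if lbl == lab])
                  for j in feats]
            for lab in (0, 1)}


def predict(cents: Dict[int, List[float]], row: Sequence[float], feats: List[int]) -> int:
    """Assign the class whose centroid is closest in squared distance."""
    dist = {lab: sum((row[j] - c) ** 2 for j, c in zip(feats, cent))
            for lab, cent in cents.items()}
    return min(dist, key=dist.get)


def split_sample_accuracy(X: Matrix, y: Sequence[int], k: int = 20,
                          train_frac: float = 0.5, seed: int = 0) -> Tuple[List[int], float]:
    """Stratified split; select features and fit on training data only,
    then estimate accuracy on the untouched test sample."""
    rng = random.Random(seed)
    cases = [i for i, lab in enumerate(y) if lab == 1]
    ctrls = [i for i, lab in enumerate(y) if lab == 0]
    rng.shuffle(cases)
    rng.shuffle(ctrls)
    n_c, n_k = int(train_frac * len(cases)), int(train_frac * len(ctrls))
    train = cases[:n_c] + ctrls[:n_k]
    test = cases[n_c:] + ctrls[n_k:]
    X_tr, y_tr = [X[i] for i in train], [y[i] for i in train]
    feats = rank_features(X_tr, y_tr, k)        # training data only
    cents = fit_centroids(X_tr, y_tr, feats)    # training data only
    correct = sum(predict(cents, X[i], feats) == y[i] for i in test)
    return feats, correct / len(test)


def simulate(n_per_group=40, p=200, n_informative=3, shift=1.5, seed=1):
    """Hypothetical data: p Gaussian features, the first n_informative shifted in cases."""
    rng = random.Random(seed)
    X, y = [], []
    for lab in (0, 1):
        for _ in range(n_per_group):
            row = [rng.gauss(0.0, 1.0) for _ in range(p)]
            if lab == 1:
                for j in range(n_informative):
                    row[j] += shift
            X.append(row)
            y.append(lab)
    return X, y


if __name__ == "__main__":
    X, y = simulate()
    feats, acc = split_sample_accuracy(X, y, k=20)
    print("selected features:", sorted(feats))
    print(f"test-sample accuracy: {acc:.2f}")
```

Most of the 20 selected features will be noise here, which mirrors the concern in the text: ranking alone cannot tell which of the top features truly work together.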
Dr. Martin McIntosh (Fred Hutchinson Cancer Research Center, Seattle, WA) suggested that many approaches may be available to estimate classifiers that satisfy optimality criteria. However, the choice of method should depend on practical needs, such as modest sample sizes and the need to control for observational biases. For this reason, among the theoretically justifiable methods, he favors logistic regression or other semiparametric binary regression approaches. In practice, the platforms used for discovery and validation are rarely the ones used in the clinic to monitor the joint behavior of markers; thus, the need to optimize a marker panel should be considered early, during discovery or validation. For example, the proteins that work best together when measured by matrix-assisted laser desorption ionization are not the same as those that will work best together when measured by ELISA. Participants therefore suggested that researchers consider the entire development process and recognize the risk of rejecting a marker in the discovery phase in favor of one with inferior performance on its own, solely because the latter appears to work well with the other markers selected. Furthermore, criteria to help identify biomarkers, such as their function and performance in subgroups defined by stage, histology, and survival, must be developed.
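Dr. McIntosh's preference for logistic regression can be illustrated with a minimal pure-Python fit of a two-marker panel. This is a teaching sketch, assuming plain batch gradient descent with a small ridge (L2) penalty; the data, learning rate, and function names are hypothetical, and a real analysis would use an established statistics package that also reports standard errors.

```python
import math
from typing import List, Sequence


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def linear_predictor(w: Sequence[float], row: Sequence[float]) -> float:
    """Intercept plus weighted sum of marker values."""
    return w[0] + sum(wj * xj for wj, xj in zip(w[1:], row))


def fit_logistic(X: List[List[float]], y: Sequence[int], lr: float = 0.5,
                 n_iter: int = 3000, l2: float = 1e-3) -> List[float]:
    """Batch gradient descent on the mean logistic log-loss plus a small L2
    penalty on the slopes (intercept not penalized).
    Returns [intercept, b_1, ..., b_p]."""
    n, p = len(y), len(X[0])
    w = [0.0] * (p + 1)
    for _ in range(n_iter):
        grad = [0.0] * (p + 1)
        for row, t in zip(X, y):
            err = sigmoid(linear_predictor(w, row)) - t  # derivative of loss w.r.t. eta
            grad[0] += err
            for j, xj in enumerate(row):
                grad[j + 1] += err * xj
        for j in range(p + 1):
            penalty = l2 * w[j] if j > 0 else 0.0
            w[j] -= lr * (grad[j] / n + penalty)
    return w


def panel_probability(w: Sequence[float], row: Sequence[float]) -> float:
    """Estimated probability of cancer for one subject's marker panel."""
    return sigmoid(linear_predictor(w, row))


if __name__ == "__main__":
    # Hypothetical panel: marker 1 separates cases from controls, marker 2 is noise.
    X = [[0.1, 1.0], [0.4, 0.0], [0.3, 1.0], [0.2, 0.0],
         [0.9, 1.0], [0.7, 0.0], [0.8, 0.0], [0.6, 1.0]]
    y = [0, 0, 0, 0, 1, 1, 1, 1]
    w = fit_logistic(X, y)
    print("intercept and coefficients:", [round(v, 2) for v in w])
```

The penalty keeps the coefficients finite even though these toy data are perfectly separable, a situation that is common with small biomarker panels.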
Whenever decisions must be made about which markers to advance to subsequent phases of selection and validation, biology should drive the choice because, statistically, there is no justification for optimizing complex methods when the function of these markers and methods may change downstream. Small changes may have large downstream effects; if a selection decision is made too early, key markers may be overlooked. For instance, five genes that change subtly may make more difference than one gene that undergoes dramatic alteration. Moreover, the magnitude of a particular change is less important than its biological significance. However, a large sample size is necessary to correctly identify multiple markers that work together, rather than chance combinations that only appear to perform well. In preliminary performance studies, the longitudinal behavior of markers in controls should be investigated to determine marker stability over time, as this characteristic is a good indicator for the retrospective performance phase. Final performance evaluation should be based on an external, independent sample.
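One common way to quantify the marker stability in controls mentioned above is an intraclass correlation coefficient (ICC) computed from repeated measurements on the same subjects. The sketch below assumes the one-way random-effects form, ICC(1), with an equal number of draws per subject; the function name and the example values are hypothetical, and other ICC variants may suit other sampling schemes better.

```python
import statistics
from typing import List


def icc_oneway(measurements: List[List[float]]) -> float:
    """One-way random-effects intraclass correlation, ICC(1).

    measurements[i] holds k repeated marker values for control subject i.
    Near 1: variation is mostly between subjects (marker stable over time).
    Near 0 or negative: within-subject fluctuation dominates."""
    n, k = len(measurements), len(measurements[0])
    if n < 2 or k < 2 or any(len(m) != k for m in measurements):
        raise ValueError("need >= 2 subjects, each with the same k >= 2 measurements")
    grand = statistics.fmean([v for m in measurements for v in m])
    means = [statistics.fmean(m) for m in measurements]
    ss_between = k * sum((mu - grand) ** 2 for mu in means)
    ss_within = sum((v - mu) ** 2 for m, mu in zip(measurements, means) for v in m)
    ms_between = ss_between / (n - 1)
    ms_within = ss_within / (n * (k - 1))
    denom = ms_between + (k - 1) * ms_within
    if denom == 0:
        raise ValueError("all measurements identical; ICC undefined")
    return (ms_between - ms_within) / denom


if __name__ == "__main__":
    # Hypothetical serial draws (three time points) from four control subjects.
    controls = [[2.1, 2.3, 2.0], [4.8, 5.1, 4.9], [3.2, 3.0, 3.3], [6.0, 5.7, 6.1]]
    print(f"ICC(1) = {icc_oneway(controls):.3f}")
```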
| Strengths and Weaknesses of Longitudinal and Cohort-Based Designs: A "Piggybacking" Approach through Treatment and/or Prevention Trials |
|---|
Participants recommended piggybacking biomarkers whose accuracy and performance characteristics have been established in preceding phases onto randomized trials. The control groups of large trials can serve as longitudinal cohorts for prospective assessment of the relationship between biomarkers and subsequent disease. Permutation tests can be used to compare intervention groups with respect to changes in multiple biomarkers. Biomarker validation backed by randomized trials will yield more convincing results, remove bias, and balance predictive factors (known and unknown). Factorial designs have also been proposed to piggyback biomarkers on trials that address more than one end point in a single study. One example is the Physicians' Health Study, in which 22,000 physicians were evaluated for the effects of aspirin on cardiovascular mortality and of β-carotene on cancer incidence. Using a 2 × 2 factorial design, physicians were assigned to one of four groups: aspirin placebo plus β-carotene placebo, aspirin plus β-carotene placebo, aspirin placebo plus β-carotene, or aspirin plus β-carotene. The aspirin component of the study was terminated early because of a significant reduction in myocardial infarctions, whereas the β-carotene component continued, its results unaffected by aspirin. This approach makes it possible to address two separate questions about entirely different diseases in a single study, with the results for each objective unaffected by the other (15). Additional examples illustrating the piggyback approach are provided below.
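A permutation test comparing intervention groups on several biomarker changes at once can be sketched as follows. This assumes one particular choice of statistic, the maximum absolute difference in mean change across all biomarkers, which yields a single global test of "no effect on any marker"; the function names and data layout are hypothetical.

```python
import random
import statistics
from typing import List, Sequence


def max_abs_mean_diff(changes: List[List[float]], groups: Sequence[int]) -> float:
    """Largest |mean change in arm 1 - mean change in arm 0| over biomarkers.
    changes[i][j] is the change in biomarker j for subject i; groups[i] in {0, 1}."""
    best = 0.0
    for j in range(len(changes[0])):
        a = [row[j] for row, g in zip(changes, groups) if g == 1]
        b = [row[j] for row, g in zip(changes, groups) if g == 0]
        best = max(best, abs(statistics.fmean(a) - statistics.fmean(b)))
    return best


def permutation_test(changes: List[List[float]], groups: Sequence[int],
                     n_perm: int = 2000, seed: int = 0) -> float:
    """Monte Carlo permutation p-value: shuffle the randomization labels and
    count how often the max statistic is at least as large as observed."""
    rng = random.Random(seed)
    observed = max_abs_mean_diff(changes, groups)
    labels = list(groups)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(labels)
        if max_abs_mean_diff(changes, labels) >= observed - 1e-12:
            hits += 1
    return (hits + 1) / (n_perm + 1)  # add-one keeps the p-value valid


if __name__ == "__main__":
    # Hypothetical changes in two biomarkers for 6 treated and 6 control subjects.
    changes = [[10 + i, 20 + i] for i in range(6)] + [[i, i] for i in range(6)]
    groups = [1] * 6 + [0] * 6
    print(f"global permutation p-value: {permutation_test(changes, groups):.4f}")
```

Because the labels being permuted are the randomized assignments, the test relies only on the randomization itself, which is part of what makes trial-based validation convincing.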
In the first example, it was noted that the Prostate Cancer Prevention Trial, in which 18,882 men were randomized to finasteride or placebo for 7 years and required to undergo biopsy at the end of the study, offered unique research opportunities. These included independent confirmation of disease progression, markers that predict recurrence or progression early, an existing funded clinical trial infrastructure, collected covariates, and relatively little competition for samples, together providing a unique cohort for validating new biomarkers. Using prostate-specific antigen profiles from the Prostate Cancer Prevention Trial and recurrence times from 1,011 patients treated >7 years, investigators determined that prostate-specific antigen is an early predictor of prostate cancer recurrence, using a study design free of verification bias (16, 17). Nonetheless, analytic methods are clearly needed to optimize inference under the constraints of such designs. Moreover, in a piggyback approach, relationships should be established as early as possible to provide a strong justification for a particular study, to embed correlative studies in the design phase, and to assemble analytic techniques.
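Verification bias arises when only some screened subjects (typically the test-positives) undergo the reference procedure, such as biopsy. The sketch below shows how naive estimates are distorted and applies one standard correction, the Begg-Greenes estimator. It is not the method used in the studies cited above; it assumes that the decision to biopsy depends only on the test result (missing at random), and all counts and function names are hypothetical.

```python
from typing import Dict

Counts = Dict[str, Dict[str, int]]  # {"pos"/"neg": {"D": diseased, "N": not diseased}}


def naive_estimates(verified: Counts) -> Dict[str, float]:
    """Sensitivity/specificity computed only among biopsied subjects (biased)."""
    vp, vn = verified["pos"], verified["neg"]
    return {"sensitivity": vp["D"] / (vp["D"] + vn["D"]),
            "specificity": vn["N"] / (vn["N"] + vp["N"])}


def begg_greenes(n_pos: int, n_neg: int, verified: Counts) -> Dict[str, float]:
    """Begg-Greenes corrected sensitivity and specificity.

    n_pos, n_neg: all screened test-positive / test-negative subjects.
    verified: disease status among those in each test group who were biopsied.
    Assumes biopsy depends only on the test result (missing at random)."""
    total = n_pos + n_neg
    p_pos, p_neg = n_pos / total, n_neg / total
    vp, vn = verified["pos"], verified["neg"]
    d_pos = vp["D"] / (vp["D"] + vp["N"])  # P(disease | test positive)
    d_neg = vn["D"] / (vn["D"] + vn["N"])  # P(disease | test negative)
    sens = p_pos * d_pos / (p_pos * d_pos + p_neg * d_neg)
    spec = p_neg * (1 - d_neg) / (p_neg * (1 - d_neg) + p_pos * (1 - d_pos))
    return {"sensitivity": sens, "specificity": spec}


if __name__ == "__main__":
    # Hypothetical: all 200 test-positives biopsied, a random 200 of 800 test-negatives.
    verified = {"pos": {"D": 80, "N": 120}, "neg": {"D": 10, "N": 190}}
    print("naive:    ", naive_estimates(verified))
    print("corrected:", begg_greenes(200, 800, verified))
```

With these hypothetical counts the naive sensitivity and specificity are about 0.89 and 0.61, whereas the corrected values are about 0.67 and 0.86. A design that biopsies everyone, as in the Prostate Cancer Prevention Trial, avoids the need for such a correction and its missing-at-random assumption.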
A second example is the Women's Health Initiative, a randomized trial initiated in 1992 to study the effects of dietary modification, hormone therapy, calcium, and vitamin D on disease outcomes in >160,000 women. The study was initially designed to use its specimens to explain intervention effects in the randomized clinical trial, examine disease mechanisms, identify or confirm biological risk factors, develop risk strata, and describe the natural history of disease biomarkers. It was noted that the study offered unique cohorts for a variety of biomarker validation studies, with well-defined specimens (type, collection times, and volumes), outcomes (adequate time to accumulate a sufficient number of events and quality of outcome data), study population (relevance of biomarkers to a specific population), clinical practice (availability of screening and complementarity of a biomarker to existing modalities), and consent and Health Insurance Portability and Accountability Act authorization. In addition, 26 studies that use Women's Health Initiative blood specimens have been approved, 17 of which have principal investigators who are not Women's Health Initiative investigators.
| Trials for Biomarker Validation |
|---|
Participants also discussed the importance of modeling and how cancer biomarkers may be used as auxiliary variables in a seamless phase II/III trial design. Dr. Don Berry (M. D. Anderson Cancer Center, Houston, TX) stated that conventional drug development paradigms include a lag time of 9 to 12 months between phases II and III. With biomarkers as auxiliary variables, however, a drug-versus-placebo phase II study carried out at select centers, each enrolling, say, 10 to 20 patients monthly, could substantially reduce this lag time. If the predictive probabilities based on the biomarker are encouraging in phase II, the trial can be expanded into phase III at many centers with higher enrollment, say, >40 patients monthly. Because this is a single trial, survival data from both phases can be combined in the final analysis. Frequent updating of the accumulating evidence allows judgments about accrual and continuation of the trial. Such an adaptive design allows fewer patients to be enrolled, enables a smooth transition between phases II and III, and uses data from all patients to assess the phase II end point and the relationship between the biomarker and survival (19, 20).
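The predictive-probability idea behind the phase II to phase III decision can be illustrated with a deliberately simplified sketch. It assumes a binary biomarker response (rather than survival), a Beta(1, 1) prior on the response rate, and hypothetical expansion and futility thresholds; it is not Dr. Berry's actual model, which links the biomarker to survival. Given x responders among n patients so far, the number of responders among the m patients still to be enrolled follows a beta-binomial distribution, and the predictive probability is the chance that the final total reaches the success threshold.

```python
import math
from typing import Tuple


def log_beta(a: float, b: float) -> float:
    """log B(a, b) via log-gamma, to avoid overflow."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)


def predictive_probability(x: int, n: int, m: int, r_final: int,
                           a: float = 1.0, b: float = 1.0) -> float:
    """P(final responders >= r_final | x responders among n so far).

    Beta(a, b) prior on the response rate; future responders among m
    patients are beta-binomial with parameters (m, a + x, b + n - x)."""
    a_post, b_post = a + x, b + n - x
    prob = 0.0
    for y_future in range(m + 1):
        if x + y_future >= r_final:
            log_pmf = (math.log(math.comb(m, y_future))
                       + log_beta(a_post + y_future, b_post + m - y_future)
                       - log_beta(a_post, b_post))
            prob += math.exp(log_pmf)
    return prob


def interim_decision(x: int, n: int, m: int, r_final: int,
                     expand_at: float = 0.8, stop_at: float = 0.1) -> Tuple[str, float]:
    """Hypothetical rule: expand to phase III, stop for futility, or keep accruing."""
    pp = predictive_probability(x, n, m, r_final)
    if pp >= expand_at:
        return "expand to phase III", pp
    if pp <= stop_at:
        return "stop for futility", pp
    return "continue phase II", pp


if __name__ == "__main__":
    # e.g., 12 biomarker responders among the first 20 patients, 40 more planned,
    # success defined (hypothetically) as >= 30 responders in total
    print(interim_decision(x=12, n=20, m=40, r_final=30))
```

Re-running this calculation each time new data arrive is one concrete form of the "frequent updating" described above.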
Dr. Berry further noted that, with longitudinal markers (e.g., CA-125 in ovarian cancer), data available from the trial are used to model the relationship over time between the biomarker and survival according to therapy. By calculating predictive distributions for each patient and incorporating covariates, the seamless phase II/III model can be applied. Such an approach enables key decisions about trial design, including adding or discontinuing study arms or changing doses. However, it was suggested that auxiliary variables used nonparametrically yield useful results only when there is a strong relationship between the intermediate information and the final outcome.
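Before a longitudinal marker such as CA-125 can enter a model relating it to survival, each patient's trajectory usually needs some summary. A minimal sketch, assuming the summary is the least-squares slope of the log marker over time (with a derived doubling time), is shown below; this is only one illustrative choice and far simpler than the patient-level predictive distributions described above. Function names and example values are hypothetical.

```python
import math
from typing import Sequence


def log_marker_slope(times: Sequence[float], values: Sequence[float]) -> float:
    """Least-squares slope of log(marker) versus time for one patient.
    A positive slope means the marker is rising exponentially."""
    if len(times) != len(values) or len(times) < 2:
        raise ValueError("need at least two paired measurements")
    logs = [math.log(v) for v in values]
    t_bar = sum(times) / len(times)
    l_bar = sum(logs) / len(logs)
    sxx = sum((t - t_bar) ** 2 for t in times)
    if sxx == 0:
        raise ValueError("measurement times must differ")
    sxy = sum((t - t_bar) * (lv - l_bar) for t, lv in zip(times, logs))
    return sxy / sxx


def doubling_time(slope: float) -> float:
    """Time for the marker to double, given a positive log-scale slope."""
    return math.log(2) / slope if slope > 0 else math.inf


if __name__ == "__main__":
    # Hypothetical CA-125 values (U/mL) at months 0, 1, 2, 3 for one patient.
    months = [0, 1, 2, 3]
    ca125 = [35.0, 41.0, 52.0, 60.0]
    s = log_marker_slope(months, ca125)
    print(f"slope per month = {s:.3f}, doubling time = {doubling_time(s):.1f} months")
```

Per-patient slopes computed this way could then serve as a covariate, by treatment arm, in a survival model.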
Using flexible genomic drug trial design scenarios for population stratification, Dr. Sue Jane Wang (FDA, Rockville, MD) discussed how genomic biomarkers of drug response can be developed. She proposed five phases of genomic biomarker development for drug response. Briefly, in the early phases, genomic biomarkers are explored for detecting disease severity, followed by development of a clinical assay and validation against established disease severity. In the middle phases, genomic biomarkers that detect drug toxicity or drug response should be identified through retrospective (longitudinal) evaluation. Their predictive ability should then be confirmed clinically, preferably through prospective pharmacogenomic test screening. The final phase quantifies how much the pharmacogenomic diagnostic screening test, through the resulting therapeutic or diagnostic intervention, reduces the burden of disease in the population.
Dr. Richard Simon (NCI, Bethesda, MD) described the elements and value of proper cross-validation in the evaluation of biomarker indices and indicated that cross-validation is valid only if the test set is not used in any way to develop the model. With proper cross-validation, the model, including any selection of features, is developed from scratch for each "leave-one-out" training set (21-23). He noted that, for smaller studies, cross-validation is preferable to split-sample validation, and that internal validation is limited by the precision of the estimated error rate and by the data used in the developmental study. When working with high-dimensional data, samples should be split between development and validation of markers.
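The requirement that the held-out sample not be used "in any way" is easy to violate by selecting features once on the full data set and cross-validating only the classifier. The sketch below shows leave-one-out cross-validation in which feature selection is repeated inside every fold. The t-like ranking score, the nearest-centroid classifier, and all function names are illustrative assumptions, not the specific methods of the cited work.

```python
import statistics
from typing import List, Sequence

Matrix = List[List[float]]


def top_k_features(X: Matrix, y: Sequence[int], k: int) -> List[int]:
    """Rank features by |difference in class means| / overall SD; keep top k."""
    scored = []
    for j in range(len(X[0])):
        col = [row[j] for row in X]
        cases = [v for v, lab in zip(col, y) if lab == 1]
        ctrls = [v for v, lab in zip(col, y) if lab == 0]
        sd = statistics.pstdev(col) or 1e-12  # guard against constant features
        scored.append((abs(statistics.fmean(cases) - statistics.fmean(ctrls)) / sd, j))
    scored.sort(reverse=True)
    return [j for _, j in scored[:k]]


def nearest_centroid(X_train: Matrix, y_train: Sequence[int],
                     feats: List[int], x: Sequence[float]) -> int:
    """Assign x to the class whose centroid (on the selected features) is closest."""
    dist = {}
    for lab in (0, 1):
        rows = [r for r, lbl in zip(X_train, y_train) if lbl == lab]
        cent = [statistics.fmean([r[j] for r in rows]) for j in feats]
        dist[lab] = sum((x[j] - c) ** 2 for j, c in zip(feats, cent))
    return min(dist, key=dist.get)


def loocv_error(X: Matrix, y: Sequence[int], k: int) -> float:
    """Leave-one-out error rate with feature selection redone in every fold,
    so the held-out sample never influences which features are chosen."""
    errors = 0
    for i in range(len(y)):
        X_tr = X[:i] + X[i + 1:]
        y_tr = list(y[:i]) + list(y[i + 1:])
        feats = top_k_features(X_tr, y_tr, k)             # select on training fold only
        pred = nearest_centroid(X_tr, y_tr, feats, X[i])  # then fit and predict
        errors += int(pred != y[i])
    return errors / len(y)


if __name__ == "__main__":
    # Hypothetical toy data: 5 cases and 5 controls, 2 features.
    X = [[10 + i, 10 + i] for i in range(5)] + [[i, i] for i in range(5)]
    y = [1] * 5 + [0] * 5
    print(f"LOOCV error rate: {loocv_error(X, y, k=1):.2f}")
```

Moving the call to `top_k_features` outside the loop would turn this into the flawed, optimistically biased procedure the text warns against.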
The group noted that each phase of biomarker development requires validation, but true confirmation of patient benefit is established in phase V, a large randomized trial to determine the effect of a test on mortality, as described in the phases of biomarker development defined by the NCI Early Detection Research Network (4). Because of the expense, however, phase IV (a prospective study in which a positive biomarker test triggers a diagnostic workup) and phase V can be completed for only a select few biomarkers. Regarding validation by simulation, it was noted that a model can never replace actual observation. True validation means that results can be reproduced by other laboratories, in other settings, and in independent populations. Banking specimens for sharing is therefore critical.
| Clinical and Biological Challenges: Biological Specimens from Large Institutional Trials |
|---|
| Considerations for Biomarker Validation: Regulatory Requirements for Commercialization |
|---|
To evaluate the performance of a biomarker, the FDA prefers a "yardstick of truth," such as receiver operating characteristic curve analysis, together with the ability to understand biomarker behavior in the intended populations. It was suggested that industry should have a clear idea of the intended use of the biomarker, supported by good study design, sound analytic and data collection methods, and the demonstration of good science. The agency, in turn, should develop clear guidelines for approving cancer diagnostics (e.g., analyte-specific reagent definitions and criteria for classifying biomarkers that affect therapeutic decision making and for showing their clinical use). It was recommended that industry engage in early dialogue with the FDA to develop clear roadmaps with the necessary regulatory emphasis and priorities for various test developments (e.g., multiplex testing, genomics, quality standards and benchmarks, and combination product guidelines for device/drug or device/biological applications).
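Receiver operating characteristic analysis, mentioned above as a preferred yardstick, can be sketched in a few lines. The code below builds the empirical ROC curve by sweeping a positivity threshold over all observed scores and computes the area under it through the equivalent Mann-Whitney form: the probability that a randomly chosen case scores higher than a randomly chosen control, with ties counted as one half. The function names and example scores are hypothetical.

```python
from typing import List, Sequence, Tuple


def roc_points(case_scores: Sequence[float],
               control_scores: Sequence[float]) -> List[Tuple[float, float]]:
    """Empirical ROC curve as (false-positive rate, true-positive rate) pairs,
    calling a subject 'positive' when its score is >= each observed threshold."""
    thresholds = sorted(set(case_scores) | set(control_scores), reverse=True)
    pts = [(0.0, 0.0)]
    for t in thresholds:
        tpr = sum(s >= t for s in case_scores) / len(case_scores)
        fpr = sum(s >= t for s in control_scores) / len(control_scores)
        pts.append((fpr, tpr))
    return pts


def auc_mann_whitney(case_scores: Sequence[float],
                     control_scores: Sequence[float]) -> float:
    """Area under the empirical ROC curve: P(case score > control score),
    with ties counted as one half."""
    wins = 0.0
    for c in case_scores:
        for d in control_scores:
            wins += 1.0 if c > d else 0.5 if c == d else 0.0
    return wins / (len(case_scores) * len(control_scores))


if __name__ == "__main__":
    # Hypothetical marker levels in cancer cases and controls.
    cases = [4.1, 3.5, 5.2, 2.9, 4.8]
    controls = [2.0, 3.1, 1.7, 2.6, 3.5]
    print("ROC points:", roc_points(cases, controls))
    print(f"AUC = {auc_mann_whitney(cases, controls):.2f}")
```

The Mann-Whitney form gives the same value as trapezoidal integration of the empirical ROC points, and it also makes clear that the AUC summarizes ranking ability over all thresholds rather than performance at the single cutoff a regulatory submission would specify.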
| Summary |
|---|
| Acknowledgments |
|---|
| Footnotes |
|---|
Received 6/13/05; revised 2/14/06; accepted 3/29/06.
| References |
|---|