1 Princess Margaret Hospital; 2 Ontario Cancer Institute, Toronto, Ontario, Canada; 3 Harvard Medical School; 4 Harvard School of Public Health; 5 Massachusetts General Hospital; and 6 Dana Farber Cancer Institute, Boston, Massachusetts
Requests for reprints: Geoffrey Liu, Princess Margaret Hospital/Ontario Cancer Institute, Suite 7-124, 610 University Avenue, Toronto, Ontario, Canada M5G 2M9. Phone: 416-946-4501, ext. 3428; Fax: 416-946-6546. E-mail: Geoffrey.Liu{at}uhn.on.ca
| Abstract |
|---|
Methods: We developed an explicit algorithm (a set of standard operating procedures forming a rapid outcomes ascertainment system) that encompassed multiple quality assurance tests. We assessed the quality of data for a range of prognostic and outcomes variables in several cancers, across several centers in two countries. Based on these assessments, the algorithm was revised and physicians' clinical practice changed. We reevaluated the quality of outcomes after these revisions.
Results: Developing an algorithm with internal quality controls revealed specific, correctable patterns of data collection errors. Although the major discrepancy rate in retrospective data collection was low (0.6%) when compared with external validated sources, complete data were found in <50% of patients for treatment response rate, toxicity, and documentation of patient palliative symptoms. Prospective data collection and changes to clinical practice led to significantly improved data quality. In one large lung cancer case-control study, complete data increased from 45% to 76% for response rate (P = 0.01, Fisher's exact test), from 26% to 56% for toxicity (P = 0.02), and from 25% to 70% for palliative symptoms (P < 0.05).
Conclusions: Observational studies can be a useful source for studying molecular prognostic and pharmacogenetic outcomes. A rapid outcomes ascertainment system with strict ongoing quality control measures is an excellent means of monitoring key variables. (Cancer Epidemiol Biomarkers Prev 2008;17(1):204–11)
| Introduction |
|---|
Despite these advantages, carrying out outcomes analyses on the foundation of a case-control study has potential pitfalls. Prognostic and risk factors for the same disease do not overlap completely (13), so that there may be missing data on some clinically important prognostic variables. Unlike randomized control studies, case-control studies are observational studies; their outcomes are typically collected retrospectively. Patients are not treated uniformly. Whereas population-based samples are a standard in the case-control setting (14), the same population-based samples may yield a heterogeneous group inappropriate for survival analyses of the entire data set.
Between 1999 and 2001, investigators of the Harvard Lung Cancer Susceptibility Case-Control Study, which had started in 1992, began to address the issue of conducting a clinical outcomes study within the case-control framework. We developed a rapid outcomes ascertainment system (ROAS) to allow us to collect these outcomes. ROAS involves a set of specific algorithmic procedures that incorporate multiple quality assurance tests. This Harvard study therefore also enabled us to compare a typical retrospective case-outcomes approach to data collection with ROAS. We hypothesized that case-control studies can yield high-quality outcomes analyses if properly monitored and conducted using a system similar to ROAS. After applying ROAS to the Harvard Lung Cancer Study, we also applied it to separate esophageal and pancreatic cancer studies at two institutions, and we recently started implementing ROAS at a third institution. We hypothesized that the lessons learned from the Harvard Lung Cancer Study were generalizable to new tumor sites and new institutions, even in the health care system of another country.
| Materials and Methods |
|---|
The Dana-Farber Harvard Cancer Center studies of esophageal and pancreatic cancer involved recruitment not only at MGH but also at Dana Farber Cancer Institute (G. Liu, D. Christiani, and M. Kulke, co-principal investigators); these were prospective studies developed under ROAS. The esophageal cancer study at Princess Margaret Hospital in Toronto (G. Liu, principal investigator) is currently implementing ROAS procedures.
Outcomes Data Collection Timeline
In 2000 to 2001, the feasibility of evaluating outcomes in this study was assessed. Because >80% were non–small-cell lung cancers, we limited the assessment of outcomes to this subgroup. The timeline included literature search (step 1), January to June 2000; feasibility pilot (step 2), June to September 2000; the development of standard operating procedures (ROAS) for collecting clinical prognostic and outcomes variables (step 3), August 2000 through June 2001; and ROAS quality control assessments (steps 4-6), March 2001 through January 2002. Prognostic and outcomes data collection using ROAS began in earnest in June 2002 and continues to the present.
Identification of Important Clinical Prognostic Factors and Outcomes (Step 1). A PubMed search used the search terms "lung cancer" and "prognosis." To avoid inclusion of outdated prognostic variables, we limited the search to 1990 through 1999. Articles were restricted to "English language," "core clinical journals," and "with available abstracts." Through a separate search, we compiled phase II and III studies in the same period to identify the important clinical outcomes variables.
Pilot Feasibility Study (Step 2). In 1999, the feasibility of collecting outcomes data from this case-control study was uncertain. To assess feasibility, a small number (n = 40) of non–small-cell lung cancer patients with early-stage disease and a small cohort (n = 40) with advanced-stage disease were randomly selected. A basic qualitative assessment of feasibility (e.g., whether charts were available) was done.
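The random selection described in step 2 can be illustrated with a short stratified sampling sketch. This is only an illustration under stated assumptions: the record layout (`id`, `stage_group`), the function name, and the fixed seed are hypothetical and are not taken from the study procedures.

```python
import random


def draw_pilot_sample(patients, per_stratum=40, seed=0):
    """Randomly select up to `per_stratum` patients from each stage group.

    `patients` is a list of dicts with a hypothetical "stage_group" key
    whose value is either "early" or "advanced".
    """
    rng = random.Random(seed)  # fixed seed so the draw can be reproduced
    strata = {"early": [], "advanced": []}
    for patient in patients:
        strata[patient["stage_group"]].append(patient)
    # Sample without replacement within each stratum.
    return {
        name: rng.sample(group, min(per_stratum, len(group)))
        for name, group in strata.items()
    }
```

Recording the seed alongside the sample is one way to keep the selection auditable, in the spirit of the procedure manual described in step 3.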
Development of Algorithm for Data Collection (Step 3). We developed standard operating procedures for prognostic and outcomes data collection in the form of an algorithm (Fig. 1 and steps 4-7). In summary, research assistants involved in data abstraction underwent initial training. Data were abstracted from a range of sources: hospital computerized and paper patient records, the Social Security Death Index, referring physicians' notes, and death certificates. Patients or their families were approached only when data could not be obtained from the sources listed above. Abstracted data were subsequently computerized. The algorithm incorporated two internal quality control measures (step 4) and validation with external "gold standard" sources (steps 5 and 6). A key element of this algorithm was that it was accompanied by a paper trail (a procedure manual), so that historical procedural details were always documented. This algorithm ensured a consistently high level of quality control even when multiple individuals collected outcomes data.
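The source hierarchy in step 3 can be sketched as a simple fallback lookup. This is a minimal sketch and not the study's software: the source labels, the lookup callables, and the audit log are hypothetical. The text fixes only one part of the ordering (patients or families are contacted last); the order of the other sources here is an assumption.

```python
# Assumed consultation order; only "patient_or_family_contact" being last
# is stated in the text.
SOURCE_ORDER = [
    "hospital_computerized_record",
    "hospital_paper_record",
    "social_security_death_index",
    "referring_physician_notes",
    "death_certificate",
    "patient_or_family_contact",
]


def abstract_variable(patient_id, variable, lookups, audit_log):
    """Try each source in order and return (value, source) for the first hit.

    `lookups` maps a source label to a hypothetical callable
    lookup(patient_id, variable) that returns a value or None.
    Every attempt is appended to `audit_log` to keep a paper trail.
    """
    for source in SOURCE_ORDER:
        lookup = lookups.get(source)
        if lookup is None:
            continue  # this source is not available at this center
        value = lookup(patient_id, variable)
        audit_log.append((patient_id, variable, source, value is not None))
        if value is not None:
            return value, source
    return None, None  # variable remains missing after all sources
```

Logging every attempt, including unsuccessful ones, makes it possible to tabulate later how often each source had to be consulted.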
Adequacy of Follow-up and Outcomes (Step 5). In an observational study, both the patient and treating physician control the frequency and completeness of follow-up; treating physicians control the documentation of key outcomes. We assessed the quality of these processes. A panel of local oncology experts first established definitions of what constituted complete, substandard, and missing prognostic and outcomes variables. We used the standard Response Evaluation Criteria in Solid Tumors for outcomes data (16) and the National Cancer Institute Common Toxicity Criteria version 2.0 for grading toxicity.
Cross-Validation with External Sources (Step 6). A random sample of the prognostic and outcomes data was cross-checked against secondary sources of information: (a) an independent oncologist blinded to the algorithm results, using all available information and not confined to the algorithm protocol (R.S.H. and G.L.); (b) for patients concurrently recruited to a clinical trial, shadow charts of the clinical trials coordinators; and (c) the MGH Cancer Registry. Our comparisons were categorized as follows: (a) identical results between algorithmic and secondary sources; (b) minor discrepancies in documentation between the algorithmic and secondary sources that would not affect most outcomes analyses; (c) major discrepancies or missing data that could materially alter analytic results between the algorithmic and secondary sources.
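The three comparison categories in step 6 can be expressed as a small classification routine. The sketch below is illustrative only: the 30-day tolerance used to separate minor from major date discrepancies is an assumed placeholder, since the text leaves that judgment to whether a difference would materially alter analytic results.

```python
from collections import Counter
from datetime import date


def classify(algorithm_value, secondary_value, tolerance_days=30):
    """Return "identical", "minor", or "major" for one pair of values.

    The tolerance is a hypothetical rule for dates; non-date values that
    differ, and any missing value, are treated as major.
    """
    if algorithm_value is None or secondary_value is None:
        return "major"  # missing data counts as a major discrepancy
    if algorithm_value == secondary_value:
        return "identical"
    if isinstance(algorithm_value, date) and isinstance(secondary_value, date):
        if abs((algorithm_value - secondary_value).days) <= tolerance_days:
            return "minor"
    return "major"


def discrepancy_rates(pairs):
    """Return the fraction of pairs falling into each category."""
    if not pairs:
        raise ValueError("no pairs to compare")
    counts = Counter(classify(a, b) for a, b in pairs)
    total = len(pairs)
    return {k: counts[k] / total for k in ("identical", "minor", "major")}
```

In practice the minor/major boundary would come from the expert panel's definitions rather than a fixed number of days.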
Changing Clinical Practice (Step 7). Results from steps 4 to 6 were presented to clinicians. Roundtable discussions were held to discuss methods of improving the data collection process. After changes were made to the algorithm or to clinical practice and a sufficient time had elapsed, steps 4 to 7 were repeated.
Statistical Analysis. Most analyses were descriptive tabulations. Where appropriate, Fisher's exact tests were done, comparing categories of data accuracy or completeness at different time points of data collection.
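For readers who want to reproduce this kind of comparison, the following is a minimal standard-library sketch of a two-sided Fisher's exact test on a 2 × 2 table (complete vs. incomplete data at two time points). The function name is hypothetical, and the example table is a textbook illustration, not study data; the per-period denominators are not given in this text, so the reported P values are not recomputed here.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of all tables with the same
    margins that are no more likely than the observed table.
    """
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Probability that the top-left cell equals x given fixed margins.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Small relative tolerance guards against floating-point ties.
    return sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-7))


# Illustrative table only (not from the study): rows are time periods,
# columns are complete / incomplete records.
p_value = fisher_exact_two_sided(3, 1, 1, 3)  # 34/70, about 0.486
```

Statistical packages offer the same test; the explicit version above is shown only to make the calculation transparent.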
| Results |
|---|
Steps 3 and 4. Figure 1 shows the algorithm developed for collecting prognostic and outcomes data. We chose 15 cases per year from each of the years 1993 to 1999 for review (n = 105). There were significantly more missing data in 1993 and 1994 because the older charts were not systematically stored off-site. Complete data on overall survival and disease-free survival/progression-free survival were found in MGH computerized records in 67 (64%) cases; an additional 13 (12%) cases required data from MGH physicians' offices; 3 (3%) cases also required contact with referring or primary care physicians; and 2 (2%) cases required contact with families or patients. Twenty cases had missing data. For 9 of these 20 cases, overall survival data were obtained from either the Social Security Death Index or death certificates. A data entry error rate of 1.4% was reduced to 0.4% after we modified the database. After assessment of the algorithm by an oncologist, specific patterns of errors were found. First, the date of first clinic visit was mistakenly used to calculate overall survival, rather than the actual date of diagnosis. Second, some of the apparently missing toxicity data were found in nursing notes. A written procedure manual improved the efficiency of the data abstraction process, reducing the mean time per case review by 15 min (from 60 to 45 min).
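The counts reported for steps 3 and 4 can be checked with a few lines of arithmetic. The sketch below uses only numbers stated in the paragraph above; the variable names are illustrative, and the final overall survival figure is an inference from those numbers rather than a value reported in the text.

```python
reviewed = 15 * 7  # 15 cases per year, 1993 through 1999

# Cases with complete survival data, by the last source that was needed.
resolved_by_source = {
    "MGH computerized records": 67,
    "MGH physicians' offices": 13,
    "referring or primary care physicians": 3,
    "families or patients": 2,
}

resolved = sum(resolved_by_source.values())
missing = reviewed - resolved

# Rounded percentages of all reviewed cases, as quoted in the text.
percentages = {
    source: round(100 * count / reviewed)
    for source, count in resolved_by_source.items()
}

# Inference: 9 of the missing cases had overall survival recovered from
# the Social Security Death Index or death certificates.
os_known = resolved + 9
os_known_pct = round(100 * os_known / reviewed)
```

The totals are internally consistent: 85 resolved cases plus 20 missing cases account for all 105 reviewed charts.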
Step 5. The quality of data was assessed on key clinical prognostic variables. Data were graded as complete, substandard, or missing based on definitions developed by an oncology expert panel (Table 1; R.S.H., G.L., J.T., L.S., T.J.L., P.F., and M.H.K.). For example, overall survival data were deemed complete in stage I to II non–small-cell lung cancer if vital status was known to the most recent 6 months or we were able to obtain at least 4 years of follow-up data (or until first event).
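The overall survival example can be written as an explicit rule. This is a hedged sketch of one reading of that definition for stage I to II disease: the function and parameter names are hypothetical, the month and year conversions are approximations, and "until first event" is interpreted here as meaning that an observed event makes the record complete. The definitions of substandard versus missing data are in Table 1 and are not reproduced.

```python
from datetime import date


def os_data_complete(diagnosis_date, last_status_date, review_date, event_observed):
    """Return True if overall survival data meet the 'complete' definition.

    Assumed reading of the rule: complete if an event was observed, if
    vital status is known to within the most recent 6 months, or if at
    least 4 years of follow-up are available.
    """
    if event_observed:
        return True
    months_since_status = (review_date - last_status_date).days / 30.44
    years_followed = (last_status_date - diagnosis_date).days / 365.25
    return months_since_status <= 6 or years_followed >= 4
```

Encoding the panel's definitions in this way lets the same grading be applied consistently by different abstractors and repeated at each quality review.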
Step 7. Quality control measures were instituted and a repeat review of data quality was undertaken after a sufficient period of time was given to implement changes. Table 2 lists the major changes instituted between 2001 and 2006. When we compared the quality of data collected over time, there was substantial improvement in the quality of overall survival, progression-free survival, response rate, and toxicity for late-stage patients undergoing chemotherapy (Fig. 2B) and in the description of symptoms in palliative patients receiving best supportive care.
An esophageal case-control study was recently initiated at Princess Margaret Hospital, Canada (G. Liu, principal investigator). Results from the first 15 patients in ROAS showed good quality data for stage and performance status (87% and 93% complete, respectively; n = 15). Data on chemotherapy toxicity again were of poor quality (complete toxicity data in 27% of cases).
| Discussion |
|---|
Case-control studies are established for risk analysis (14). To undertake pharmacogenetic or molecular prognostic research using a case-series design derived from a case-control study, data on prognostic factors must be captured. However, prognostic factors are not synonymous with disease risk factors and thus may not have been collected as part of the original case-control study; retrospective collection of missing prognostic variables becomes necessary. Unlike risk studies, where the goal is to capture every case, outcomes studies may only analyze specific subgroups based on a common disease stage or treatments. These fundamental differences in study design may affect the quality and range of outcomes research that can be undertaken from an initial case-control study.
This study also highlights the difficulty both of standardizing retrospective data collection procedures and of using retrospective data derived from standard clinical practice. Data were missing or substandard for key prognostic variables including performance status, toxicity, and disease symptoms. Results from this research have led to changes in the conduct of clinical practice, which we outlined in Table 2. In addition, we learned that quality must be evaluated on a continuous basis. In many case-control studies, data cleaning occurs toward the end of the data collection period, but for outcomes analysis, data cleaning should be ongoing. For example, we discovered that changes in clinic personnel had a considerable effect on the quality of data; thus, all new personnel (including physicians-in-training) receive instruction on how to use the clinical note templates. Outside the scope of this study, but just as important, is the need to carry out ongoing quality control measures for the biological samples collected.
Changes made to our data collection procedures have increased the range of pharmacogenetic research that can be undertaken using the case-control study. We now publish on toxicity, response rate, and progression-free survival in addition to overall survival, which would not have been possible before the institution of our quality assurance mechanisms (20, 21). In addition, we have successfully adapted this algorithm for use in parallel esophageal and pancreatic case-series studies at MGH. Similar procedures are being used in case-control and case-series studies at Dana-Farber Cancer Institute and at Princess Margaret Hospital.
There are limitations to applying the results of this study to other case-control studies seeking to carry out outcomes analyses. Although we evaluated ROAS in multiple tumor sites across three institutions and two countries, all the centers were large institutions with computerized order entry, results of diagnostic tests, and treatment data, which are essential elements to improving data quality. The plan for outcomes collection occurred early in each study, and early consultation with clinicians helped to push many of the procedural changes into the patient clinics quickly. However, even when these enhanced practice guidelines are ingrained into clinical practice, we still struggle to ensure that the toxicity and performance status templates attached to every clinical note are properly filled out at every visit; we have devised random checks of the quality of these data and report back to clinicians on their individual performances. Whereas these specific issues may not be relevant to every center, the importance of critically evaluating data collection techniques and data quality is relevant to all molecular prognostic and pharmacogenetic studies.
One concern is whether the dedication of time and resources to establishing ROAS was worth the outcome, given that the majority of variables were of at least moderately good quality. We believe that it is worthwhile. The resources required for implementation were much lower each time we implemented it at a new center, tumor site, or institution. We found that the quality issues were similar across sites and centers, as were the solutions (see Table 2 and Figs. 2 and 3). This helped us to anticipate problems and focus resources on the right places early. As expected, efficiency was much higher if a problem was anticipated and fixed early, rather than several years later.
Understanding the role of molecular cancer prognostic factors is of great importance. Promising results must be translated into the clinical setting to enable tailored treatment approaches. Large case-control studies provide a ready source of patients and information for such case-series analyses. However, attention must be paid to the fundamental differences between the two study types. Our experience presented here shows that it is possible to use a case-control study for outcomes analysis; however, the implementation of standardized procedures (such as ROAS) and ongoing monitoring of data quality is key to the success of this approach.
| Acknowledgments |
|---|
| Footnotes |
|---|
The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked advertisement in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.
Received 5/23/07; revised 9/20/07; accepted 10/22/07.
| References |
|---|
This article has been cited by other articles:
D. W. Cescon, P. A. Bradbury, K. Asomaning, J. Hopkins, R. Zhai, W. Zhou, Z. Wang, M. Kulke, L. Su, C. Ma, et al. p53 Arg72Pro and MDM2 T309G Polymorphisms, Histology, and Esophageal Cancer Prognosis. Clin. Cancer Res., May 1, 2009; 15(9): 3103-3109.