
Division of Thoracic Surgery, Brigham and Women's Hospital, Harvard Medical School, Boston, Massachusetts 02115
| Abstract |
|---|
| Introduction |
|---|
169,500 men and women were diagnosed in the United States with lung cancer and that 157,400 died from this malignancy.3
Approximately 80% of lung cancer patients have NSCLC4, a histological category of primary lung cancer that includes adenocarcinoma, the most common form of NSCLC, as well as squamous cell carcinoma and large cell carcinoma. All patients diagnosed with NSCLC, regardless of subtype, are evaluated in a uniform manner and offered treatment based on their stage at presentation. Up to 34,000 lung cancer patients in the United States present with stage I or II disease and are amenable to effective surgical treatment (1). These patients with early-stage NSCLC enjoy up to 70% 5-year survival after surgery alone. Patients with stage IIIA NSCLC usually undergo neoadjuvant chemotherapy followed by surgery or radiation therapy. Patients with stage IIIB or IV lung cancer are usually only candidates for palliative chemotherapy and radiation therapy (2). A prognostic test for early-stage lung adenocarcinoma is likely to improve survival by identifying patients who are more likely to recur and may, therefore, benefit from adjuvant or neoadjuvant therapy. A number of prognostic markers and pathological characteristics of resected lung cancer have been reported to correlate with outcome. These include the degree of differentiation, tumor size, lymphovascular invasion, and the expression of certain markers such as p53 and K-ras (3). However, none have been validated for routine clinical use.
Gene expression profiling using microarrays has been used to successfully predict disease-related outcome in multiple cancers (4, 5, 6, 7, 8, 9, 10, 11). Recently, several groups have published the results of gene profiling with microarrays of large cohorts of lung adenocarcinoma tissues linked to clinical outcome data (11, 12, 13). These studies propose prognostic models based on analysis of adenocarcinoma samples for which unique genetic profiles are correlated with treatment-related outcome. Unfortunately, these models are difficult to assess clinically because they rely on measuring expression levels of relatively large numbers of genes using costly data acquisition platforms (i.e., microarrays) and sophisticated algorithms/software. In addition, clinical use of the models is hindered by the inability to analyze a sample independently without reference to other samples. However, these studies do support the hypothesis that, in addition to stage, specific features of the tumor's gene expression may be used to predict, with high accuracy, the clinical outcome of the patient in response to a specific therapy.
We recently described a method for translating expression profiling data into clinically relevant tests using ratios of gene expression (14). We have successfully used this method to develop tests for the differential diagnosis of lung adenocarcinoma and mesothelioma (14), the prognosis of mesothelioma (15), and the diagnosis of prostate cancer.5 Development of specific tests is based on an initial supervised comparison of gene expression data between two groups that differ with respect to a chosen clinical characteristic. Although fundamentally similar to linear discriminant analysis (16), expression ratio-based models are statistically robust and offer several clear advantages that better facilitate the transition to clinical use (15). This method can be used to create clinical tests that are platform independent and require only small amounts of RNA and, by extension, tumor tissue. Here, we report the application of this method in developing a prognostic test for patients with resected stage I lung adenocarcinoma. The tissue specificity of the test was also assessed by applying it to expression profiling data of stage I breast adenocarcinoma.
| Materials and Methods |
|---|
7000 genes (11). Gene expression data for the test set of samples were obtained using Affymetrix high-density oligonucleotide microarrays (U95A chip) with probe sets representing 12,000 genes (12). Gene expression data for resected stage I (lymph node-negative) breast cancer tissues were obtained from a single source using microarrays containing 25,000 genes (10).
Data and Statistical Analysis.6
The selection of predictor genes for use in expression ratio-based prognosis was performed essentially as described (14, 15). We identified prognostic genes that were discriminatory between two subsets of training set samples: those from patients considered to be cured of cancer by surgery (i.e., good outcome, survival >48 months, n = 25) and those from patients who died from recurrent cancer (i.e., poor outcome, survival <48 months, n = 11). Using a two-sided Student's (parametric) t test for pairwise comparisons of average gene expression levels, we first searched all of the genes represented on the microarray for those with a statistically significant 2-fold difference in average expression levels between good outcome and poor outcome tumors. To minimize the effects of background noise, the list of distinguishing genes was additionally refined by requiring that the mean expression level be >600 in at least one of the two sample subsets. We chose for final analysis those genes that fit the filtering criteria and were also represented on the expression profiling platform of the test set of samples. For the analysis of test set samples, we again defined good (n = 30) and poor (n = 16) outcome as survival >48 and <48 months, respectively. For determination of classification accuracy only, we also designated as poor prognosis those patients still alive but with recurrent cancer within 48 months of surgery because patient status data were available for test set samples. Of the seven prognostic genes identified in the training set (Table 1), a total of two were represented by multiple Affymetrix probe sets on the expression-profiling platform of the test set (LocusLink ID): APOE and TRB@. Uninformative expression data were removed from consideration by excluding those probe sets for which average expression levels of all test set samples were <50 and that were not called "Present" for a majority of samples. Two probe sets were excluded (32795_at and 31449_at), and the remainder were averaged for each of the two genes to give a final expression value for each gene in all samples. Data from three highly accurate gene expression ratios were combined by calculating the geometric mean, $(R_1 R_2 R_3)^{1/3}$, where $R_i$ represents a single ratio value. The log2 of this value equals the average of [$\log_2(R_1)$, $\log_2(R_2)$, $\log_2(R_3)$], which has the effect of giving equal weight to ratio fold-changes of identical magnitude but opposite direction. Kaplan-Meier analysis was used to estimate survival where only known death events were uncensored. The log-rank test was used to statistically assess differences among multiple survival curves. The classification accuracy of the model in the test set of samples was assessed using Fisher's exact test (i.e., 2 x 2 contingency table). All differences were determined to be statistically significant if P < 0.05. All calculations and statistical comparisons were generated using S-Plus (17).
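The combination of three expression ratios by geometric mean can be illustrated with a minimal Python sketch. It is not the authors' S-Plus code. The expression values are hypothetical, and the decision rule (geometric mean > 1 assigned to good outcome) is an assumption based on the numerator genes of the optimal test being those associated with better outcome; the cutoff is not stated explicitly in this section.

```python
import math

# Gene pairs (numerator, denominator) of the optimal three-ratio test named in the text.
RATIO_PAIRS = [("APOE", "S100P"), ("LPIN2", "SLC2A1"), ("LPIN2", "MST1R")]


def geometric_mean(ratios):
    """Geometric mean of positive ratio values, (R1 * R2 * ... * Rn) ** (1 / n)."""
    if not ratios or any(r <= 0 for r in ratios):
        raise ValueError("ratios must be a non-empty sequence of positive numbers")
    # The log2 of the geometric mean is the arithmetic mean of log2(Ri).
    mean_log2 = sum(math.log2(r) for r in ratios) / len(ratios)
    return 2.0 ** mean_log2


def classify(expression, ratio_pairs=RATIO_PAIRS, cutoff=1.0):
    """Return (label, geometric_mean) for one sample.

    Assumption: a geometric mean above `cutoff` is called "good" outcome.
    """
    ratios = [expression[num] / expression[den] for num, den in ratio_pairs]
    gm = geometric_mean(ratios)
    return ("good" if gm > cutoff else "poor"), gm


# Hypothetical expression levels for a single tumor sample.
sample = {"APOE": 1200.0, "S100P": 300.0, "LPIN2": 800.0,
          "SLC2A1": 400.0, "MST1R": 1600.0}
label, gm = classify(sample)
print(label, round(gm, 3))
```

Because the calculation uses only the sample's own expression values, a single specimen can be scored without reference to other samples. A 2-fold increase and a 2-fold decrease cancel exactly, so `geometric_mean([2.0, 0.5])` returns 1.0.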
| Results |
|---|
90% accurate overall and also 90% accurate within each subset of training set samples (i.e., good and poor outcome). There were clearly multiple three-ratio combinations that were highly accurate in classifying training set samples. One of these tests (APOE/S100P, LPIN2/SLC2A1, and LPIN2/MST1R) used the three most accurate single ratios that individually identified >80% of the training set samples and for this reason was predicted to ultimately represent the optimal test.
Sensitivity of Expression Ratios as Outcome Predictors in Lung Adenocarcinoma.
We originally identified prognostic genes from samples containing >70% tumor cellularity (11). To test outcome predictor models using these genes in an independent group of samples, we initially only considered test set samples with relatively high tumor cell content to ensure equal comparability between training and test sample sets. However, it would be beneficial for any clinical test to demonstrate adequate sensitivity when samples with relatively low tumor cell content are assayed. Therefore, we determined, using the optimal three-ratio test from above, the classification accuracy in the 14 stage I test set samples with low tumor cell content (<40%, minimum 10%) originally excluded from consideration. This test resulted in 64% (9 of 14) of the samples correctly identified with 67% (6 of 9) and 60% (3 of 5) classification accuracy in the good and poor prognosis subsets, respectively. Although classification accuracy was moderately better than that expected by chance alone, our predictions were not statistically significant for this small cohort (P = 0.58, Fisher's exact test).
Prognosis of Other Adenocarcinomas Using Expression Ratios Developed in Lung as Outcome Predictors.
We next examined whether the prognostic test was specific for lung adenocarcinoma or could be applicable to other types of adenocarcinoma. We hypothesized that adenocarcinomas of both lung and breast origin would exhibit some degree of overlap in prognosis-related gene expression profiles despite originating from different tissue types. To test this hypothesis, we performed Kaplan-Meier time-to-recurrence survival analysis in breast cancer using predictions made by the optimal three-ratio test developed in lung cancer (APOE/S100P, LPIN2/SLC2A1, and LPIN2/MST1R). For this study, we obtained expression profiling data for a total of 97 breast cancer samples originally used to develop a microarray-based predictor model that stratified patient samples into prognostic groups based on cancer recurrence (10). The Kaplan-Meier estimated median DFS was not reached for these 97 samples (Fig. 1C). We used the breast cancer microarray data to calculate the geometric means of the three ratios and used the value to assign each patient to a good or poor prognosis subset. We then compared these groups using Kaplan-Meier survival analysis and discovered that predictions made using the three-ratio lung cancer prognostic test were able to produce significantly different (P = 0.0417, log-rank test; Fig. 1D) DFS curves in breast cancer. The classification accuracy of this model was 60% (58 of 97) and was determined by comparing ratio-based predictions to the known patient status pertaining to cancer recurrence (10). The median DFS was not reached for the subset predicted to have a more favorable outcome and was 3.3 years for the subset predicted to have a less favorable outcome.
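To make the phrase "median DFS was not reached" concrete, here is a minimal Python sketch of the Kaplan-Meier product-limit estimator. The follow-up times and event flags are toy values invented for illustration, not data from the breast cancer cohort, and the function names are hypothetical.

```python
def kaplan_meier(times, events):
    """Product-limit survival estimate.

    times:  follow-up time for each patient
    events: True if recurrence was observed at that time, False if censored
    Returns a list of (time, survival probability) at each distinct event time.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        observed = 0
        leaving = 0
        # Group all patients who share the same follow-up time.
        while i < len(data) and data[i][0] == t:
            observed += 1 if data[i][1] else 0
            leaving += 1
            i += 1
        if observed:
            survival *= 1.0 - observed / at_risk
            curve.append((t, survival))
        at_risk -= leaving
    return curve


def median_time(curve):
    """First time at which survival falls to 0.5 or below, or None if never reached."""
    for t, s in curve:
        if s <= 0.5:
            return t
    return None


# Toy cohorts (years of follow-up, recurrence observed?).
poor = kaplan_meier([1, 2, 3, 4], [True, False, True, True])
good = kaplan_meier([1, 2, 3, 4, 5], [True, False, False, False, False])
print(median_time(poor), median_time(good))
```

In the second toy cohort the estimated survival never drops to 50%, so `median_time` returns `None`, which corresponds to a median that is "not reached" within the available follow-up.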
| Discussion |
|---|
The observed classification accuracy of 74% for the optimal ratio combination is encouraging within the context of this limited proof-of-principle study because gene discovery conditions were constrained by the number of sequences represented on the expression profiling platform of the training set. We are currently evaluating more comprehensive microarray platforms for the identification of differentially expressed genes. Ultimately, the accuracy of an expression ratio-based prognostic algorithm will be substantially enhanced by combining the ratio test results with proven clinical prognostic variables in addition to patient-specific parameters. These latter factors would not necessarily be reflected in the expression signature and should add independent prognostic information to improve the accuracy of the ratio-based test.
The issue of identifying the minimal number of genes necessary for maximal accuracy deserves consideration. Our experience has shown that multiple ratio tests incorporating three ratios (i.e., up to six genes) are a good starting point for optimization. In the current study we examined all combinations of four ratios as well and, not surprisingly, found multiple four-ratio tests that were highly accurate in the training set. However, none of these tests were as accurate as the most accurate three-ratio tests in classifying test set samples (data not shown), suggesting that a three-ratio test may be optimal for ratio-based prognosis under these circumstances. In fact, increasing the number of genes included in a ratio-based model will ultimately prove detrimental because additional predictor genes are added with progressively less discriminating power, as reflected in the increasing P values obtained during the initial supervised analysis.
Our study may also indicate that accurate classification of lung cancer prognostic subsets using the gene ratio technique requires tumor samples with relatively high tumor cell content because we found that successful classification of low tumor cell content samples is only moderately better (64%) than that expected by chance alone. These findings are consistent with previous studies using alternative classification methods (11) . However, our results are not conclusive given the relatively small cohort size of low tumor cell content samples and the fact that all 12 classification errors in stage I samples with >40% tumor content were from samples with >60% tumor cell content and 9 of 12 (75%) were from samples with >70% tumor cell content.
Published studies seeking to identify new prognostic molecular markers in cancer have largely focused on comparing samples within a single tissue type. The ability of a prognostic test developed and validated in lung adenocarcinoma to significantly predict recurrence in breast adenocarcinoma is intriguing, but the potential implications of this finding are presently unknown. We did not observe any obvious trends in the expression levels of individual prognostic genes in misclassified breast cancer samples and are currently examining whether ratio-based tests created using breast cancer profiling data are likewise amenable to predicting prognosis in lung cancer. Still, these results support the hypothesis that survival-related similarities exist in the global expression patterns of both adenocarcinomas and specifically that some of these can be reflected in a simple test using five genes. Direct support for this idea can be found in the current study; the S100P gene we independently discovered in lung adenocarcinoma is also a known marker of tumor progression in breast cancer (18 , 19) . Furthermore, pathologists have previously described the degree of differentiation, lymphovascular invasion, and mucin production as independent prognostic markers within adenocarcinomas from a variety of tissue origins. Although the optimal predictor genes for any tumor may be tissue specific, discovery of prognostic markers suitable for use in both adenocarcinomas will reveal fundamental similarities in gene expression patterns within adenocarcinomas in general and perhaps lead to the discovery of potential therapeutic targets.
A total of five genes comprised the most accurate three-ratio test in this study: APOE; S100P; SLC2A1; LPIN2; and MST1R. Of these, only two (S100P and SLC2A1) were listed among the top survival genes in the initial analysis of the training dataset (11), suggesting that multiple genes can be used to predict outcome in lung cancer, the choice of which is based on the particular model used in the analysis. Nevertheless, four of the five genes in our study have clearly documented diagnostic and prognostic implications in other cancers. APOE, a lipid homeostasis protein, is highly expressed in ovarian carcinoma (20, 21). Similar to the current study, S100P is also preferentially expressed in prostate tumors with relatively worse prognosis (i.e., hormone refractory tumors; Ref. 22). Abnormal expression of SLC2A1 (alias GLUT1) has been observed in a number of epithelial malignancies, including ovarian cancer, where protein expression levels are directly proportional to tumor aggressiveness (23), consistent with our observation that higher levels of this gene are associated with worse prognosis in lung cancer. Finally, overexpression of MST1R (alias RON, a member of the MET proto-oncogene family) has been found to cause the formation of lung tumors with atypical phenotypes in vivo (24).
In addition to dividing the stage I tumors in the test set into two subsets with statistically different survival, the ratio test also assigned the majority of stage II and all of the stage III specimens to the poor outcome subset. This could suggest that the genetic profile of stage I cancer likely to recur is similar to that of more advanced tumors with respect to the prognostic genes identified in this study. Because the cause of death from lung cancer is usually metastatic disease and patients with stage II and III lung cancer have progressively higher incidence of metastatic disease, it is reasonable to hypothesize that the test developed herein measures inherent tumor metastatic potential. This suggestion challenges the idea that distant tumors arise from relatively rare cells within the primary tumor that have metastatic potential and is supported by a recent study by Ramaswamy et al. (25) in multiple tumor types. One other explanation for the assignment of stage II and all of the stage III specimens to the poor outcome subset is that the selected ratio-based test reflects the inherent presence of micrometastatic nodal disease because stage II and III cancer by definition have involved lymph nodes. Ratio-based prognosis within stage II and stage III patients was not possible because too few samples were assigned to the good outcome group and because different types of additional treatments were given to many of these patients.
The identification of patients at higher risk for recurrence may not immediately improve survival in the absence of effective therapeutic options. To date, adjuvant chemotherapy has not proven effective in controlling metastatic disease after resection of the primary tumor. This issue is currently being examined in a cooperative group trial randomizing stage IB patients to adjuvant chemotherapy versus observation (CALGB 9633). One potentially effective clinical approach would be ratio-based testing of tumor specimens obtained by FNA before any intervention. Percutaneous trans-thoracic FNA of lung nodules is a safe and well-accepted diagnostic technique that has been applied to lesions as small as 8-10 mm (26). Stage I patients predicted to have poor prognosis after FNA biopsy, for example, could be offered participation in clinical trials of neoadjuvant therapy using protocols proven useful in patients with stage IIIA lung cancer (27). In fact, we have ongoing studies to determine the suitability of FNA biopsy material for analysis using gene expression ratios.7
| Footnotes |
|---|
1 This work was partly funded by grants to R. B. from the Brigham Surgical Group Foundation, The Milton Fund of Harvard Medical School, and the National Cancer Institute (CA-098501) and by a grant to G. J. G. from the Cancer Research and Prevention Foundation.
2 To whom requests for reprints should be addressed, at Brigham and Women's Hospital, Division of Thoracic Surgery, 75 Francis Street, Boston, MA 02115. Phone: (617) 732-8148; Fax: (617) 582-6171; E-mail: rbueno{at}partners.org
3 Internet address: http://www.cancer.org.
4 The abbreviations used are: NSCLC, non-small cell lung cancer; DFS, disease-free survival; FNA, fine needle aspiration.
5 R. Bueno, K. R. Loughlin, M. H. Powell, G. J. Gordon. (2003) A diagnostic test for prostate cancer from gene expression profiling data. J Urology. In Press.
6 Additional information and detailed methods for all analyses can be found at our websites: http://www.chestsurg.org and http://www.generatios.com.
7 R. Bueno, L. A. Deters, M. D. Nitz, B. C. Lieberman, G. J. Gordon. Differential diagnosis of solitary lung nodules using gene expression ratios, submitted for publication.
Received 1/17/03; revised 4/30/03; accepted 5/16/03.
| References |
|---|