A group of metabolic cancer susceptibility genes exhibits fairly common germline polymorphisms, with frequencies ranging from a few percent to 50% in the general population. These genes encode enzymes involved in the metabolism of both exogenous and endogenous toxic substances. In terms of cancer susceptibility, the metabolic gene polymorphisms show low penetrance, with odds ratios for cancer generally on the order of 1.5–2.0. A large number of case-control studies have been conducted on the association between metabolic gene polymorphisms and cancer at various sites (1); however, most of the studies lack adequate statistical power to detect an association, if there is one, because of small sample size (2). A meta-analysis of the published literature in the field confirmed the weak, but significant, associations of certain gene polymorphisms with specific cancer types (1) and also demonstrated the influence of ethnic and/or geographic variations in the frequencies of these polymorphisms. Meta-analysis of published results has potential difficulties: not all results may be published, and it is not always possible to compare information obtained from published studies or to conduct stratified analyses. Because data on individual subjects are not available in a meta-analysis, a number of potentially interesting issues cannot be addressed (3).
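The power limitation described above can be made concrete with a rough calculation. The following Python sketch is not part of the original analysis; it uses a standard normal approximation for comparing two proportions in an unmatched case-control study with equal numbers of cases and controls. The function names, the 5% two-sided significance level, and the example inputs are assumptions chosen for illustration.

```python
from math import sqrt
from statistics import NormalDist


def case_exposure_prob(p0, odds_ratio):
    """Variant frequency among cases implied by the control frequency p0
    and a given odds ratio."""
    return odds_ratio * p0 / (1 + p0 * (odds_ratio - 1))


def approx_power(n_per_group, p0, odds_ratio, alpha=0.05):
    """Approximate power of a two-sided test comparing variant frequency
    in cases and controls (normal approximation, equal group sizes)."""
    p1 = case_exposure_prob(p0, odds_ratio)
    p_bar = (p0 + p1) / 2
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    se_null = sqrt(2 * p_bar * (1 - p_bar))        # spread under no association
    se_alt = sqrt(p0 * (1 - p0) + p1 * (1 - p1))   # spread under the alternative
    z = (abs(p1 - p0) * sqrt(n_per_group) - z_alpha * se_null) / se_alt
    return NormalDist().cdf(z)


# Illustrative inputs only: a variant carried by 50% of controls, true OR of 1.5.
for n in (100, 500, 2000):
    print(n, round(approx_power(n, p0=0.5, odds_ratio=1.5), 2))
```

Under these assumptions, power rises with the number of subjects per group, which is the motivation for pooling individual records across studies.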
We have begun a collaborative project to collect and analyze all of the available information on polymorphisms in the genes that metabolize environmental carcinogens, using original data, both published and unpublished, from laboratories all over the world that are active in this field. The genes to be studied in the first phase of this project include CYP1A1, GSTT1, GSTM1, NAT2, CYP2E1, and CYP2D6.
Investigators who had published case-control studies on genotype for one or more of the genes listed above and a cancer end point up to June 1997 were identified through MEDLINE and contacted by letter. We explained the purposes of the study and asked for published and unpublished original data from case-control studies on gene polymorphism and cancer. Each investigator was asked to provide all original data (without personal identifiers) on all subjects included in his or her studies. An instruction sheet specifying the required variables and the format for data submission was included. A short questionnaire was sent to each investigator to collect information on the laboratory methods used for genotyping, study design, response rate, and inclusion criteria. Of the 67 investigators contacted, 52 (78%) agreed to participate in the study, and 42 (63%) have already sent their data. Data were re-coded in a standard fashion, when necessary, and entered into a data set using Excel. Quality control and logical checks were performed. All gene polymorphisms were coded according to a standard nomenclature system (4), and submitted data were converted to this system, when necessary. Investigators were asked to explain coded data, including genotype data, when necessary. Hard copies of the entered data were sent back to the investigators for further checks. Several variables not included in the original request were also received. These data were stored in a separate file, and a list of them was prepared for future use. An Advisory Committee was appointed to help define policies for the handling and confidentiality of data and to establish methods for data analysis, data publication, authorship, and related matters.
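The re-coding and logical checks described above might look like the following Python sketch. It is illustrative only: the submitted labels, the harmonized "null"/"present" coding for the GSTM1 deletion polymorphism, the field names, and the specific checks are all hypothetical and are not taken from the project's instruction sheet or from the nomenclature system cited above.

```python
# Hypothetical lookup: labels as a laboratory might submit them, mapped to
# one harmonized label for the GSTM1 deletion polymorphism.
GSTM1_RECODE = {
    "null": "null", "0/0": "null", "deleted": "null", "-": "null",
    "present": "present", "+": "present", "positive": "present",
}


def recode_gstm1(raw):
    """Return the harmonized label, or None when the value is missing or
    unrecognized and would need a query to the submitting laboratory."""
    if raw is None:
        return None
    return GSTM1_RECODE.get(str(raw).strip().lower())


def logical_checks(record):
    """Return a list of problems found in one subject record (empty if none)."""
    problems = []
    if record.get("status") not in ("case", "control"):
        problems.append("status must be 'case' or 'control'")
    age = record.get("age")
    if age is not None and not (0 < age < 120):
        problems.append("implausible age")
    if record.get("status") == "control" and record.get("cancer_site"):
        problems.append("control with a cancer site recorded")
    raw = record.get("gstm1")
    if raw is not None and recode_gstm1(raw) is None:
        problems.append("unrecognized GSTM1 label")
    return problems


# Hypothetical record with one inconsistency to flag.
example = {"status": "control", "age": 55, "cancer_site": "lung", "gstm1": "+"}
print(recode_gstm1(example["gstm1"]), logical_checks(example))
```

Records that fail a check would be returned to the contributing investigator for explanation rather than corrected centrally, in line with the procedure described above.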
The first phase of data collection was concluded on September 1, 1998. As of that date, records from 72 studies had been received, for a total of 19,148 subjects (9,352 controls and 9,796 cases). Half of the studies used hospital controls, and half used healthy subjects. Table 1 shows the distribution of cases by cancer site. Basic demographic information, such as age, sex, and race, as well as data on smoking status, was available for most of the subjects. The most frequently analyzed gene in the data set was GSTM1 (73% of controls, 79% of cases), followed by CYP1A1 (56% and 60%, respectively; Table 2). In a substantial number of subjects, around 25%, two or three gene polymorphisms were tested simultaneously. Four genes were tested in 17% of the controls and 23% of the cases. A smaller fraction of the data included information on five or six genes tested simultaneously in the same subject.
The collected data are available to all investigators who submitted data, who may submit proposals for use of the data set to test hypotheses and analyze the data with respect to particular genes, cancers, and other factors. The Advisory Committee reviews the proposals and creates working groups according to common interests. Each investigator who contributes data used in a particular analysis will be given the opportunity to participate in the analysis and will be included among the authors, regardless of their participation in the analysis. Several such analyses are currently in progress. Regular reviews of the published literature are performed every 6 months, and authors of new publications are contacted and asked to furnish their data to maintain this data set as an ongoing system of data collection. In the future, additional genes may be added to the data set as new research directions warrant.
This is the first effort to pool individual data from a large number of epidemiological studies involving metabolic gene polymorphisms as markers of cancer susceptibility. This area of research has produced numerous publications, but many of the results are inconclusive for several reasons, including the weak association between each metabolic gene polymorphism and cancer, the small sample size of the studies, and the ethnic heterogeneity of the populations included. Studies of large numbers of cases and testing of multiple gene polymorphisms at once are necessary to better understand the role of metabolic genes in cancer development. This study is an attempt to create a large data set, including several thousand cancer cases and controls from all over the world, in whom several genes have been tested. The data set can then be used to answer simple questions, such as the level of association of a specific gene polymorphism with a specific cancer, as well as more complex issues, such as interactions between different genetic factors and between genetic and other factors (5). In addition, we will compare the results of our pooled analysis with those obtained by classical meta-analysis to assess the determinants of participation in our project and possible bias in the publication of data (6). One of the major aims of this project is to continue to increase the number of subjects, genes, and other relevant information in order to expand the number of testable hypotheses. This pooled data analysis project should prove useful not only to confirm associations between gene polymorphisms and cancer that have been suggested in the past but also, and mainly, to suggest new working hypotheses related to individual susceptibility to the carcinogenic effects of environmental agents.
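One way to estimate the association of a single polymorphism with a single cancer from pooled records, while keeping each contributing study as its own stratum, is the Mantel-Haenszel summary odds ratio. The text does not specify which estimator the working groups will use, so the Python sketch below is an assumption; the 2x2 counts are invented toy numbers, not values from the data set.

```python
def crude_odds_ratio(a, b, c, d):
    """Odds ratio for one 2x2 table.
    a = cases with the variant,    b = cases without it,
    c = controls with the variant, d = controls without it."""
    return (a * d) / (b * c)


def mantel_haenszel_or(tables):
    """Mantel-Haenszel summary odds ratio across strata (here, studies).
    Each table is a tuple (a, b, c, d) as defined above."""
    numerator = sum(a * d / (a + b + c + d) for a, b, c, d in tables)
    denominator = sum(b * c / (a + b + c + d) for a, b, c, d in tables)
    return numerator / denominator


# Invented counts for three hypothetical contributing studies.
toy_tables = [
    (30, 20, 25, 25),
    (12, 8, 10, 10),
    (55, 45, 50, 50),
]

for table in toy_tables:
    print("study OR:", round(crude_odds_ratio(*table), 2))
print("summary OR:", round(mantel_haenszel_or(toy_tables), 2))
```

Stratifying by study keeps differences in variant frequency between populations from distorting the summary estimate, which is one reason individual-level pooling can address questions that a meta-analysis of published summaries cannot.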
Table 1. Distribution of cancer cases in the data set by site
Table 2. Distribution of genetic polymorphisms available in the data set
Acknowledgments
We thank Samantha Garbers, Giovanna Bognandi, and Cinzia Petrazzoli for technical assistance.
Footnotes
The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked advertisement in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.
1 Advisory Committee Members: P. Boffetta (Lyon, France), N. Caporaso (Bethesda, MD), J. Cuzick (London, United Kingdom), S. Garte (Piscataway, NJ), L. Le Marchand (Honolulu, HI), S. J. London (Research Triangle Park, NC), N. Rothman (Bethesda, MD), A. Piazza (Torino, Italy), P. Vineis (Torino, Italy), L. Tomatis (Trieste, Italy).
2 Supported by the European Commission Fund (number 96/CAN/33919).
3 On behalf of the Collaborative Group on Genetic Susceptibility to Environmental Carcinogens. Current members of the Collaborative Group: A. K. Alexandrie (Stockholm, Sweden); C. B. Ambrosone (Jefferson, AR); S. Anttila (Helsinki, Finland); H. Baranova (Clermont-Ferrand, France); E. Bartsch (Heidelberg, Germany); S. Benhamou (Villejuif, France); K. Breskvar (Ljubljana, Slovenia); J. Brockmoller (Berlin, Germany); I. Cascorbi (Berlin, Germany); G. Chenevix Trench (Brisbane, Australia); M. L. Clapper (Philadelphia, PA); V. Dolzan (Ljubljana, Slovenia); C. M. Dresler (Philadelphia, PA); J. Ford (New York, NY); V. Gaborieou (Lyon, France); C. Harris (Bethesda, MD); A. Haugen (Oslo, Norway); D. W. Hein (Louisville, KY); A. Hirvonen (Helsinki, Finland); L. Hsieh (Tao-Yuan, Taiwan); M. Ingelman-Sundberg (Stockholm, Sweden); N. Jourenkova (Villejuif, France); F. F. Kadlubar (Jefferson, AR); S. Kato (Tokyo, Japan); T. Katoh (Kitakyushu, Japan); M. Kihara (Yokohama, Japan); P. Kremers (Liège, Belgium); G. W. Lucier (Research Triangle Park, NC); N. Malats (Lyon, France); S. Morita (Osaka, Japan); T. Nakajima (Nagano, Japan); V. Nazar-Stewart (Pittsburgh, PA); D. W. Nebert (Cincinnati, OH); K. Noda (Yokohama, Japan); T. Nurminen (Helsinki, Finland); Y. Oda (Kanazawa-Ishikawa, Japan); I. Persson (Stockholm, Sweden); A. Rannug (Stockholm, Sweden); T. R. Rebbeck (Philadelphia, PA); A. Risch (Heidelberg, Germany); L. Roelandt (Liège, Belgium); M. Romkes (Pittsburgh, PA); I. Roots (Berlin, Germany); D. Ryberg (Oslo, Norway); J. Seidgard (Lund, Sweden); P. Shields (Bethesda, MD); E. Sim (Heidelberg, Germany); R. C. Strange (Stoke-On-Trent, United Kingdom); I. Stucker (Villejuif, France); H. Sugimura (Shizuoka, Japan); J. To-Figueras (Barcelona, Spain); H. Vainio (Helsinki, Finland); M. J. Watanabe (Takizawa, Japan).
4 To whom requests for reprints should be addressed, at Ospedale Maggiore IRCCS, Epidemiology Unit, Via F. Sforza, 28, 20122 Milano, Italy. Phone: 011-39-02-55038247; Fax: 011-39-2-55038413.
Received March 15, 1999.
Revision received May 10, 1999.
Accepted May 18, 1999.