
1 Department of Family and Preventive Medicine, University of California at San Diego, San Diego, California and 2 Department of Epidemiology and Surveillance, American Cancer Society, Atlanta, Georgia
Requests for reprints: James D. Knoke, Tobacco Control Policies Project, University of California at San Diego, Suite 310, 1545 Hotel Circle South, San Diego, CA 92108. Phone: 619-294-3708; Fax: 619-220-0228. E-mail: jknoke{at}ucsd.edu
| Abstract |
|---|
χ² goodness-of-fit tests. Results: Examination of the residuals of a model proposed by Doll and Peto with the Cancer Prevention Study I data suggested that a better fitting model might be obtained by including an additional term specifying the ages when smoking exposure occurred. An extended model with terms for cigarettes smoked per day, duration of smoking, and attained age was found to fit statistically significantly better than the Doll and Peto model (P < 0.001) and to fit well in an absolute sense (goodness-of-fit; P = 0.34). Finally, a model proposed by Moolgavkar was examined and found not to fit as well as the extended model, although it included similar terms (goodness-of-fit; P = 0.007). Conclusions: The addition of age, or another measure of the timing of the exposure to smoking, improves the prediction of lung cancer mortality with Doll and Peto's multiplicative power model.
| Introduction |
|---|
In the present report, we use data from the American Cancer Society's Cancer Prevention Study I (CPS-I; ref. 6) to develop new parameters for these models. The CPS-I has considerably more deaths from lung cancer than the British Doctors Study and contains age of initiation of smoking as well as attained age. Using the CPS-I data, we then perform a statistical comparison of the Doll and Peto model, the Moolgavkar model, and an extended model with an additional term for attained age.
| Materials and Methods |
|---|
2, 4, and 6 years and at the conclusion of follow-up. Mortality was additionally assessed at 1, 3, 5, and 11 years. To parallel the British Doctors Study, the primary study group for this report was the subgroup of white, male, current cigarette smokers ages 40 to 79, who initiated smoking by age 35 and did not also smoke cigars or pipes. Age, age at initiation of cigarette smoking (initiation), and dose (CPD) were assessed by questionnaire at the baseline visit, and it was these values that were used in our analyses. Initiation and dose were categorized on the questionnaire, as detailed in Table 1. Descriptive analyses of the mortality data collected by this study have been presented previously (7-9).
95% of deaths. Lung cancer listed as the primary, secondary (contributing), or tertiary (any mention) cause of death on the death certificate was the outcome for our analyses. There were 163,643 white males who continued to smoke throughout their period of observation and had complete data on age, initiation, dose, and outcome. Of these, 3,405 subjects died of lung cancer during follow-up. Those continuing smokers ages 40 to 79 during some portion of the follow-up period, who initiated smoking by age 35, were tabulated into 1,200 cells (40 × 6 × 5, age × initiation × dose). This tabulation advanced age and duration with year of follow-up and censored subjects when they died of causes other than known lung cancer or were lost to follow-up. Intensity of smoking (dose) as reported at the baseline evaluation was assumed to continue throughout follow-up.
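The tabulation described above can be sketched in a few lines. This is only an illustration under stated assumptions, not the authors' procedure: the function name `tabulate_person_years`, the dictionary keys, the use of whole years of follow-up, and the toy subject are all hypothetical, and the category indices merely stand in for the questionnaire categories of Table 1.

```python
from collections import defaultdict

AGE_MIN, AGE_MAX = 40, 79   # 40 one-year age cells
N_INIT, N_DOSE = 6, 5       # category counts taken from the text (40 x 6 x 5)


def tabulate_person_years(subjects):
    """Accumulate person-years and lung cancer deaths per (age, init, dose) cell.

    Each subject is a dict with hypothetical keys:
      age0       - age at baseline, in whole years
      init       - initiation category index, 0..5
      dose       - dose category index, 0..4 (held constant over follow-up)
      years      - whole years of follow-up until death, loss, or study end
      lung_death - True if follow-up ended in a lung cancer death
    """
    person_years = defaultdict(int)
    deaths = defaultdict(int)
    for s in subjects:
        for k in range(s["years"]):
            age = s["age0"] + k          # age advances with year of follow-up
            if not AGE_MIN <= age <= AGE_MAX:
                continue                 # outside the tabulated age range
            cell = (age, s["init"], s["dose"])
            person_years[cell] += 1
            # A lung cancer death is counted in the final year of follow-up;
            # any other end of follow-up simply stops the accumulation.
            if s["lung_death"] and k == s["years"] - 1:
                deaths[cell] += 1
    return person_years, deaths


n_cells = (AGE_MAX - AGE_MIN + 1) * N_INIT * N_DOSE

toy = [{"age0": 38, "init": 2, "dose": 3, "years": 5, "lung_death": True}]
py, d = tabulate_person_years(toy)
```

In this toy case the subject contributes nothing at ages 38 and 39, one person-year at each of ages 40 to 42, and one death in the age-42 cell.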
We also studied a secondary group of 92,307 subjects who were white, male, and had never smoked any tobacco product to assess the rate of lung cancer in the absence of tobacco smoking. Of these, 215 died of lung cancer during follow-up. The nonsmoker subgroup has been studied before, with some variations, by Whittemore (9), Garfinkel and Silverberg (10), Burns et al. (8), Thun et al. (11), and Leenhouts (12). Whittemore reported fewer deaths, so she may have used only lung cancer as the primary cause of death in her analysis. Leenhouts used only the first 6 years of follow-up of the CPS-I.
The Doll and Peto Model
The Armitage and Doll (3) multistage model of carcinogenesis implies that a cell progresses to malignancy through a sequence of states of a Markov chain. As applied to the effect of cigarette smoking on lung cancer risk, it assumes that the number of events occurring in each cell is Poisson distributed with incidence (mean value) a specialized multiplicative power function of duration and dose. Specifically, the incidence is a function of three parameters a, b, and c:
$$\text{incidence} = a\,(\text{dose} + 6)^{b}\,(\text{duration} - 3.5)^{c} \qquad \text{(A)}$$
The constant 6 accounts for the background risk of lung cancer in the absence of smoking, and the constant 3.5 accounts for the time lag between a single cell first becoming a cancer and death from subsequent growth of that cancer. Doll and Peto considered models with constants other than 6 and 3.5 and reported that the fit to these alternative models was similar to that of model A, although the estimated parameters took on different values. Consequently, these constants are somewhat arbitrary. In particular, a value greater than 3.5 may be more appropriate for the second constant (9). In this article, however, model A is always used with the constants 6 and 3.5 to allow comparison with the original Doll and Peto analysis. Notice that, by definition, this model is intended to be applied to smokers: nonsmokers have a duration of zero, so the third term would be negative 3.5 raised to a power.
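As a rough illustration of how the multiplicative power form behaves, the sketch below evaluates an incidence of the shape implied by the description above, with the constants 6 and 3.5 attached to dose and duration. The function name `doll_peto_incidence` and the parameter values are hypothetical and are chosen only to exercise the function; they are not estimates from this study.

```python
def doll_peto_incidence(dose, duration, a, b, c):
    """Incidence under a multiplicative power model of dose and duration.

    dose     - cigarettes per day
    duration - years of smoking
    a, b, c  - model parameters (scale, dose exponent, duration exponent)
    The constants 6 and 3.5 are the ones named in the text.
    """
    if duration <= 3.5:
        # The third term would be zero or negative, so the model does not
        # apply (this includes nonsmokers, whose duration is zero).
        raise ValueError("model is defined only for duration > 3.5 years")
    return a * (dose + 6.0) ** b * (duration - 3.5) ** c


# Hypothetical parameter values, for illustration only.
a, b, c = 1e-12, 1.0, 4.0

r20 = doll_peto_incidence(dose=20, duration=30, a=a, b=b, c=c)
r40 = doll_peto_incidence(dose=40, duration=30, a=a, b=b, c=c)

# At fixed duration, the rate ratio depends only on dose and b:
# ((40 + 6) / (20 + 6)) ** b
dose_ratio = r40 / r20
```

Because the model is multiplicative, the rate ratio between two doses at a fixed duration does not depend on `a` or `c`, and the ratio between two durations at a fixed dose does not depend on `a` or `b`.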
The Moolgavkar Model
Moolgavkar et al. proposed an alternate biological model for carcinogenesis (14, 15) and fitted this "two-stage" model to observed lung cancer death rates using the British doctors' data (5). Like the Doll and Peto model, Moolgavkar's model assumes that events in each cell follow the Poisson distribution, although the incidence is a complicated nonlinear function of age, initiation, and dose. One feature of Moolgavkar's model is the assumption that the adverse health consequences of a fixed intensity of smoking for a fixed period are less for smokers under age 20 than for those over age 20. This is based on the presumption that the number of susceptible lung tissue cells continues to increase until age 20 and then remains constant. The incidence function is:
![]() |
Moolgavkar described the parameters in terms of the first-mutation and second-mutation rates (c0 + c1d and c0 + c2d) per lung tissue cell per year of smoking exposure and the net proliferation rate of intermediate cells (a + bd). Due to the nonlinear incidence, none of the five parameters are uniquely associated with an individual explanatory variable (age, initiation, or dose). However, Moolgavkar refers to b, c1, and c2 as dose-related parameters because of their multiplicative relation to dose. When fit to the British doctors' data (with nonsmokers included in the analysis), Moolgavkar's initial estimate of the parameter b was a small negative value; consequently, he set b to zero. Further analysis by Moolgavkar showed that a similar fit to the data could be obtained by setting any one of the three dose-related parameters, b, c1, or c2, to zero or by setting c1 = c2. These results suggest that the original, five-parameter model may be overparameterized.
Extended Models
The effects of a given intensity and duration of smoking may vary depending on the age at which smoking occurs. It has been suggested that the lung may be either more (16) or less (5) susceptible to a given carcinogenic exposure early in life. An increased susceptibility to carcinogenic exposures with advancing age has also been postulated (17, 18). We tested whether the addition of a term for age of initiation or attained age to the incidence function of the Doll and Peto model leads to a better fitting model. While age of initiation, attained age, and duration of exposure can all be postulated to have independent biological effects, it is not possible to include all three terms in a mathematical model because including any two fixes the value for the third. We examined all three combinations of these terms in the models presented below. When any two terms are used in the model, they specify the duration of smoking and where that duration occurred in the life span of the smoker:
![]() |
![]() |
![]() |
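The constraint noted above, that attained age equals age of initiation plus duration, can be illustrated numerically. The sketch below uses a deliberately simple linear predictor rather than the power-function models of this report, and the function name, subjects, and coefficients are hypothetical. It shows that when all three timing terms enter together, different coefficient sets give identical predictions, so the three effects cannot be separated.

```python
def linear_predictor(init, duration, age, b_init, b_dur, b_age):
    """A linear predictor containing all three timing terms at once."""
    return b_init * init + b_dur * duration + b_age * age


# Hypothetical subjects as (initiation, duration, attained age).
subjects = [(15, 30, 45), (20, 40, 60), (17, 50, 67)]
assert all(i + d == a for i, d, a in subjects)

coef_1 = (0.10, 0.30, 0.05)
t = 0.07  # an arbitrary shift; any value gives the same result
# Add t to the initiation and duration coefficients, subtract it from age.
coef_2 = (coef_1[0] + t, coef_1[1] + t, coef_1[2] - t)

pred_1 = [linear_predictor(*s, *coef_1) for s in subjects]
pred_2 = [linear_predictor(*s, *coef_2) for s in subjects]

# The difference is t * (init + duration - age), which is zero for everyone.
max_gap = max(abs(p - q) for p, q in zip(pred_1, pred_2))
```

Dropping any one of the three terms removes this ambiguity, which is why only pairs of terms are considered.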
A Poisson Model for Nonsmokers
This model assumes that the number of events occurring in each 1-year age interval is Poisson distributed with incidence a simple power function of age:
$$\text{incidence} = a\,(\text{age} - 3.5)^{b} \qquad \text{(F)}$$
Statistical Methods
The Doll and Peto and extended models were fit using PROC GENMOD of the SAS System (Cary, North Carolina), which evaluated maximum likelihood estimates of the model parameters and likelihood ratio tests of nested models. The likelihood ratio tests evaluated whether a reduced model fits the data as well as a specific alternative model with additional parameters (19). The Moolgavkar model was fit with PROC NLIN of the SAS System following the approach described by Jennrich and Ralston (20) for obtaining maximum likelihood estimates. Confidence intervals for the parameters were estimated by the Wald approximation (19).
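For readers who want to see the mechanics of a likelihood ratio test between nested models, a minimal sketch follows. It is not the SAS procedure used in this report. It covers only the case in which the fuller model has one additional parameter, and the function name `lr_test_1df` and the log-likelihood values are hypothetical.

```python
import math


def lr_test_1df(loglik_reduced, loglik_full):
    """Likelihood ratio test for nested models differing by one parameter.

    Returns (statistic, p_value). The statistic is twice the gain in
    log-likelihood; for 1 degree of freedom the chi-square upper tail
    probability is erfc(sqrt(statistic / 2)).
    """
    stat = 2.0 * (loglik_full - loglik_reduced)
    if stat < 0:
        raise ValueError("full model cannot have the lower log-likelihood")
    return stat, math.erfc(math.sqrt(stat / 2.0))


# Hypothetical log-likelihoods, not values from the study.
stat, p = lr_test_1df(loglik_reduced=-1050.0, loglik_full=-1040.0)
```

A small P value indicates that the reduced model fits worse than the fuller model by more than chance would explain, which is the sense in which the extended models are compared with model A.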
χ² goodness-of-fit tests of models were performed on a reduced number of combined cells (353 rather than 1,200) to meet the minimum expected cell frequencies suggested by Cochran (21) for goodness-of-fit testing. The algorithm used for cell combination is described in Appendix 1. The goodness-of-fit test assessed the overall fit of the data to the model; small P values for the test indicated that the data do not fit the model well. These overall tests are absolute and not relative to another model as are the likelihood ratio tests.
Graphical residual analysis employing standardized residuals, the signed square roots of the contributions to χ² for the cells, was also performed using the combined cells. The residual value for a combined cell was associated with the weighted averages of initiation and dose, with weights equal to the expected frequency for the original cell divided by the sum of the expected frequencies for all cells combined. For each residual plot, the linear regression coefficient was tested for nonzero slope.
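The residuals and weighted averages described above can be written out directly. The sketch below is a minimal illustration with hypothetical counts; the function names are not from the original analysis.

```python
import math


def standardized_residuals(observed, expected):
    """Signed square roots of each cell's contribution to chi-square."""
    return [(o - e) / math.sqrt(e) for o, e in zip(observed, expected)]


def pearson_chi_square(observed, expected):
    """Sum of squared standardized residuals over all cells."""
    return sum(r * r for r in standardized_residuals(observed, expected))


def weighted_average(values, expected):
    """Average of values over the original cells that form one combined
    cell, weighting each by its share of the expected frequency."""
    total = sum(expected)
    return sum(v * e / total for v, e in zip(values, expected))


# Hypothetical combined-cell counts: observed and model-expected deaths.
observed = [4, 9, 16, 7]
expected = [4.0, 9.0, 9.0, 16.0]

resid = standardized_residuals(observed, expected)
chi2 = pearson_chi_square(observed, expected)

# Two original cells with initiation ages 10 and 20 and expected
# frequencies 1 and 3 are merged into one combined cell.
init_for_plot = weighted_average([10, 20], [1.0, 3.0])
```

A positive residual marks a cell with more deaths than the model predicts and a negative residual marks one with fewer, so a trend in the residuals against initiation or age signals systematic lack of fit in that dimension.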
| Results |
|---|
Lung Cancer among CPS-I Nonsmokers
The lung cancer mortality rate among nonsmokers in the CPS-I is presented in Fig. 1 as a function of 5-year intervals of age. Model F closely fit the mortality rate, with parameter estimates â = 5.29 × 10⁻¹³ and b̂ = 4.83 and goodness-of-fit χ² = 15.71 (df = 25, P = 0.92). The estimate of the parameter b is greater than that reported by Whittemore (9) for a similar model. However, Whittemore subtracted 5 years instead of 3.5 from age and included fewer deaths in her analysis.
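To give a feel for the magnitude of these estimates, the sketch below evaluates a power function of age with the reported values. Several points are assumptions rather than statements from the source: that model F has the form a(age − 3.5)^b, as suggested by the remark that Whittemore subtracted 5 years instead of 3.5; that the exponent of ten in the estimate of a is negative (−13), the sign having been lost in extraction; and that the incidence is expressed per person-year. The function name is hypothetical.

```python
def nonsmoker_rate(age, a=5.29e-13, b=4.83, lag=3.5):
    """Assumed form of model F: incidence as a power function of age.

    a and b are the reported estimates, with the exponent of ten in `a`
    assumed to be -13; lag is the 3.5 years subtracted from age.
    """
    if age <= lag:
        raise ValueError("age must exceed the lag")
    return a * (age - lag) ** b


# Rates per 100,000 person-years at two ages, under the assumptions above.
rate_50 = nonsmoker_rate(50) * 1e5
rate_70 = nonsmoker_rate(70) * 1e5

# With an exponent near 5, the rate rises steeply with age.
ratio_70_to_50 = rate_70 / rate_50
```

Under these assumptions the rate at age 70 is of the order of tens of deaths per 100,000 person-years, whereas a positive exponent of ten would give values far outside any plausible range.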
χ² statistic for duration (χ² = 1,952.3) is more than five times that for dose (χ² = 373.3), which is consistent with Peto's (13) observation that duration has the more profound effect on lung cancer risk. Analyses of residuals, however, indicated that the fit of model A was lacking in both dimensions of age of initiation and attained age (Fig. 2).
χ² goodness-of-fit tests for models A and C to E, using the 353 combined cells described in Appendix 1, suggested (Table 3) that the extended models fit the data rather well in an absolute sense, while model A did not fit well. Although models C to E could not be compared with each other by formal statistical hypothesis testing, we did add an interaction (cross-product) term to each (results not shown). There was a significant interaction between duration and initiation in model C. The interactions were not significant between duration and age in model D or between age and initiation in model E. We selected model D for further analysis based on the likelihood ratio tests, the goodness-of-fit tests, and the logical coherence of a model with parameters for dose, duration, and attained age. This model is subsequently called the extended D-P model. The residual plots with respect to age of initiation and attained age for model D are presented in Fig. 3 and are essentially flat.
The reduced, four-parameter model converged to a negative estimate of c2 that, although small, was statistically significantly less than zero (Table 4). In addition, there were substantial negative correlations, −0.84 between c0 and c1 and −0.63 between c1 and c2. The negative estimates and large negative correlations suggested that the four-parameter model was still overparameterized. We then considered three-parameter models and obtained reasonable estimates with either c1 or c2 set to zero. The model with c1 set equal to c2 fitted the CPS-I data more poorly than either of the other three-parameter models. We arbitrarily reported the model with c2 = 0, because this version exhibited slightly better fit than that with c1 = 0.
χ² goodness-of-fit tests suggested that the three-parameter Moolgavkar model fitted the data at least as well as the four-parameter model, better than the Doll and Peto model, not as well as the extended D-P model, and not well in an absolute sense. Residual analysis indicated that, similarly to the Doll and Peto model, the fit of the three-parameter model was lacking primarily in the dimension of age of initiation (Fig. 4), although age of initiation was a term in the model.
| Discussion |
|---|
A recent report by Flanders et al. (25), using data from the American Cancer Society's Cancer Prevention Study II, fitted two-parameter (duration and intensity), simple Poisson models stratified by decade of attained age. They concluded that their results confirmed Peto's (13) observation that duration of smoking is more important than intensity (CPD) in predicting lung cancer. They also found that the estimated coefficients for both duration and intensity decreased with increasing age. Thus, their two-parameter, stratified modeling approach using Cancer Prevention Study II data is generally consistent with our conclusion, using CPS-I data, that age is an important third parameter in predicting lung cancer risk.
That the estimated coefficient for age in the extended D-P model is (significantly) positive suggests that a given duration of smoking might be more hazardous when experienced later in life. There are several possible explanations for this observation. There may be an accumulation of exposure to other lung carcinogens as is apparent in nonsmokers, and there may be an increased susceptibility to carcinogenic exposure with advancing age. Another possibility, suggested by Moolgavkar et al. (5), is that the young might be at lower risk because they have fewer lung tissue cells at risk. Conversely, there is evidence that a given exposure to carcinogens may be more damaging when received at a younger age as demonstrated by chromosome loss at 3p21 (16). These explanations cannot be differentiated by an analysis of the CPS-I data because the inclusion of any two of the three age/duration terms fixes the third.
A single value for dose is unlikely to adequately characterize the lifetime intensity of smoking. Cross-sectional surveys have shown that CPD is not constant; it increases from initiation to age 30 and increases more slowly to age 50 and then declines (26, 27). This phenomenon would result in CPD reported at older ages underestimating the lifetime smoking exposure compared with CPD reported at midlife. The lower than predicted risk of smoking for earlier ages of initiation thus may be due to the pattern of smoking intensity with age in addition to the fewer lung cells to be exposed as Moolgavkar has suggested. It might also account for the apparent contradiction present for younger ages of initiation between the increased susceptibility to molecular change from carcinogenic effects and the observed lower lung cancer incidence.
Examination of the residuals and χ² goodness-of-fit tests suggested that the Moolgavkar model also did not fit the CPS-I data as well as did the extended D-P model. The lack of fit of the three-parameter Moolgavkar model appeared largely due to how the model incorporates age of initiation. The age of 20, after which the Moolgavkar model assumes constant risk, may be too young for the transition if CPD actually rises to age 30. Our results confirmed the observation of Moolgavkar et al. (5) that the term for net proliferation rate of intermediate cells is not affected by dose. However, our additional observations that only one of the dose-related parameters (c1, c2, and b) is estimable with the CPS-I data and that the overall fit is not especially good suggest that the two-stage biological model underlying Moolgavkar's mathematical model may not well describe these data.
There are apparent differences between the British and the American populations similar to the previously observed differences between British and Japanese populations (24). For the Doll and Peto model, estimates of both parameters b and c were smaller for the American population than for the British (the confidence intervals for b did not overlap). The estimates for the three-parameter Moolgavkar model also differed between the two populations.
The differences found in how smoking affects lung cancer risk between the British and the American populations are not unexpected. Neither the British Doctors Study nor the CPS-I was a probability sample. The British doctors were enrolled about 8 years before the CPS-I participants were enrolled. There were demographic, environmental, and dietary differences between the two populations. In addition, there were differences in the composition of cigarettes consumed in the two countries (28). Subsequently, there have been changes in the demographics of smokers (29) and additional changes to the composition of cigarettes (28), suggesting that if either study were repeated today, there may well be differences in resulting estimates of model parameters.
We included continuing smokers who reported consuming in excess of 40 CPD; such heavy smokers have been excluded from previous analyses of the British doctors' data. While there were relatively few such heavy smokers in our data, we included them because this did not have a deleterious effect on model fit as assessed by either the residual or the goodness-of-fit analyses. When the 40+ CPD smokers were excluded from Doll and Peto model estimation, the parameter for duration was little changed (3.79 instead of 3.74). The parameter for dose increased slightly (1.20 instead of 0.96); however, its confidence interval (1.05-1.34) still did not overlap the confidence interval for the dose parameter with the British data (1.44-2.33).
Previous reports have not agreed on whether nonsmokers should be included when modeling the effect of smoking on the risk of lung cancer. The Doll and Peto and extended D-P models, by definition, do not extend to nonsmokers, while the Moolgavkar model does. Our analyses show that the inclusion of nonsmokers in the Moolgavkar model has only a modest, although statistically significant, effect on the parameter estimates compared with the estimates when only smokers are included. This report, like the cited previous reports, does not attempt to model risk among former smokers. Likely, more complicated models than studied here will be required to effectively model risk among this important and growing population subgroup (30).
In conclusion, our analysis of the CPS-I data shows that adding a third parameter, a measure of where in the life span exposure to smoking occurs, to the Doll and Peto model, in addition to terms for dose and duration of smoking, improves the accuracy of model prediction. This confirms the importance of Moolgavkar's inclusion of two age-related terms but not how age is incorporated in his nonlinear incidence function. Although the underlying biological phenomenology for the importance of two age-related terms is conjectural at this time, the extended D-P model appears useful for future projections of the health consequences of cigarette smoking.
| Appendix 1 |
|---|
χ² statistic. These criteria say that the minimum expected cell frequency should be at least 1, and most cells (80%) should have expected cell frequencies of at least 5. The algorithm for combining cells was chosen to meet the Cochran criteria for the extended D-P model D and to not bias the results. The combined cells are detailed in Appendix Table 1. The youngest ages had the sparsest data and all subjects ages 40 to 45 were combined into one cell for each year of age. Those ages 46 to 47 were combined into two cells for each year of age and so forth through those ages 52 to 53 being combined into eight cells. The intermediate ages, 54 to 74, had the most data and were combined into 13 cells for each year of age. Those ages 75 to 79 again had sparser data and were combined into a decreasing number of cells as age increased. In general, those who initiated smoking (init) between ages 15 and 19 and smoked one or more packs of cigarettes per day had the most data. Those who initiated between 10 and 14 or between 20 and 24 had the next most data. For the intermediate ages and ages of initiation, there were several cells for which no combination was necessary for those who smoked one or more packs of cigarettes per day.
The minimum expected cell frequency for all models was at least 1. For model D, 285 cells (80.7%) had expected cell frequencies greater than 5. The Cochran criteria were not quite met for the other models. Only 259 cells (73.4%) of the Doll and Peto model A had expected cell frequencies greater than 5. For the four-parameter Moolgavkar model, 76.3% of the cells had expected cell frequencies of at least 5; for the three-parameter Moolgavkar model, 79.6% of the cells had expected cell frequencies of at least 5.
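The Cochran criteria as stated above are easy to check for any set of expected cell frequencies. The sketch below is illustrative only: the function names are hypothetical, the example frequencies are made up, and the threshold is taken as "at least 5", following the statement of the criteria.

```python
def meets_cochran(expected, min_floor=1.0, threshold=5.0, share=0.80):
    """Check the two criteria: every expected frequency is at least
    min_floor, and at least `share` of cells reach `threshold`."""
    if not expected:
        raise ValueError("no cells supplied")
    reaching = sum(1 for e in expected if e >= threshold)
    return min(expected) >= min_floor and reaching / len(expected) >= share


def percent_of_cells(count, total=353):
    """Percentage of the combined cells represented by `count`."""
    return 100.0 * count / total


# Percentages for the cell counts reported above, out of 353 combined cells.
pct_model_d = percent_of_cells(285)   # extended D-P model D
pct_model_a = percent_of_cells(259)   # Doll and Peto model A

# Made-up expected frequencies for a five-cell example.
ok = meets_cochran([1.0, 5.0, 6.0, 7.0, 8.0])
too_sparse = meets_cochran([0.5, 5.0, 6.0, 7.0, 8.0])
```

The reported counts reproduce the percentages given in the text: model D sits just above the 80% mark and model A falls below it.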
| Appendix Table 1: Definition of the reduced number of cells for goodness-of-fit testing |
|---|
| Footnotes |
|---|
The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked advertisement in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.
Received 7/16/03; revised 1/9/04; accepted 2/2/04.