Abstract
Period analysis has been shown to provide more up-to-date estimates of cancer survival than traditional methods of survival analysis. There is, however, a trade-off between up-to-dateness and precision of period survival estimates: increasing up-to-dateness by restricting the analysis to a relatively short period, such as the most recent calendar year, goes along with loss of precision. Recently, a model-based approach was proposed, in which more precise period survival estimates for the most recent year can be obtained through modeling of survival trends within a recent 5-year period. We assess possibilities to extend the time window used for modeling to come up with even more precise, but equally accurate and up-to-date estimates of prognosis. Empirical evaluation using data from the Finnish Cancer Registry shows that extension of the time window to about 10 years provides, in most cases, as accurate results as using a 5-year time window (whereas further extension may lead to considerably less accurate results in some cases). Using 10-year time windows for modeling, SEs of survival estimates can be approximately halved compared with conventional period survival estimates for the most recent calendar year. Furthermore, we present a modification of the modeling approach, which allows extension to 10-year time windows to be achieved without the need to include additional cohorts of patients diagnosed a longer time ago and which provides similarly accurate survival estimates at comparable levels of precision in most cases. Our analyses indicate opportunities to further maximize benefits of model-based period analysis of cancer survival. (Cancer Epidemiol Biomarkers Prev 2007;16(8):1675–81)
 cancer registry
 neoplasms
 prognosis
 statistical methods
 survival
Introduction
Period analysis, a new method of survival analysis introduced 10 years ago (1), has been shown to provide more up-to-date cancer survival estimates than traditional methods of survival analysis (2–6), and the method has gained increasing popularity in recent years (e.g., refs. 7–16). The principle of period analysis has been described in detail elsewhere (1, 17). Briefly, it consists of restricting the analysis to the survival experience of cancer patients in some recent time period, which is achieved by left truncation of observations at the beginning of that time period (in addition to right censoring of observations at its end). With conventional application of period analysis, there is a trade-off between up-to-dateness and precision of survival estimates: increasing up-to-dateness by restricting the analysis to a relatively short recent time period, such as the most recent calendar year for which cancer registry data are available, goes along with a loss of precision. We recently proposed a model-based approach, in which much more precise period estimates of survival for the most recent single calendar year can be obtained through modeling of survival trends within a time window encompassing the most recent 5 years (18). The aim of this article is to assess the possibilities to extend the time window used for modeling to come up with even more precise but equally accurate and up-to-date estimates of prognosis of most recently diagnosed cancer patients to maximize the benefits of model-based period analysis of cancer patient survival.
Patients and Methods
Database
Our analysis is based on data from the nationwide Finnish Cancer Registry, which covers a population of ∼5 million people and which is well known for its high levels of completeness and data quality (19). At the time of this analysis, the database encompassed patients diagnosed within more than half a century from 1953 to 2004, with follow-up of vital status until the end of 2004. In this analysis, we included patients aged ≥15 years with a first diagnosis of one of 20 common forms of cancer between 1953 and 1999.
Statistical Analysis
Throughout this article, we present relative rather than absolute survival rates because the former are the measures of prognosis most commonly reported by population-based cancer registries. Relative survival rates reflect the probability of surviving the cancer of interest rather than the total survival probability (20, 21), taking expected deaths in the absence of cancer into account. For this analysis, the expected numbers of deaths were derived from age, gender, and calendar period–specific mortality figures of the general population of Finland according to the so-called Ederer II method (22).
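As a minimal illustration of the definition above, relative survival for one follow-up interval can be written as the ratio of observed to expected survival, and cumulative relative survival as the product of the interval-specific ratios. The function names and all numbers below are hypothetical, and the sketch takes the expected survival probabilities as given rather than implementing the Ederer II calculation.

```python
def interval_relative_survival(observed, expected):
    """Ratio of observed to expected survival for one follow-up interval."""
    if not 0.0 < expected <= 1.0:
        raise ValueError("expected survival must lie in (0, 1]")
    return observed / expected


def cumulative_relative_survival(observed_by_year, expected_by_year):
    """Product of the interval-specific relative survival ratios."""
    result = 1.0
    for obs, exp in zip(observed_by_year, expected_by_year):
        result *= interval_relative_survival(obs, exp)
    return result


# Hypothetical conditional survival probabilities for 5 years of follow-up.
observed = [0.80, 0.88, 0.92, 0.94, 0.95]
expected = [0.97, 0.97, 0.96, 0.96, 0.95]
print(round(cumulative_relative_survival(observed, expected), 3))
```

Because expected survival is below 1, relative survival is always at least as high as observed survival in such an example.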
In a first step, actual 5-year relative survival of patients diagnosed in 1995–1999 and followed through 2004 was compared with the most up-to-date estimates of 5-year relative survival that would potentially have been available by the end of 1997 (the median year of diagnosis of this cohort) by the following methods of survival analysis (see Fig. 1): first, a “conventional” period analysis for the year 1997 only; second, a “modeled” period analysis, by which a period estimate of 5-year relative survival for 1997 is estimated by trend analysis from a database including periods of 5 years (1993–1997), 10 years (1988–1997), and 15 years (1983–1997).
The modeling approach, which has previously been described in detail for application with a 5-year time window (18), is outlined in Appendix 1. Briefly, survival probabilities are modeled for each combination of calendar year and year of follow-up within the specified time window. For that purpose, numbers of patients at risk and of deaths by year of follow-up are first calculated for each single calendar year within the specified time window. Next, a Poisson regression model for relative survival is used, in which the numbers of deaths for each combination of calendar year and year of follow-up are modeled as a function of calendar year (included as a numerical predictor variable) and year of follow-up (included as a categorical predictor variable). The logarithm of the person-years at risk is used as offset, and late entries and withdrawals are accounted for as half-persons. The model assumes a linear trend for the conditional survival estimates within the time periods used for modeling. This trend is assumed to be the same for each year of follow-up, but, as in conventional, nonparametric life table analysis, no specific shape of the survival curve is assumed. Conditional survival probabilities for each year of follow-up, 5-year cumulative period survival estimates, and their SEs are derived from the model results, as previously described (18).
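The half-person convention mentioned above can be sketched as follows for a single combination of calendar year and year of follow-up. The function names and the example counts are hypothetical, and the split of patients into those observed for the whole interval, late entries, and withdrawals is an assumption about how the counts are tabulated, not a description of the authors' programs.

```python
def effective_at_risk(full, late_entries, withdrawals):
    """Effective number at risk: late entries and withdrawals count as half-persons."""
    return full + 0.5 * (late_entries + withdrawals)


def conditional_observed_survival(full, late_entries, withdrawals, deaths):
    """Life table estimate of surviving the interval, given the counts for one cell."""
    at_risk = effective_at_risk(full, late_entries, withdrawals)
    if at_risk <= 0:
        raise ValueError("no patients at risk in this cell")
    return 1.0 - deaths / at_risk


# Hypothetical cell: 100 patients observed throughout, 20 late entries,
# 10 withdrawals, and 23 deaths.
print(effective_at_risk(100, 20, 10))
print(conditional_observed_survival(100, 20, 10, 23))
```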
The model-based approach provides a general framework that encompasses conventional cohort or period analyses as special cases of applications of saturated models. For example, a conventional period estimate of 5-year survival for 1997 can be obtained from a saturated model, in which the period of interest includes just one calendar year (i.e., 1997). This way, only five observations are included in the regression model, from which five parameters are estimated (one for each year of follow-up, none for calendar year). To ensure perfect comparability of results for conventional and modeled period analysis, we derived the former as a special case from a saturated model by the same computer programs used for the modeling approach in our analysis.
Next, the analyses described in the previous paragraphs were repeated for each single calendar year from 1972 to 1997, the widest possible range of years for which pertinent calculations could be carried out with the data available. In addition, we carried out pertinent analyses varying the time windows by 1-year steps between 1 and 15 years. This allowed us to assess the performance of the various methods in a much broader range of settings. We calculated the following summary indicators of the performance of the various methods: the mean difference and the mean square difference between the various estimates of 5-year relative survival potentially available in the respective year and 5-year relative survival later observed for patients diagnosed in the 5-year calendar intervals around that year (5-year intervals were chosen to limit the role of random variation). The mean differences reflect the average underestimation or overestimation of the 5-year relative survival. In addition, the mean square differences reflect, among other factors, the random variation in the various estimates.
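The two summary indicators can be sketched as below. The sign convention (estimate minus later observed survival, so that negative values indicate too pessimistic estimates) is chosen to match the way the results are reported; the function names and the example values are hypothetical.

```python
def mean_difference(estimates, observed):
    """Average of (estimate - later observed survival) over calendar years."""
    diffs = [e - o for e, o in zip(estimates, observed)]
    return sum(diffs) / len(diffs)


def mean_square_difference(estimates, observed):
    """Average squared difference; reflects both bias and random variation."""
    squares = [(e - o) ** 2 for e, o in zip(estimates, observed)]
    return sum(squares) / len(squares)


# Hypothetical 5-year relative survival (in percent units) for two calendar years.
estimates = [60.0, 62.0]
later_observed = [63.0, 63.0]
print(mean_difference(estimates, later_observed))         # -2.0
print(mean_square_difference(estimates, later_observed))  # 5.0
```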
Extension of the time window included in modeling from 5 to 10 years or from 10 to 15 years requires additional inclusion of patients diagnosed a longer time ago. For example, with the database shown in Fig. 1, patients diagnosed from 1983 on or from 1978 on would have to be included in modeling using 10- and 15-year time windows, whereas modeling using a 5-year time window could be achieved with a database including patients diagnosed from 1988 on only. Extension of time windows might therefore be difficult for “younger” cancer registries with less long-standing time series of registration. We therefore additionally evaluated “abbreviated-period” modeling using 10- and 15-year time windows, but relying on the same cohorts of patients needed for “full-period” modeling using 5- and 10-year time windows, respectively, as illustrated in Fig. 2. In addition, abbreviated-period modeling using a 5-year window was also used, which requires a minimum number of five 1-year cohorts for analysis. Compared with full-period modeling, some of the survival experience in the later years of follow-up (which would have to come from the “older cohorts”) is left out, whereas the database for the early years of follow-up, where the vast majority of cancer deaths occur, is essentially the same.
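The data requirements of the two variants can be summarized in a small helper. The rule below is inferred from the examples given in the text (time windows ending in 1997, 5 years of follow-up) and is a sketch under that assumption; the function name and parameters are hypothetical.

```python
def first_diagnosis_year(last_year, window, approach, followup_years=5):
    """Earliest year of diagnosis needed for a time window ending in last_year."""
    window_start = last_year - window + 1
    if approach == "full":
        # Full-period modeling also needs the older cohorts that contribute
        # later years of follow-up at the start of the window.
        return window_start - followup_years
    if approach == "abbreviated":
        # Abbreviated-period modeling uses only cohorts diagnosed within the window.
        return window_start
    raise ValueError("approach must be 'full' or 'abbreviated'")


for window in (5, 10, 15):
    print(window,
          first_diagnosis_year(1997, window, "full"),
          first_diagnosis_year(1997, window, "abbreviated"))
```

For 1997, this reproduces the years named above: 1988, 1983, and 1978 for full-period modeling, and 1993, 1988, and 1983 for abbreviated-period modeling with 5-, 10-, and 15-year windows.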
The analyses were carried out using the SAS statistical software package. For all survival analyses, the macro period was used to derive the numbers of patients at risk and of deaths by year of follow-up and by calendar year (17, 23). Some minor formal modification of the output was made to facilitate the subsequent steps. Next, the procedure GENMOD was used to carry out Poisson regression, and the output of the regression models was used to carry out the subsequent calculations, as previously described (18).
Results
Overall, 639,011 patients aged ≥15 years were reported to the Finnish Cancer Registry with a first diagnosis of cancer between 1953 and 1999. Of these, we excluded 2.6% notified by death certificate only, another 2.5% notified by autopsy only, and 0.1% due to missing information on month of diagnosis. The 20 forms of cancer specifically addressed in this article include 87.1% of the remaining cancer cases.
The numbers of notifications of patients with these 20 forms of cancer in 1995–1999, as well as the actual 5-year relative survival later observed for these patients, are shown in Table 1. In addition, Table 1 shows the estimates of 5-year relative survival potentially available in 1997, the median year of the 1995–1999 interval, by the different analytic approaches (ignoring delay in cancer registration). The point estimates obtained by the conventional period analysis and the different variants of modeled period analysis were, in general, quite similar. In particular, the model-based estimates were, on average, about as up-to-date as conventional period estimates, regardless of the length of the time window included in the modeling. The modeled period estimates were even closer (or as close) to the later observed survival estimates in most cases. This was true for 13, 15, and 13 of 20 cancers with full-period modeling and for 11, 15, and 13 of 20 cancers with abbreviated-period modeling, using 5-, 10-, and 15-year time windows, respectively (Table 1, bold figures). The SEs were much lower for the modeled period estimates and they decreased with increasing length of the time window used for modeling. With full-period modeling using 5-, 10-, and 15-year windows, SEs were, on average, about one third, almost one half, and more than one half lower than the SEs of estimates obtained from conventional period analysis for 1997. SEs were substantially higher for abbreviated-period modeling than for full-period modeling in case of 5-year time windows, but differences were almost negligible for 10- or 15-year time windows.
With few exceptions, all types of analyses provided, on average, somewhat too pessimistic estimates of 5-year relative survival later observed for patients diagnosed in the 5-year interval around each single year between 1972 and 1997, which can be seen from the negative values of most of the mean differences shown in Table 2. With respect to this criterion, results were generally quite similar for the various analytic approaches. Nevertheless, the modeling approaches did better than or as well as conventional period analysis (bold figures) in a slight majority of cancers using 10-year time windows, whereas full-period modeling did worse for 14 cancers using 15-year time windows and abbreviated-period modeling did worse for 15 cancers using 5-year time windows. Mean square differences obtained with the modeling strategies were lower than those obtained with conventional period analysis for most cancers (bold figures). According to this criterion, full-period modeling did better than conventional period analysis for 20, 20, and 17 cancers, respectively, if 5-, 10-, and 15-year time windows were used. Abbreviated-period modeling did better than conventional period analysis for 14, 18, and 19 cancers, respectively.
A more comprehensive evaluation of the performance of full-period and abbreviated-period modeling according to the length of the time window used for modeling is shown in Fig. 3A and B, again using the mean square difference from later observed survival rates as criterion. With full-period modeling (Fig. 3A and B, left columns), minimum mean square difference levels are reached for 12 of 20 cancers using time windows between 3 and 10 years, whereas for the remaining 8 cancers, lowest levels of mean square difference are reached for longer time windows. On the other hand, a steep increase of mean square difference for time windows longer than 10 years is observed for a few cancers. Nevertheless, almost all modeling approaches perform better than conventional period analysis, which is represented by the results for 1-year time windows in Fig. 3A and B (left columns). In general, mean square differences were similar or even slightly lower with abbreviated-period than with full-period modeling (Fig. 3A and B, right columns), except for short time windows around 5 years (the minimum time windows for abbreviated-period modeling).
Discussion
In this article, we show that the benefits of the modeling approach, which we recently introduced to come up with both up-to-date and precise period estimates of cancer patient survival (18), can be further enhanced by extending the time window used for modeling. Although extension to 15 years or more may often be beneficial, extension to about 10 years seems to be a more prudent choice because risk of misprediction of survival seems to increase rapidly with increasing length of time windows beyond 10 years in some cases. By extension of time windows from 5 to 10 years, period estimates of survival for the most recent calendar year can be derived with SEs approximately half of those that would be obtained with conventional period analysis. This gain in precision can be obtained at no cost in up-to-dateness or accuracy of period survival estimates in most cases. To enhance precision to a comparable degree in conventional period analysis, the period included in the analysis would have to be extended from the most recent single year to the most recent 4 years. This option, however, would go along with a substantial loss of up-to-dateness (by, on average, 1.5 years), which can be avoided by use of the model-based period analysis.
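The arithmetic behind the comparison with a 4-year conventional period can be sketched as follows. The average loss of up-to-dateness is the mean lag of the calendar years in the period relative to the most recent year. The square-root rule for the SE is only a rough approximation that assumes each calendar year contributes a similar amount of independent information; it is not taken from the analyses in this article, and the function names are hypothetical.

```python
import math


def mean_lag_years(period_length):
    """Mean lag behind the most recent year for a period of the given length."""
    lags = range(period_length)  # 0 = most recent year, 1 = the year before, ...
    return sum(lags) / period_length


def approximate_se_ratio(period_length):
    """Rough SE relative to a single-year analysis (assumed 1/sqrt(n) scaling)."""
    return 1.0 / math.sqrt(period_length)


print(mean_lag_years(4))        # 1.5
print(approximate_se_ratio(4))  # 0.5
```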
The gain in precision by moving from 5- to 10-year time windows for modeling can be achieved without additional requirements for the registry database. Although full-period modeling using 10-year windows would require existence of a reliable registry database for at least 15 years (rather than at least 10 years as in full-period modeling using a 5-year time window), no such extension is needed for abbreviated-period modeling using a 10-year time window, which seems to perform equally well or even slightly better in terms of accuracy of predictions than full-period modeling in most cases and which can be applied at virtually no cost of precision. The possibility to switch to abbreviated-period modeling is of particular relevance for younger cancer registries, such as those having been set up in many European countries as well as in multiple locations in the United States in the past 10 to 20 years (24, 25), because it will enable them to carry out modeled period analysis years before having reached the long time series needed for full-period modeling.
The type of database used for abbreviated-period modeling in our analysis can also be used and has recently been proposed to be used for “cohort modeling” of population-based cancer survival data, in which numbers of deaths are modeled by 1-year cohorts (years of diagnosis) rather than by 1-year periods (years of follow-up; ref. 26). Despite the common database, the two approaches are not identical and yield different results. The relative performance of both approaches needs to be evaluated in further research.
As previously illustrated (18), the modeling approach may also be useful to disclose and estimate recent trends in cancer survival. Our analyses were carried out for all ages combined, and they therefore pertain to crude estimates of relative survival. Given that levels and trends of relative survival may differ by age, and given that the age distribution of patients has shifted and continues to shift to older ages for many forms of cancer, it may often be useful to carry out age-specific analyses or to adjust for age, particularly if the time window included in the modeling is extended to 10 or 15 years. Such age-specific or age-adjusted analyses could be easily implemented in the modeling framework.
The different performance of models based on time windows of various lengths shown for various cancers in Fig. 3A and B raises the question of whether the choice of the length of time windows should be based on preassessment of time trends in cancer survival. In theory, with steady improvement in survival at a constant pace (or no change in survival at all), performance of the survival predictions should be independent of length of time windows, in which case longer-term windows might generally be preferred as they lead to more precise survival estimates. Use of longer-term time windows may be particularly risky, however, in case of changing pace of improvement (or even changing direction of survival trends) over time.
For example, the poor performance of models based on long-term time windows for cervical cancer observed in our analysis may reflect the inconsistent trends in survival observed for this form of cancer in Finland in the past decades, which are mostly explained by selection effects resulting from the (very successful) screening program. For other cancers, the reasons for worse performance of longer time periods were less obvious from long-term survival trends. In general, it seems to be difficult, if not impossible, to determine the optimal length of time windows for each single cancer from assessing long-term time trends in survival. In particular, even a long-term apparently consistent pace of change in survival in preceding years does not guarantee good performance of the use of longer time windows and may give misleading results in case of a sudden change affecting patients diagnosed in the period of interest. Furthermore, the choice of different time windows would impede comparisons of cancer survival estimates both within and between cancer registries. We therefore believe that choice of a common time window that proves useful in a broad range of scenarios (such as those evaluated in our analysis) may be a preferred strategy, and our analyses suggest that a time window of around 10 years may be a reasonable choice.
The analyses shown in this article refer to derivation of the most up-to-date period survival estimates of currently diagnosed cancer patients on the basis of observed survival data. As recently shown, period modeling may also be useful for predicting survival of future cancer patients by extrapolation from observed survival data (27). Optimal length of time windows for the latter purpose may not necessarily be the same as the optimal length for the purpose addressed in this article and requires further empirical evaluation.
All of our analyses are based on data from the Finnish Cancer Registry. Whereas the methods proposed and evaluated are applicable in a large number of cancer registries around the world, the type of thorough empirical evaluation provided in this article, especially in Table 2, can only be carried out with a very long history of high-quality cancer registration. The Finnish Cancer Registry is one of very few population-based cancer registries in the world meeting this criterion, and there is no reason to believe that results would be substantially different with other long-standing cancer registries. In particular, differences in levels and trends of survival between patients with cancer at various sites within a population are much larger than differences between survival of patients with the same type of cancer in various populations (at least within developed countries). The consistency of patterns found for 20 cancers with such strongly divergent levels and trends of prognosis suggests that the methods presented are likely to be useful for a very broad range of settings. This suggestion is further supported by previous replications of the empirical evaluation of period analysis techniques first evaluated using data from the Finnish Cancer Registry, which have generally yielded very similar results (3–6).
In conclusion, our empirical evaluation suggests that the benefits of the modeling approach for the provision of up-to-date and precise period estimates of cancer patient survival for the most recent year for which cancer registry data are available may be further enhanced by extending the time window used for modeling. In most cases, time windows around 10 years and application of abbreviated-period modeling seem to be a reasonable choice. With this approach, SEs of the most up-to-date period estimates can be approximately halved compared with conventional period analysis. Compared with period modeling based on 5-year time windows, abbreviated-period modeling allows this improvement to be achieved without the need to include additional cohorts of patients diagnosed a longer time ago. Extension of the time window included in the modeling beyond 10 years may be problematic because potential risks may outweigh further benefits in some cases.
Appendix 1
Let l_{ij} be the effective numbers of persons at risk (accounting for late entries and withdrawals as half-persons); d_{ij}, the observed numbers of deaths; and e_{ij}, the expected numbers of deaths (from population life tables) for each combination of follow-up year i (1 ≤ i ≤ 5) and calendar year j.
Then, a generalized linear model d_{ij} = f(i,j) is fitted with outcome d_{ij}, Poisson error structure, predictor variables i (categorical) and j (linear), link ln(μ_{ij} − d*_{ij}), and offset ln(l_{ij} − d_{ij} / 2), where μ_{ij} are the model-based numbers of deaths and d*_{ij} = −(l_{ij} − d_{ij} / 2) × ln[(l_{ij} − e_{ij}) / l_{ij}].
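The two cell-level quantities entering the model, the d* term and the offset, can be computed directly from l, d, and e for one combination of follow-up year and calendar year. The sketch below only evaluates the expressions given above; it does not fit the model, and the function names and example counts are hypothetical.

```python
import math


def person_years(l, d):
    """Approximate person-years at risk in the cell: l - d/2."""
    return l - d / 2.0


def d_star(l, d, e):
    """d* = -(l - d/2) * ln((l - e) / l), the expected-deaths term in the link."""
    return -person_years(l, d) * math.log((l - e) / l)


def offset(l, d):
    """Offset of the model: logarithm of the person-years at risk."""
    return math.log(person_years(l, d))


# Hypothetical cell: 1,000 effective persons at risk, 120 observed deaths,
# and 10 deaths expected from population life tables.
print(d_star(1000.0, 120.0, 10.0))
print(offset(1000.0, 120.0))
```

Here −ln((l − e) / l) acts as the expected hazard, so d* is close to e when expected mortality is low and few deaths are observed.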
Let α_{i} and β be the estimated regression coefficients for follow-up years i (1 ≤ i ≤ 5) and for a 1-year increase in calendar year, and let p_{1} be the first calendar year within the period included in the modeling. Then, estimates of conditional relative survival for each combination of follow-up year i and calendar year j are given as

r_{ij} = exp{−exp[α_{i} + β × (j − p_{1})]},

and an estimate of cumulative 5-year relative survival for each calendar year j is given as

R_{j} = r_{1j} × r_{2j} × r_{3j} × r_{4j} × r_{5j} = exp{−Σ_{i=1}^{5} exp[α_{i} + β × (j − p_{1})]}.
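A sketch of how survival estimates follow from the coefficients is given below. It assumes that, with the log link on excess deaths and the person-years offset stated above, the excess hazard for follow-up year i and calendar year j is exp[α_i + β × (j − p_1)], with calendar year coded relative to p_1; that coding, the function names, and the coefficient values are assumptions made for illustration and are not estimates from this article.

```python
import math


def modeled_conditional_rs(alpha_i, beta, j, p1):
    """Conditional relative survival: exp(-excess hazard) for one follow-up year."""
    return math.exp(-math.exp(alpha_i + beta * (j - p1)))


def modeled_cumulative_rs(alphas, beta, j, p1):
    """Cumulative relative survival: product over the follow-up years."""
    total_hazard = sum(math.exp(a + beta * (j - p1)) for a in alphas)
    return math.exp(-total_hazard)


# Hypothetical coefficients for 5 follow-up years and a 10-year window 1988-1997.
alphas = [-1.6, -2.0, -2.4, -2.7, -3.0]
beta = -0.03
print(round(modeled_cumulative_rs(alphas, beta, 1997, 1988), 3))
```

A negative β corresponds to a declining excess hazard, and therefore to improving relative survival, over calendar time.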
Variance estimates of 5-year cumulative relative survival for calendar year j can be obtained by the delta method, as previously described (18).
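The delta method step can be sketched as a quadratic form of the gradient of cumulative relative survival with the covariance matrix of the coefficients. The sketch assumes the cumulative estimate has the form exp{−Σ exp[α_i + β × t]} with t = j − p_1 (an assumption about the coding of calendar year) and that the covariance matrix is ordered as (α_1, ..., α_k, β). Names and numbers are hypothetical, and this is not necessarily the exact derivation used in ref. 18.

```python
import math


def delta_method_variance(alphas, beta, t, cov):
    """Approximate variance of exp(-sum_i exp(alpha_i + beta * t))."""
    hazards = [math.exp(a + beta * t) for a in alphas]
    survival = math.exp(-sum(hazards))
    # Partial derivatives with respect to alpha_1..alpha_k, then beta.
    grad = [-survival * h for h in hazards]
    grad.append(-survival * t * sum(hazards))
    n = len(grad)
    return sum(grad[a] * cov[a][b] * grad[b] for a in range(n) for b in range(n))


# Hypothetical two-interval example with uncorrelated coefficient estimates.
alphas = [-2.0, -3.0]
beta = -0.05
cov = [[0.01, 0.0, 0.0],
       [0.0, 0.02, 0.0],
       [0.0, 0.0, 0.0004]]
variance = delta_method_variance(alphas, beta, 4, cov)
print(math.sqrt(variance))  # approximate SE of the cumulative estimate
```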
Footnotes

Grant support: The German Cancer Foundation (Deutsche Krebshilfe), Project No. 703166Br 5 (H. Brenner), and the Academy of Finland and the Cancer Society of Finland (Timo Hakulinen).

The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked advertisement in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.
 Accepted May 24, 2007.
 Received December 20, 2006.
 Revision received April 25, 2007.