Abstract
In an era of ongoing improvement in cancer patient survival, available long-term survival figures from cancer registries are often outdated and too pessimistic for two reasons: first, the delay in availability of cancer registry data, typically on the order of a few years, and, second, the application of cohort-based methods of survival analysis, which provide survival estimates for patients diagnosed many years ago. We developed a model-based period analysis approach aimed at overcoming both problems. We provide an extensive empirical evaluation of our approach by comparing its performance with that of previously available methods for monitoring 5- and 10-year relative survival, using data from the nationwide Finnish Cancer Registry on 490,279 patients ages ≥15 years diagnosed with one of 20 common forms of cancer between 1953 and 1997. We show that, in most cases, the model-based approach predicts 5- and 10-year relative survival expectations of newly diagnosed patients quite closely and much better than any of the previously available methods, including standard period analysis. We conclude that the model-based approach may enable deriving up-to-date cancer survival rates even with the common latency in availability of cancer registry data. (Cancer Epidemiol Biomarkers Prev 2006;15(9):1727–32)
 cancer registry
 neoplasms
 prognosis
 statistical methods
 survival
Introduction
Monitoring cancer patient survival is an important task of both clinical and population-based cancer registries. To be of maximum use for both clinical and public health purposes, estimates of cancer patient survival should be as up-to-date as possible. Period analysis was introduced 10 years ago (1) to provide more up-to-date estimates of long-term survival than traditional cohort-based methods of survival analysis. The principle of period analysis consists of restricting the analysis to some recent time period, which is achieved by left truncation of observations at the beginning of that time period (in addition to right censoring of observations at its end). It has been shown that period analysis provides quite accurate and up-to-date estimates of long-term survival for patients diagnosed in the period of interest (2–5). In practice, however, there is often a latency of several years in availability of cancer registry data, and there may be further delay in the process of publication. Therefore, even if period analysis is applied to provide up-to-date estimates of long-term cancer patient survival for the most recent period for which cancer registry data are available, these estimates may still be somewhat outdated with respect to currently diagnosed cancer patients. For example, period estimates of long-term cancer survival published in 2002 for the United States and in 2005 for Germany pertained to patients diagnosed with cancer up to the year 1998 and 2002, respectively (6, 7).
In this article, we develop and empirically evaluate a model-based period analysis approach aimed at providing up-to-date estimates of long-term survival even with the common latency in availability of cancer registry data.
Materials and Methods
Database
Our analysis is based on data from the nationwide Finnish Cancer Registry, which is well known for its high levels of completeness and data quality (8). At the time of this analysis, the database encompassed patients diagnosed within half a century from 1953 to 2002, with follow-up with respect to vital status until the end of 2003. In this analysis, we included patients ages ≥15 years with a first diagnosis of 1 of 20 common forms of cancer between 1953 and 1997.
Statistical Analysis
Throughout this article, we present relative rather than absolute survival rates, as the former are most commonly reported by cancer registries. Relative survival rates reflect the probability of surviving the cancer of interest rather than the total survival probability (9, 10), taking expected deaths in the absence of cancer into account. For this analysis, the expected numbers of deaths were derived from age, gender, and calendar period–specific mortality figures of the general population of Finland according to the so-called Ederer II method (11).
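As an illustration of the relative survival concept, the following minimal sketch divides the observed survival proportion in each follow-up interval by the proportion expected from general population mortality, and multiplies the interval-specific ratios to obtain cumulative relative survival. The function names and the input numbers are hypothetical and chosen only for illustration; the expected deaths are assumed to have been derived beforehand from population life tables.

```python
def interval_relative_survival(at_risk, deaths, expected_deaths):
    """Conditional relative survival for one follow-up interval.

    at_risk:         effective number of patients at risk in the interval
    deaths:          observed number of deaths in the interval
    expected_deaths: deaths expected from general population mortality
    """
    observed_survival = 1.0 - deaths / at_risk
    expected_survival = 1.0 - expected_deaths / at_risk
    return observed_survival / expected_survival


def cumulative_relative_survival(intervals):
    """Product of interval-specific relative survival ratios.

    intervals: iterable of (at_risk, deaths, expected_deaths) tuples,
               one per year of follow-up.
    """
    result = 1.0
    for at_risk, deaths, expected_deaths in intervals:
        result *= interval_relative_survival(at_risk, deaths, expected_deaths)
    return result


# Hypothetical life-table rows for two years of follow-up.
rows = [(1000, 200, 50), (800, 80, 40)]
print(cumulative_relative_survival(rows))
```

A value below 1 indicates excess mortality among patients compared with a population of the same age and gender distribution.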
We first assessed overall trends in survival during the past decades by looking at the development of 5-year relative survival for successive cohorts of patients diagnosed between 1953 to 1957 and 1993 to 1997.
Next, 5-year relative survival actually observed for patients diagnosed in 1993 to 1997 and followed through 2002 (Fig. 1, bottom solid rectangular frame) was compared with the most up-to-date estimates of 5-year survival that would potentially have been available somewhere in the middle of 1993 to 1997, the years of diagnosis of this cohort. Assuming that, with the common latency of cancer registration of several years, available data might have included patients diagnosed up to the year 1992, the following survival estimates were derived: First, a cohort-based estimate for the cohort of patients diagnosed in 1983 to 1987 and followed over 5 years since then (Fig. 1, top solid rectangular frame). Second, a so-called complete estimate, which additionally included patients diagnosed in 1988 to 1992, although the latter could not have been under observation for 5 years by the end of 1992 and would be censored at that date unless they died or were lost to follow-up earlier (Fig. 1, triangular frame). Third, a standard period estimate for the 1988 to 1992 period, derived by left truncation of observations at the beginning of 1988 in addition to censoring of observations at the end of 1992 as previously described (Fig. 1, dashed frame; ref. 1). Fourth, to overcome the latency in cancer registration, a model-based period estimate for the 1993 to 1997 period was derived.
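The left truncation and right censoring that define a period estimate can be sketched as follows. This is a simplified illustration, not the actuarial procedure used in the analysis: it works with exact person-time in decimal calendar years rather than effective numbers at risk, and the function name and example dates are hypothetical.

```python
def period_contributions(diagnosis, exit_time, died,
                         period_start, period_end, max_years=5):
    """Person-time and deaths a patient contributes inside a period window.

    All times are decimal calendar years (e.g. 1990.25).
    Returns {follow-up year: (person_years, deaths)}.
    """
    contributions = {}
    for year in range(1, max_years + 1):
        # Calendar span of this follow-up year for this patient.
        lo = diagnosis + (year - 1)
        hi = diagnosis + year
        # Left truncation at the period start; right censoring at the
        # period end or at the patient's exit, whichever comes first.
        start = max(lo, period_start)
        stop = min(hi, period_end, exit_time)
        if stop <= start:
            continue  # no time observed in this follow-up year
        # A death counts only if it ends the observed time in the window.
        death_here = died and stop == exit_time
        contributions[year] = (stop - start, int(death_here))
    return contributions


# Hypothetical patient: diagnosed mid-1986, died in early 1990,
# analyzed for the period window from 1988.0 to 1993.0.
print(period_contributions(1986.5, 1990.25, True, 1988.0, 1993.0))
```

In this example, the first follow-up year lies entirely before the window and is dropped, which is why a period estimate reflects only the survival experience within the chosen calendar period.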
For derivation of the model-based period estimates for the 1993 to 1997 period, we first calculated numbers of patients at risk and of deaths by year of follow-up for each of the three preceding 5-year periods, i.e., 1978 to 1982, 1983 to 1987, and 1988 to 1992, just like one would do in standard period analysis for each of these periods. Next, we used a Poisson regression model (12) for the total 1978 to 1992 period. A formal description of the model is given in Appendix 1. Briefly, the numbers of deaths for each combination of 5-year calendar period and year of follow-up were modeled as a function of the calendar period (included as a numerical predictor variable) and year of follow-up (included as a categorical predictor variable). Based on this model, we calculated the projected numbers of deaths and conditional relative survival probabilities for each year of follow-up in 1993 to 1997, assuming that the linear trend from the 1978 to 1982 period to the 1988 to 1992 period would prevail, and that the pattern of follow-up year-specific survival would remain unchanged otherwise. The model-based period estimate of 5-year relative survival for 1993 to 1997 was then obtained as the product of these conditional survival probabilities.
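The projection step can be illustrated with a deliberately simplified stand-in for the Poisson regression: ordinary least squares on the log excess hazard scale, with one level per follow-up year and a common linear trend across periods. This is an assumption made to keep the sketch self-contained; it is not the fitting procedure used in the analysis, and the function name and input values are hypothetical.

```python
import math


def project_relative_survival(cond_surv_by_period, target_period):
    """Project conditional relative survival to a later period.

    cond_surv_by_period[j][i] is the conditional relative survival
    (strictly between 0 and 1) in follow-up year i + 1 of period j,
    with periods coded 0, 1, 2, ...
    Returns (trend slope, projected conditional survival, cumulative).
    """
    n_periods = len(cond_surv_by_period)
    n_years = len(cond_surv_by_period[0])
    # Log excess hazard: y = ln(-ln r).
    y = [[math.log(-math.log(r)) for r in period]
         for period in cond_surv_by_period]
    j_mean = (n_periods - 1) / 2
    year_means = [sum(y[j][i] for j in range(n_periods)) / n_periods
                  for i in range(n_years)]
    # Least-squares slope shared by all follow-up years (balanced design).
    sxx = n_years * sum((j - j_mean) ** 2 for j in range(n_periods))
    sxy = sum((j - j_mean) * (y[j][i] - year_means[i])
              for j in range(n_periods) for i in range(n_years))
    beta = sxy / sxx
    alphas = [year_means[i] - beta * j_mean for i in range(n_years)]
    projected = [math.exp(-math.exp(a + beta * target_period))
                 for a in alphas]
    return beta, projected, math.prod(projected)


# Hypothetical conditional relative survival for three 5-year periods.
history = [
    [0.80, 0.88, 0.92, 0.94, 0.95],
    [0.82, 0.89, 0.93, 0.95, 0.96],
    [0.84, 0.91, 0.94, 0.95, 0.97],
]
beta, projected, five_year = project_relative_survival(history, 3)
print(beta, five_year)
```

A negative slope corresponds to a declining excess hazard, so the projected estimate for the next period exceeds the standard period estimate for the most recent observed period.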
To address the performance of the various methods in a much broader range of settings, we repeated the analyses of 5-year relative survival outlined for the cohort of patients diagnosed in 1993 to 1997 in the preceding paragraphs for cohorts of patients diagnosed in 1988 to 1992, 1983 to 1987, 1978 to 1982, and 1973 to 1977 as well (which is the widest possible range of cohorts for which analogous calculations could be carried out with the available database of the Finnish Cancer Registry). We calculated the following summary indicators of the performance of the various methods: the mean difference and the mean squared difference between 5-year relative survival later observed for patients diagnosed in the respective calendar periods and the various estimates potentially available during these periods. The mean differences reflect the average underestimation or overestimation of the 5-year relative survival rates. The mean squared differences, in addition, reflect average absolute levels of deviations of single estimates (with particular “punishment” of strong deviations).
To address the performance of the modeling approach for longer-term survival, we carried out analogous analyses comparing 10-year relative survival actually observed for patients diagnosed in 1988 to 1992, 1983 to 1987, and 1978 to 1982 with the most up-to-date estimates of 10-year relative survival that might have been available in those periods.
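The two summary indicators can be computed as in the following minimal sketch. Differences are taken as estimate minus later observed value, so that negative mean differences indicate underestimation, in line with the sign convention used in the Results. The function name and the numbers are hypothetical.

```python
def performance_summary(estimates, observed):
    """Mean difference and mean squared difference (in % units).

    estimates: survival estimates available during each calendar period
    observed:  survival later observed for patients diagnosed in the
               same periods
    """
    diffs = [est - obs for est, obs in zip(estimates, observed)]
    mean_difference = sum(diffs) / len(diffs)
    mean_squared_difference = sum(d * d for d in diffs) / len(diffs)
    return mean_difference, mean_squared_difference


# Hypothetical estimates and later observed values for three periods.
print(performance_summary([50.0, 55.0, 60.0], [54.0, 58.0, 61.0]))
```

Because deviations are squared before averaging, a single estimate that is far off raises the mean squared difference much more than several small deviations of the same total size.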
The analyses were carried out with the SAS statistical software package (Cary, NC), using the macro period, to derive the effective numbers of patients at risk and of deaths (13), and the procedure GENMOD to carry out Poisson regression.
Results
Overall, 577,924 patients ages ≥15 years were reported to the Finnish Cancer Registry with a first diagnosis of cancer between 1953 and 1997. Of these, we excluded 2.6% notified by death certificate only, another 2.6% notified by autopsy only, and 0.1% due to missing information on month of diagnosis. The 20 forms of cancer specifically addressed in this article include ∼89.6% of the remaining cancer cases (n = 490,279).
The numbers of notifications, as well as 5-year relative survival rates of patients with these 20 forms of cancer, are shown for calendar years 1953 to 1957 and 1993 to 1997 (Table 1). Five-year relative survival strongly varied by cancer site. Among patients diagnosed in 1953 to 1957, it ranged from 66.3% for patients with cancer in the oral cavity to 3.2% for patients with esophagus cancer. Five-year relative survival rates increased between 1953 to 1957 and 1993 to 1997, albeit to a strongly varying degree, for all but one (pancreas cancer) of the assessed forms of cancer. The most pronounced increases, by 49.4%, 45.4%, and 40.3% units (i.e., an average annual increase by >1% unit), were seen for patients with prostate, bladder, and thyroid cancer, respectively. With 89.4%, the latter had the highest 5-year relative survival rate in 1993 to 1997 among the cancer patients included in this analysis. On the other hand, 5-year relative survival still remained just above 10% for patients with cancers of the esophagus, liver, and lung, and <4% for patients with cancer of the pancreas.
As Fig. 2 shows, increases in 5-year relative survival mostly occurred steadily throughout the second half of the 20th century, and for some forms of cancer, such as colon, rectum, or stomach cancer, increases were very close to linear over several successive decades. On the other hand, quite varying trends were seen in different periods for a few other forms of cancer, such as lymphomas and particularly cervical cancer.
Table 2 shows the estimates of 5-year relative survival potentially available in 1993 to 1997 with cancer registry data up to the year 1992 by the four analytic approaches compared with the 5-year relative survival rates later observed for patients diagnosed in 1993 to 1997. For urological cancers, results are also illustrated graphically (see Fig. 3). For those forms of cancer whose prognosis improved over time, the available cohort estimates, which pertained to patients diagnosed in 1983 to 1987, but also (albeit to a slightly lesser degree) the available complete estimates, which pertained to patients diagnosed in 1983 to 1992, were much lower than the 5-year relative survival rates later observed in 1993 to 1997. The “standard” period estimates for the 1988 to 1992 period were less outdated in most cases, but for 14 of the 20 forms of cancer, the model-based period estimate for the 1993 to 1997 period came closest to the 5-year relative survival rates later observed in 1993 to 1997 (bold figures in Table 2). The latter was true for three, zero, and four forms of cancer for cohort analysis (which performed worst for 15 of 20 forms of cancer), complete analysis, and standard period analysis, respectively. However, for prostate and cervical cancer, all estimates were much too low. For prostate cancer, underestimation was worst for cohort analysis, and for cervical cancer, it was worst for the modeled period analysis.
With few exceptions, cohort, complete, and standard period analysis provided, on average, somewhat too pessimistic estimates of 5-year relative survival rates later observed for patients diagnosed in each 5-year calendar period between 1973 to 1977 and 1993 to 1997, which can be seen from the negative values of most of the mean differences shown in Table 3. This underestimation was generally largest for the conventional cohort analysis (with mean differences ranging up to −11.5% units for melanoma), intermediate for complete analysis, and somewhat lower for standard period analysis. With modeled period analysis, mean differences were generally much closer to 0. They were below 0 for 12 forms of cancer, and above 0 for eight forms of cancer, with a range from −3.5% units to +1.9% units. For 17 and 15 of 20 cancers, absolute values of mean differences and mean squared differences were lowest with the use of modeled period analysis, respectively. Standard period analysis performed best for three and two forms of cancer, and cohort analysis performed worst for 18 and 18 forms of cancer, respectively, according to these two criteria. The only major exception from the good performance of modeled period analysis was cervical cancer, for which the mean squared difference was substantially higher with modeled period analysis than with the other methods of analysis.
The advantages of modeled period analysis are even more striking for 10-year relative survival rates, where underestimation of survival by traditional methods of survival analysis may be even more severe, whereas the performance of the modeled period analysis is, on average, about as good as for 5-year relative survival (see Table 4).
Discussion
With ongoing progress in prognosis, conventional cohort and complete estimates of long-term survival are overly pessimistic for many forms of cancer. Although this problem may be partly overcome by period analysis, even the “standard” period estimates are often outdated to some extent by the time they can be derived, due to the usual delay in availability of cancer registry data. In this article, we introduced a model-based extension of period analysis of cancer patient survival that allows up-to-date estimates of long-term survival to be provided even in that situation. The model-based period analysis was found to be superior to any of the other methods, including standard period analysis, for the clear majority of cancers in extensive empirical evaluation.
Our findings regarding the advantage of “standard” period analysis over cohort and complete analysis in terms of up-to-dateness of survival estimates are in agreement with previous evaluations that had not taken the usual delay in availability of cancer registry data into account (3–5). Previous analyses had also shown that period analysis may advance detection of trends in 5-, 10-, 15-, and 20-year survival by almost 5, 10, 15, and 20 years compared with cohort analysis (2). Model-based period analysis as applied in this study would be expected to result in further reduction of delay in disclosure of progress in survival by another 5 years. In fact, the mean difference in model-based period estimates for the current period from 5-year and 10-year survival later observed for patients diagnosed in that period was close to zero for most forms of cancer in our extensive empirical evaluation, which indicates that the delay in disclosure of progress in prognosis can be overcome almost entirely.
In our modeling approach, we assumed persistent increasing or decreasing trends in relative survival rates over time. Obviously, the model-based period approach performs best when this assumption holds entirely. In our time trend analyses of cancer survival over half a century, we found entirely or close to monotonic upward trends in survival for most forms of cancer. However, for cervical cancer, divergent trends in various time intervals were found. The transient drop in survival for this form of cancer probably reflects selection processes resulting from the very successful screening program for this form of cancer (14). As a result of this pattern, the performance of model-based period estimates was worse for this form of cancer than the performance of the other methods of survival analysis. For cancers with little change in prognosis over time, neither standard period analysis nor model-based period analysis provides advantages over conventional cohort or complete analysis, and all methods essentially perform equally well. Such a pattern was seen, for example, for pancreas cancer in our analysis.
In the application of the modeling approach, there are a number of design options that may be considered. These include, for example, the number and width of periods included in the modeling. In our analysis, we chose to use three 5-year periods, covering a 15-year time frame altogether. Although a longer time frame (e.g., use of four or five 5-year periods) would provide a broader basis for estimation, the assumption of a persistent trend within such a long time frame may often be more problematic. Also, such an approach would only be feasible in cancer registries with a long history of cancer registration at high levels of quality and completeness. Although this prerequisite would be given in the Finnish Cancer Registry, it would hinder application of modeled period analysis in a large and growing number of younger high-quality cancer registries. On the other hand, relying on just two 5-year periods might provide too weak a basis for reliable estimation of trends and may give too much weight to possible random deviations from long-term trends. The width of time periods used for modeling and prediction is likewise somewhat arbitrary. We feel, however, that the 5-year periods used in our analysis may be a reasonable choice, as it limits the influence of short-term fluctuations in prognosis (whether they are due to chance or due to other reasons), and avoids the limited flexibility that would result from overly broad intervals.
A potential trade-off of the modeling approach, apart from its reliance on the existence of a long-term history of reliable cancer registration, is the increased complexity in calculation and data interpretation. However, the increased complexity of calculations also comes along with increased flexibility, e.g., to take account of possible deviations from linearity in trends, or to include other variables that might affect prognosis in the models. In fact, the modeling approach may be considered as a broader general methodologic framework, which includes standard period analysis as one special case (i.e., the case of a saturated model in which follow-up year–specific survival rates are estimated for one single calendar period).
In summary, our empirical evaluation supports expectations from theory that the modeling approach further enhances possibilities of deriving up-to-date cancer survival rates. This approach may be particularly helpful to overcome outdatedness of survival data resulting from the common latency in availability of cancer registry data.
Appendix 1
Let l_{ij} = the effective numbers of patients at risk (accounting for late entries and withdrawals as half persons), d_{ij} = the observed numbers of deaths, and e_{ij} = the expected numbers of deaths (from population life tables) for each combination of follow-up year i and calendar period j.
The calendar periods are coded in such a way that j = 0 for the first calendar period, and j = 1 and 2 for the subsequent calendar periods included in the modeling.
Then, a generalized linear model d_{ij} = f(i, j) is fitted with outcome d_{ij}, Poisson error structure, predictor variables i (categorical) and j (numerical), link ln(μ_{ij} − d_{ij}*), and offset ln(l_{ij} − d_{ij} / 2), where μ_{ij} = the model-based numbers of deaths and d_{ij}* = −(l_{ij} − d_{ij} / 2) × ln((l_{ij} − e_{ij}) / l_{ij}).
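Written out as an equation, the model described above can be read as follows. This rendering assumes that the linear predictor consists of one coefficient α_{i} per follow-up year i and a single slope β for the numerically coded calendar period j, without a separate intercept; that parameterization is an assumption made here for illustration.

```latex
\ln\!\left(\mu_{ij} - d^{*}_{ij}\right)
  = \ln\!\left(l_{ij} - \frac{d_{ij}}{2}\right) + \alpha_{i} + \beta\, j,
\qquad
d^{*}_{ij}
  = -\left(l_{ij} - \frac{d_{ij}}{2}\right)
    \ln\!\left(\frac{l_{ij} - e_{ij}}{l_{ij}}\right)
```

Under this reading, μ_{ij} − d_{ij}* is the modeled number of excess deaths, and exp(α_{i} + β j) is the excess hazard per unit of time at risk.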
Let α_{i} and β be the estimated regression coefficients for follow-up years i and calendar periods j. Then, estimates of conditional relative survival for each combination of follow-up year i and calendar period j are given as exp(−exp(α_{i} + β × j)), and an estimate of cumulative k-year relative survival for each calendar period j is given as ∏_{i=1}^{k} exp(−exp(α_{i} + β × j)).
The modeled k-year cumulative relative survival for the current calendar period is obtained by setting j = 3.
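The quantities defined in this appendix can be traced numerically with the short sketch below. It assumes that, under the stated link and offset, the modeled number of deaths equals d* plus the time at risk multiplied by exp(α_{i} + β j), so that the conditional relative survival in one follow-up year is exp(−exp(α_{i} + β j)). The function names and coefficient values are hypothetical.

```python
import math


def expected_deaths_term(l, d, e):
    """d* = -(l - d/2) * ln((l - e) / l)."""
    return -(l - d / 2) * math.log((l - e) / l)


def modeled_deaths(l, d, e, alpha_i, beta, j):
    """mu implied by link ln(mu - d*) and offset ln(l - d/2)."""
    excess = (l - d / 2) * math.exp(alpha_i + beta * j)
    return expected_deaths_term(l, d, e) + excess


def modeled_cumulative_survival(alphas, beta, j):
    """k-year cumulative relative survival for calendar period j.

    alphas holds one coefficient per follow-up year (k = len(alphas)).
    The product of exp(-exp(.)) terms is computed as exp of a sum.
    """
    return math.exp(-sum(math.exp(a + beta * j) for a in alphas))


# Hypothetical coefficients for five follow-up years and a declining
# excess hazard across periods; j = 3 denotes the current period.
alphas = [-1.2, -1.8, -2.2, -2.5, -2.7]
beta = -0.15
print(modeled_cumulative_survival(alphas, beta, 3))
```

Setting j to 0, 1, or 2 in the last function returns the fitted estimates for the three periods used in the modeling, which allows the projected value to be compared with the fitted trend.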
Footnotes

Grant support: German Cancer Foundation, Deutsche Krebshilfe, project no. 703166Br 5 (H. Brenner), and Academy of Finland and the Cancer Society of Finland (T. Hakulinen).

The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked advertisement in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.
 Accepted July 5, 2006.
 Received March 20, 2006.
 Revision received June 14, 2006.