
| HOME | HELP | FEEDBACK | SUBSCRIPTIONS | ARCHIVE | SEARCH | TABLE OF CONTENTS |
National Institute for Occupational Safety and Health, Cincinnati, Ohio 45226 [K. S.]; School of Math and Statistics, University of Plymouth, Drake Circus, Plymouth PL48AA, United Kingdom [I. B.]; Department of Epidemiology, University of California Los Angeles School of Public Health, Los Angeles, California 90095-1772 [S. G.]; and IARC, 69372 Lyon, France [P. B.]
| Abstract |
|---|
|
|
|---|
1.0) when testing the null hypothesis of no cancer/occupation
association; some of these were probably due to confounding by
nonoccupational risk factors (e.g., smoking). After EB
adjustments, there were 95 (15%) SIRs with P <
0.05 (10% positive and 5% negative). For women, there were 373
SIRs, of which 37 (10%) had P < 0.05 before
adjustment (6% positive and 4% negative) and 13 (3%) had
P < 0.05 after adjustment (2% positive and 1%
negative). Several known associations were confirmed after EB
adjustment [e.g., pleural cancer among plumbers,
original SIR 3.2 (95% confidence interval, 2.5–4.1), adjusted SIR 2.0
(95% confidence interval, 1.6–2.4)]. EB can produce more
accurate estimates of relative risk by shrinking imprecise outliers
toward the mean, which may reduce the number of false positives
otherwise flagged for further investigation. For example, liver cancer
among chimney sweepers was reduced from an original SIR of 2.2 (range,
1.1–4.4) to an adjusted SIR of 1.1 (range, 0.9–1.4). A
potentially important future application for EB is studies of
gene-environment-disease interactions, in which hundreds of
polymorphisms may be evaluated with dozens of environmental risk
factors in large cohort studies, producing thousands of associations.
| Introduction |
|---|
|
|
|---|
However, Greenland and Robins (2) and Greenland and Poole (3) have argued that in some circumstances, empirical Bayes (EB; see footnote 3) or semi-Bayes adjustments can be useful as an alternative to traditional multiple comparison adjustments. EB or semi-Bayes adjustments are useful under circumstances when (a) a large number of comparisons are made; (b) the comparisons can be grouped into sets within which all comparisons can be considered similar or "exchangeable"; (c) random error is present and presumably accounts for much of the observed variation in the parameters estimated to evaluate the comparisons (e.g., relative risks, rate ratios, and regression coefficients); and (d) investigators must choose which comparisons to investigate further, and there is a significant cost to such additional investigations. Even if all these conditions are not fully met, EB or semi-Bayes adjustments may be useful. The methods used in this study also require that the estimated parameters have an approximately normal distribution, although other distributional assumptions are possible. For the sake of simplicity, we will refer hereafter to these parameters generically as relative risks, which are generally used after a log transformation to approximate normality.
A typical current example is a large occupational surveillance data set, where many relative risks are evaluated with few or no a priori beliefs about which ones might be real (i.e., causal). Another example is the analysis of many gene-environment interactions, in which relative risks for several environmental exposures are analyzed in conjunction with multiple genetic polymorphisms, some of which might confer susceptibility (4) . Data on gene-environment interactions to date have involved relatively small data sets and expensive laboratory techniques to identify polymorphisms. Technology is evolving that will provide the investigator with hundreds of polymorphisms simultaneously at low cost, with little information about which ones might be of a priori interest. Furthermore, this information will soon be available for large cohorts in which blood samples have been collected at baseline. Such studies will generate thousands or even tens of thousands of relative risks for different environmental agents combined with different genotypes. Another important application in which such methods can have dramatic impact is in adjustment for multiple confounders. Semi-Bayes adjustments of confounder coefficients can allow more thorough and realistic control for confounder effects, especially in settings in which the confounders are highly correlated with the study exposure (5) .
The basic idea of EB adjustments for multiple associations, such as relative risks, is that the observed spread or variation of the estimated relative risks around their geometric mean is larger than the variation of the true but unknown relative risks. EB adjustments attempt to estimate this extra variation from the data at hand and then use this estimate to adjust the observed relative risks. Typically, this adjustment serves to pull or shrink outlying relative risks in toward their geometric mean, more so if the estimate to be adjusted has a large individual variance. This shrinkage attempts to anticipate "regression to the mean," in which outlier observations tend to shrink toward (i.e., become closer to) the mean upon obtaining new data. A consequence of this shrinkage is that the overall variance of the EB-adjusted estimates is smaller than that of the unadjusted estimates. The variance of each estimated relative risk is also reestimated. Furthermore, although the individual EB-adjusted estimates are not statistically unbiased, the average squared error of the adjusted estimates will be less than the average squared error of the original estimates. EB estimators are part of a class of "shrinkage" estimators with a long history in the statistical literature (6, 7, 8, 9, 10) . This class includes estimators based on hierarchical, multilevel, and mixed (random) coefficient models; see the article by Greenland (11) for a nontechnical review of this class.
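The shrinkage idea can be illustrated with a small simulation. The following is only a sketch under simplifying assumptions that are not part of the study: every estimate is given the same known error variance, the true log relative risks are drawn from a normal distribution, and the sample size and variances (`n`, `var_true`, `var_err`) are hypothetical values chosen for illustration.

```python
import random
import statistics

rng = random.Random(0)           # fixed seed so the run is repeatable
n = 2000                         # hypothetical number of associations
var_true, var_err = 0.02, 0.05   # assumed variances on the log scale

# True log relative risks, and observed values with added random error.
true_log_rr = [rng.gauss(0.0, var_true ** 0.5) for _ in range(n)]
obs_log_rr = [t + rng.gauss(0.0, var_err ** 0.5) for t in true_log_rr]

# Estimate the variance of the true values as observed minus error variance.
mean_obs = statistics.fmean(obs_log_rr)
var_obs = statistics.variance(obs_log_rr)
var_true_hat = max(var_obs - var_err, 0.0)

# Pull each estimate toward the mean; the pull is stronger when the
# error variance is large relative to the estimated true variance.
weight_on_mean = var_err / (var_err + var_true_hat)
eb_log_rr = [weight_on_mean * mean_obs + (1 - weight_on_mean) * b
             for b in obs_log_rr]


def mean_squared_error(estimates, truth):
    """Average squared distance between estimates and the true values."""
    return statistics.fmean((e - t) ** 2 for e, t in zip(estimates, truth))


mse_raw = mean_squared_error(obs_log_rr, true_log_rr)
mse_eb = mean_squared_error(eb_log_rr, true_log_rr)
print(f"MSE unadjusted: {mse_raw:.4f}, MSE after shrinkage: {mse_eb:.4f}")
```

In a run like this, the adjusted estimates should have both a smaller spread and a smaller average squared error than the unadjusted ones, which is the behavior the paragraph above describes.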
In semi-Bayes adjustments, the investigator specifies or chooses an a priori value for the extra variation. If the investigator cannot specify what this extra variation might be, then EB adjustments may be used. EB adjustments may be inherently more satisfying to epidemiologists than semi-Bayes adjustments because all parameters are estimated from the data without any a priori specification by the investigator. However, when accurate prior information regarding the parameters is available (e.g., a reasonable range in which they are likely to fall), semi-Bayes estimates can outperform EB estimates (12).
Forecasting examples and simulation studies (in which the true values of all parameters are known, and observed values with random error are generated) have been used to show that these types of adjustments can provide more accurate point estimates and narrower confidence intervals than the original estimates (9, 10, 12, 13, 14, 15), as predicted theoretically (6, 7, 8, 10). Such accuracy improvements have also appeared in real examples (4, 12, 16, 17).
Here we provide an example of the use of EB adjustments in a large surveillance data set of occupation and cancer in the Nordic countries. The statistics are not very complicated in our example. We also provide an S-plus program for the interested reader in the Appendix. A more general regression program in SAS, PROC GLIMMIX, can also be easily adapted to carry out these analyses (18).
| Materials and Methods |
|---|
|
|
|---|
$$\mathrm{Var}_{\text{true}} = \mathrm{Var}_{\text{obs}} - \mathrm{Var}_{\text{mean}}$$ | (1) |
where Varobs is the observed sample variance of the log relative risk estimates, and Varmean is the mean of the estimated variances of each estimate (e.g., each observed log relative risk has an estimated variance, and Varmean is the mean of these estimated variances). In practice, Varobs and Varmean are weighted statistics. The weights themselves depend on the estimated Vartrue and therefore must be derived iteratively (see below).
Two comments can be made about the formula above. It follows from the above equation that the observed variance of the estimates can be decomposed into the variance of the true log relative risks and the average variance of the individual estimates, i.e., into the variance of the true values plus a component due to random error. Furthermore, because Vartrue must be positive, Varobs must be greater than Varmean for the procedure to work. When the estimated variances fail to satisfy this inequality, the approximate EB methods used here must be abandoned in favor of semi-Bayes methods, in which the Vartrue is specified by the investigator. For example, Greenland and Poole (3) in one example specify that the Vartrue is 0.25 (SDtrue = 0.5), which (assuming normality) implies that 95% of the true relative risks are within a 7-fold range of each other [based on the width of the 95% prior probability interval, e.g., exp(2 × 1.96 × 0.5) = 7.1]. Caution must be exercised in choosing such a range, of course; the range should reflect what has been found in other studies of the same type.
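The arithmetic behind the 7-fold range can be checked with a few lines of code. This is a minimal sketch; the function name `prior_fold_range` is hypothetical, and it assumes a normal prior on the log scale as the text does.

```python
import math


def prior_fold_range(var_true, z=1.96):
    """Ratio of the upper to the lower limit of the 95% prior interval.

    On the log scale the interval is mean +/- z * SD, so its width is
    2 * z * SD; exponentiating the width gives the fold range, which
    does not depend on the mean.
    """
    return math.exp(2 * z * math.sqrt(var_true))


# Value used in the Greenland and Poole example quoted above.
print(round(prior_fold_range(0.25), 1))  # 7.1
```

Trying other candidate values of Vartrue in the same function is one way to see how quickly the implied range of true relative risks widens.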
There are two important restrictions to the types of data for which one can use these kinds of Bayesian adjustments. First, as mentioned above, the original observations to be adjusted must be able to be considered as arising from an "ensemble" or population of true relative risks that have an approximately log normal distribution around some single unknown geometric mean relative risk. For example, if the mean of the log relative risks is 0, some true log relative risks may be greater than 0 because exposure causes disease, and some may be less than 0 because exposure protects against disease, whereas a large number will be clustered around 0 because exposure has little or no effect on disease. However, if some exposures are known to cause disease whereas others are not, it would not be appropriate to treat them as coming from a single distribution. This means that some exposure-disease relative risks should not be grouped with others. For example, one should not include relative risks for diet items and lung cancer with relative risks for smoking and lung cancer in a Bayesian adjustment because the two sets of log relative risks would not be expected to have the same mean. This example is rather extreme; lung cancer relative risks are very high for smoking-related variables. In many practical situations, there will be few established exposure-disease associations, and even for them, relative risks may be expected to be only modestly elevated, so that investigators will not need to create separate ensembles within their data.
A second restriction is that quantitative exposures must be scaled so that the log relative risks (or regression coefficients in linear regression) from an increment of one unit of exposure are comparable, i.e., can be expected to have the same mean. This restriction, for example, would be important in a study of neurological function in which a battery of different tests had been done, e.g., 10 nerve conduction tests, 10 tests of postural sway, and 10 tests of tremor. All these subgroups of tests would be measured on a different scale (meters per second, centimeters of sway, and frequency of tremor). Assume that 100 regressions are done in which exposed are compared to nonexposed while adjusting for covariates like age and sex. The 100 resulting regression coefficients for exposure will estimate the change in neurological function for the exposed versus nonexposed, but the different neurological functions will have been measured on different scales, and the coefficients (and their variances) therefore will not be comparable. These coefficients cannot all be assumed to come from some common distribution, as required by the EB method used here. Nevertheless, they can be transformed to meet this requirement or analyzed with more general EB regression methods that relax this requirement (12 , 13) .
Finally, if the relative risks of interest have prior associations (in that new information about one would change our expectation for another), these associations should be taken into account. The simplest way to do so is to group or regress the relative risks on factors that explain the associations (12) . Similarly, if the relative risk estimates are statistically associated, these associations can and should be taken into account, as in matrix-weighted and penalized likelihood methods (12 , 15) . We consider here only the simpler case in which the associations among the relative risks or among their estimates are absent or negligible.
Statistical Methods.
Again following Greenland and Poole (3), consider a group of exchangeable SIR estimates, an "ensemble" of SIRs, each with an estimated variance, and let i denote individual members of this ensemble. Taking logarithms, we derive a weighted average, as shown below.
![]() | (2) |
![]() | (3) |
where $S_i^2$ is the variance of each $\ln\mathrm{SIR}_i$, and $\widehat{\mathrm{Var}}_{\text{true}}$ is the estimated Vartrue. Note that because $\widehat{\mathrm{Var}}_{\text{true}}$ is the result of these calculations in an EB analysis, we cannot know it at the beginning (Vartrue is specified at the start of a semi-Bayes analysis). Thus EB analyses require the use of iteration, where an initial guess of Vartrue is used, and then this initial guess is refined iteratively. Now let
![]() | (4) |
![]() | (5) |
so $\widehat{\mathrm{Var}}_{\text{obs}}$ is our estimate of Varobs. Then derive the estimate for Varmean as described below.
![]() | (6) |
We can now estimate Vartrue by $\widehat{\mathrm{Var}}_{\text{true}} = \widehat{\mathrm{Var}}_{\text{obs}} - \widehat{\mathrm{Var}}_{\text{mean}}$ or by $\max(\widehat{\mathrm{Var}}_{\text{obs}} - \widehat{\mathrm{Var}}_{\text{mean}}, \tau^2)$, where $\tau^2$ is a user-specified minimum plausible value for Vartrue. Finally, we can derive our EB estimate of each true SIR as a weighted average of the original estimate and the mean of the estimates as described below.
![]() | (7) |
Note that if $\widehat{\mathrm{Var}}_{\text{true}}$ is large, this gives more weight to the original estimate. On the other hand, if the variance of the individual estimate $S_i^2$ is large, this gives more weight to the overall mean of the estimates.
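The iterative procedure described above can be sketched in code for the simplest case (a single ensemble and uncorrelated estimates). This is not the authors' S-plus program. The specific formulas are assumptions consistent with the text rather than taken from it: weights of 1/(S_i² + estimated Vartrue), an n/(n − 1) factor in the weighted observed variance, and a weight on the mean of S_i²/(S_i² + estimated Vartrue). The function name, its arguments, and the example inputs are hypothetical, and the sketch does not re-estimate the variance of each adjusted estimate.

```python
import math


def empirical_bayes_shrink(log_rr, var, n_iter=50, min_var_true=0.0):
    """Shrink log relative risks toward their weighted mean.

    log_rr: list of log relative risk estimates (e.g., ln SIR).
    var: list of their estimated variances (S_i squared).
    min_var_true: user-specified minimum plausible value for Var_true.
    """
    n = len(log_rr)
    if n < 2 or n != len(var):
        raise ValueError("need at least two estimates with matching variances")

    def weighted_mean(values, weights):
        return sum(w * v for w, v in zip(weights, values)) / sum(weights)

    var_true = min_var_true  # initial guess, refined on each pass
    for _ in range(n_iter):
        w = [1.0 / (s2 + var_true) for s2 in var]
        mean = weighted_mean(log_rr, w)
        # Weighted observed variance of the estimates (assumed n/(n-1) factor).
        var_obs = (n / (n - 1)) * weighted_mean(
            [(b - mean) ** 2 for b in log_rr], w)
        # Weighted mean of the individual estimated variances.
        var_mean = weighted_mean(var, w)
        var_true = max(var_obs - var_mean, min_var_true)

    # Final weights and mean use the converged Var_true.
    w = [1.0 / (s2 + var_true) for s2 in var]
    mean = weighted_mean(log_rr, w)
    # Fraction of weight placed on the mean: larger when S_i^2 is large.
    shrinkage = [s2 / (s2 + var_true) for s2 in var]
    adjusted = [k * mean + (1 - k) * b for k, b in zip(shrinkage, log_rr)]
    return {"mean": mean, "var_true": var_true,
            "shrinkage": shrinkage, "adjusted": adjusted}


# Hypothetical SIRs and variances, for illustration only.
example_log_rr = [math.log(x) for x in (2.2, 1.0, 0.8, 1.5, 0.6, 1.1)]
example_var = [0.12, 0.01, 0.02, 0.03, 0.05, 0.01]
result = empirical_bayes_shrink(example_log_rr, example_var)
for b, e in zip(example_log_rr, result["adjusted"]):
    print(f"SIR {math.exp(b):.2f} -> adjusted {math.exp(e):.2f}")
```

In this sketch the imprecise outlier (variance 0.12) is pulled toward the mean much more strongly than the precise estimates, which mirrors the weighting behavior noted in the text.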
Details of the iterative methods necessary to do the above calculations are provided in the appendix to Greenland and Poole (3), supplemented by an improved formula for the variance of the adjusted estimates found in Greenland (15); these two articles provide the basis for the S-plus program in the Appendix. Note that this program is quite general and can account for associations among the relative risks by grouping of some subensembles within the general ensemble of interest (but assumes the same estimated Vartrue across subensembles). It also allows for correlations among the relative risk estimates. When there is no subgrouping of ensembles (subensembles), and there is no correlation among the estimated relative risks, the matrices in the S-plus program reduce to vectors, and the computations are considerably simplified.
Data Used for Example.
Data for our example were derived from a record-linkage study of cancer
by occupational group in the Nordic countries (19). In
this study, occupation was recorded at the time of the 1970 or 1971
census for the population aged 25–64 years of Sweden, Denmark, Norway,
and Finland (approximately 10.1 million people). Follow-up was
conducted through 1987–1991 for cancer incidence, with the exact date
varying by country. The four countries had nationwide cancer
registration during this period. Indirectly standardized SIRs for each
sex, using the whole population as the referent, were calculated for 35
cancers and 53 occupational groups after stratification of the data by
age, calendar time, and country. Exact confidence intervals were
calculated for SIRs with less than 100 observed cases by assuming a
Poisson distribution for the observed cases, whereas an approximation
(based on a normal approximation to the Poisson distribution) was used
for SIRs with more than 100 cases.
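An exact Poisson confidence interval for an SIR can be computed numerically. The sketch below is one way to do this, not the procedure used in the original study: it finds the limits by bisection on the Poisson distribution function, and the function names and the observed and expected counts are hypothetical.

```python
import math


def poisson_cdf(k, mu):
    """P(X <= k) for X ~ Poisson(mu), by direct summation."""
    if k < 0:
        return 0.0
    term = math.exp(-mu)
    total = term
    for j in range(1, k + 1):
        term *= mu / j
        total += term
    return total


def find_root(f, lo, hi, n_steps=200):
    """Bisection for a root of f on [lo, hi]; f(lo) and f(hi) differ in sign."""
    f_lo = f(lo)
    for _ in range(n_steps):
        mid = (lo + hi) / 2
        f_mid = f(mid)
        if f_lo * f_mid <= 0:
            hi = mid
        else:
            lo, f_lo = mid, f_mid
    return (lo + hi) / 2


def exact_sir_ci(observed, expected, alpha=0.05):
    """Exact confidence limits for SIR = observed / expected."""
    if observed == 0:
        lower = 0.0
    else:
        # Lower limit: the mean at which P(X >= observed) equals alpha / 2.
        lower = find_root(
            lambda mu: 1 - poisson_cdf(observed - 1, mu) - alpha / 2,
            0.0, float(observed))
    # Upper limit: the mean at which P(X <= observed) equals alpha / 2.
    upper_bound = observed + 10 * math.sqrt(observed + 1) + 10
    upper = find_root(
        lambda mu: poisson_cdf(observed, mu) - alpha / 2,
        float(observed), upper_bound)
    return lower / expected, upper / expected


low, high = exact_sir_ci(observed=10, expected=10.0)
print(f"SIR 1.00, 95% CI {low:.2f}-{high:.2f}")
```

For 10 observed and 10 expected cases, the limits agree with standard exact Poisson tables (about 0.48 to 1.84).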
Following the suggestion in Greenland and Poole (3) to exclude estimates based on very small numbers, we eliminated all SIRs for which there were five or fewer observed cases. We furthermore eliminated all SIRs for the "unknown" cancer site and also eliminated four occupational categories, i.e., those not economically active, those active but not otherwise classifiable, those in the armed forces, and those in the public safety/protection group (a heterogenous group). This left 2357 cancer/occupation combinations.
The published results across all occupational groups exhibit
considerable confounding of occupation-specific results by
nonoccupational variables related to socioeconomic status. For example,
women in higher socioeconomic classes (in professional/managerial
occupations) had more breast cancer, likely reflecting different
patterns in reproductive behavior (Fig. 1)
rather than different exposures to occupational breast carcinogens.
Similarly, men and women in higher socioeconomic classes had fewer
smoking-related cancers such as lung, larynx, and bladder cancer. These
patterns are well known. We did two things to the data to limit
confounding by nonoccupational variables related to social class and to
create an ensemble of exchangeable SIRs for EB adjustments.
|
Second, for men and women separately, we made an adjustment for social class by scaling the observed 23 occupation-specific SIRs for each cancer in the manual laborer/craftsmen group so that each overall (across all occupations) sex-specific, cancer-specific SIR would be 1.0. We did this because the reported cancer-specific SIRs in the manual laborer/craftsmen group were calculated in reference to the total Nordic population (across all socioeconomic groups) and hence would reflect the influence of nonoccupational variables (e.g., smoking) that differ across socioeconomic groups. For example, if the overall lung cancer SIR for male manual laborers/craftsmen was 1.5, all 23 occupation-specific lung cancer SIRs in this group were multiplied by 1/1.5, so that the overall lung cancer SIR after this adjustment in this group became 1.0. In effect, this adjustment was equivalent to adjusting the expected number of group-specific cancers without affecting the observed number. The variance was not correspondingly adjusted. The adjustment made the overall SIR (across all cancers and all occupations and both sexes) for manual laborers/craftsmen equal to 1.0, creating an ensemble of interchangeable SIRs for our EB adjustments.
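The scaling step in the lung cancer example can be written out directly. This is a minimal sketch; the function name and the three occupation-specific SIRs are hypothetical, and only the overall SIR of 1.5 comes from the example in the text.

```python
def rescale_sirs(sirs, overall_sir):
    """Divide occupation-specific SIRs by the overall SIR for the group.

    This is equivalent to inflating the expected counts by overall_sir
    while leaving the observed counts unchanged.
    """
    return [sir / overall_sir for sir in sirs]


# Hypothetical occupation-specific lung cancer SIRs in a group whose
# overall lung cancer SIR is 1.5.
print(rescale_sirs([3.0, 1.5, 0.75], overall_sir=1.5))  # [2.0, 1.0, 0.5]
```

Because each SIR is multiplied by a constant, each log SIR shifts by the same amount, which is consistent with leaving the variances unadjusted as described above.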
| Results |
|---|
|
|
|---|
|
|
|
$\widehat{\mathrm{Var}}_{\text{obs}}$) was 0.032. The weighted average of the individual variances estimated by EB ($\widehat{\mathrm{Var}}_{\text{mean}}$) was 0.012. The EB estimated variance of the true lnSIRs ($\widehat{\mathrm{Var}}_{\text{true}}$), approximately the difference between $\widehat{\mathrm{Var}}_{\text{obs}}$ and $\widehat{\mathrm{Var}}_{\text{mean}}$, was thus 0.019. The large size of $\widehat{\mathrm{Var}}_{\text{obs}}$ compared with $\widehat{\mathrm{Var}}_{\text{true}}$ resulted in considerable shrinkage of the new EB-estimated lnSIRs toward the mean of 1.02 (Fig. 4, a and b)
$\widehat{\mathrm{Var}}_{\text{true}}$ need to be evaluated against background knowledge by translating them to an estimated 95% prior interval for the relative risks. For men, this interval is $\exp(0.020 \pm 1.96 \times 0.019^{1/2})$, or 0.78–1.28.
|
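$\widehat{\mathrm{Var}}_{\text{obs}}$ was 0.031, and $\widehat{\mathrm{Var}}_{\text{mean}}$ was 0.022, resulting in a $\widehat{\mathrm{Var}}_{\text{true}}$ of 0.009. This again resulted in considerable shrinkage of the lnSIRs toward their mean (Fig. 5, a and b)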
|
1.0) associations with P < 0.05 (1%),
compared with 9 such positive and 5 such negative associations using
EB.
Table 2
gives some results for relative risks for males that were suspected
a priori (known from the literature) to be elevated;
although reduced somewhat due to the EB adjustment, they remained
unambiguously positive. Table 3
gives some results for male relative risks that were not previously
suspected; these were reduced considerably by EB. The original findings
for these relative risks are likely to have been false positives.
Finally, Table 4
presents a few relative risks that remained slightly elevated and still
had P < 0.05 after EB adjustment. Lip cancer among
construction workers may be due to solar radiation, whereas the others
are largely unanticipated, although one can hypothesize that blood
neoplasms might be increased among welders due to high EMF
exposure.
|
|
|
| Discussion |
|---|
|
|
|---|
We have not tried to systematically analyze the results for this data set; instead, we selected only a few associations to highlight as examples. Readers are referred to the original publication for a detailed discussion of results (19) . Our main purpose here has been to illustrate the methods rather than try to identify all associations worth further investigation in this population. Furthermore, our adjustment for socioeconomic class was necessarily crude, undoubtedly leaving residual confounding by nonoccupational risk factors.
One important result of using Bayes adjustments in this type of study is the weeding out of false positives. Of course, some true associations may also be weeded out in the process. The usefulness of these kinds of adjustments therefore depends on the cost associated with further investigation of false positives. These costs will be specific to each type of study, and few generalizations can be made. If the result of a false positive in a large number of cancer-gene-environment associations is the launching of a case-control study to try to replicate an original finding, the cost may be high, and the EB adjustment may therefore be worthwhile. For presentation of results, investigators might wish to present both the original unadjusted findings and the Bayesian-adjusted findings. Assuming a decision rule for further investigation is based on an elevated SIR with P < 0.05, based on unadjusted results investigators would further investigate 84 of 642 (13%) SIRs among male laborers/craftsmen. With an EB adjustment, investigators would further investigate 62 SIRs (10%). With a Bonferroni adjustment (P < 0.0000778 or 0.05/642), investigators would pursue only 18 SIRs (3%). Thus, the EB adjustment results in a decision rule that is between no adjustment and a Bonferroni adjustment with regard to the number SIRs selected for further investigation. EB adjustments result in a selection of parameter estimates that are more accurate than those selected by the classical decision rules; therefore, these adjustments should result in a more accurate prediction of future results than either classical rule.
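The comparison of decision rules above rests on simple arithmetic, reproduced here as a check. The counts are those quoted in the text; the variable names are hypothetical.

```python
n_tests = 642   # SIRs among male laborers/craftsmen
alpha = 0.05

# Bonferroni: divide the significance level by the number of comparisons.
bonferroni_threshold = alpha / n_tests

# Number of SIRs that would be pursued under each rule, as reported above.
flagged = {"unadjusted": 84, "EB": 62, "Bonferroni": 18}
percent = {rule: round(100 * count / n_tests) for rule, count in flagged.items()}

print(f"Bonferroni threshold: {bonferroni_threshold:.7f}")
print(percent)  # {'unadjusted': 13, 'EB': 10, 'Bonferroni': 3}
```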
It is worth noting that the percentage of unambiguous associations
remaining in Table 3
after EB adjustment is highest for positive
associations in males, in accordance with the a priori
belief that true positive occupational associations are most likely to
occur among males (true associations because men are more likely than
females to have had higher and longer exposures to occupational
carcinogens, and positive associations because occupational agents are
unlikely to protect against cancer). This may be an indication that the
adjustment has been relatively successful in weeding out false
positives but retaining true positives.
We have used an example from a field, occupation and cancer, in which there is a general body of a priori knowledge (although not necessarily directly applicable to most of the occupation/cancer associations studied in the Nordic population). Thus, in our example, EB adjustments are only a supplement to attempts to weed out false positives based on a priori considerations (e.g., knowledge of likely exposures within occupations). Future cancer-gene-environment studies, on the other hand, are likely to produce a very large number of associations about which there is little or no a priori knowledge whatsoever; EB or semi-Bayes adjustments may prove essential in making sense of such data.
|
|
| Appendix 1 |
|---|
|
|
|---|
The program (Table 5) requires a vector of original estimates (bhat), a corresponding variance-covariance matrix (vhat), a matrix to describe prior knowledge (z), and the number of iterations to be carried out (niter). Let n be the number of log relative risks to be estimated. For our simple example, we do not incorporate prior knowledge in the form of grouping into subensembles or other prior knowledge constraining the original estimates. Therefore, z is simply a column vector of 1s with length n, and each estimate will be pulled toward the same mean (lnSIRmean). However, more generally, suppose we wish to define g groups (subensembles), each with a different mean but the same Vartrue. Then z comprises g columns of dummy variables indicating the group membership of each of the n log relative risks. More generally still, z may include other ancillary information upon which the original estimates can be regressed. In the two-stage modeling example presented by Witte et al. (17), z had 87 rows corresponding to the food constituents in 87 dietary items, for which the original estimates had been derived in the first stage.
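The two simplest forms of z described above can be sketched as follows. This is an illustration in Python rather than the S-plus program itself; the function names and the group labels are hypothetical.

```python
def single_ensemble_z(n):
    """z for one ensemble: a column of 1s, one row per log relative risk."""
    return [[1] for _ in range(n)]


def subensemble_z(group_labels):
    """z for g subensembles: one dummy column per group.

    Returns the matrix (as a list of rows) and the column order.
    """
    groups = sorted(set(group_labels))
    rows = [[1 if label == g else 0 for g in groups] for label in group_labels]
    return rows, groups


print(single_ensemble_z(3))  # [[1], [1], [1]]

# Hypothetical grouping of four log relative risks into two subensembles.
z, columns = subensemble_z(["lung", "lung", "bladder", "lung"])
print(columns)  # ['bladder', 'lung']
print(z)        # [[0, 1], [0, 1], [1, 0], [0, 1]]
```

With the dummy-variable form, each estimate would be pulled toward the mean of its own group rather than toward a single overall mean.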
| Acknowledgments |
|---|
| Footnotes |
|---|
1 I. B.'s work on this study was done while working at the IARC under a Special Training Award.
2 To whom requests for reprints should be addressed, at National Institute for Occupational Safety and Health, 4676 Columbia Parkway, Cincinnati, OH 45226.
3 The abbreviations used are: EB, empirical Bayes; SIR, standardized incidence ratio.
Received 10/7/99; revised 6/6/00; accepted 6/19/00.
| References |
|---|
|
|
|---|