Abstract
Objective: This study was designed to establish estimates of the smallest effects due to chemopreventive intervention detectable by karyometry in skin biopsies.
Methods: Estimates of the smallest change of statistical significance and estimates of the power of the test were derived for several key features descriptive of the distribution of nuclear chromatin. Results from triplicate biopsies from the same case were used to provide estimates of the within-case, biopsy-to-biopsy variance.
Results: Generally, a change in feature value due to chemopreventive intervention can be statistically secured when it amounts to 5% to 10%. In clinical trials where matched baseline and end of study biopsies from the same cases are available, paired comparison ANOVA can detect a 2% change on samples of 25 cases. Establishing efficacy in individual cases requires a change in feature values on the order of 10% to 15%.
Conclusions: Karyometry provides a sensitive, quantitative method for the assessment of efficacy of chemoprevention. The effects of within-case, biopsy-to-biopsy variance need to be considered only in the evaluation of individual cases and are on the order of 5% in skin biopsies. (Cancer Epidemiol Biomarkers Prev 2008;17(7):1689–95)
 chemoprevention trials
 surrogate endpoint biomarkers
 karyometry
 sampling variance
 power estimates
Introduction
Karyometry provides a quantitative assessment of the spatial and statistical distribution of the nuclear chromatin. This distribution reflects the functional state of a nucleus and reacts in an exquisitely sensitive manner to any change in the differentiation of a cell. The chromatin distribution thus lends itself ideally to the defining of a surrogate endpoint biomarker, with its ability to provide numeric criteria, high sensitivity for the detection of change, timely response, and statistical validation of small differences (1).
In defining a surrogate endpoint biomarker for a chemopreventive intervention, one has two choices. One may define a fixed numerical value for a criterion and use, as a measure of agent efficacy, the proportion of nuclei or cases, which at the end of study have returned to below that value. One may also choose to take advantage of a progression curve and accept as evidence of agent efficacy the reduction of deviation from normal, in a given case, downward along the progression curve whether a fixed criterion value was reached at the end of the study or not (2). This latter measure is possibly more realistic and certainly more sensitive for a detection of efficacy. It has worked well in chemopreventive studies of vitamin A and difluoromethylornithine in the prevention of actinic keratoses (3, 4). The method requires support data on the biopsy-to-biopsy sampling variance within a case to ensure that a reduction in abnormality is due to chemopreventive intervention and not due to randomness of biopsy sampling. However, consideration of biopsy sampling variance applies only to a determination of efficacy in individual cases. For a trial with several participants, the average reduction along a progression curve would provide a valid measure, unless it was very small and of the order of less than the variance expected due to randomness.
Establishing significance of an agent's efficacy presents a multifaceted problem. There is the evaluation of all baseline and end-of-study (i.e., endpoint) samples. Here, one can take advantage of variance analytic procedures that allow one to separate a chemopreventive agent's effect from case-to-case variability. If matched baseline and endpoint samples from the same cases are available, which usually is the rule, one may employ a paired comparison design with its inherent gain in sensitivity of detection (5).
In situations where the deviation from normal is subtle, for instance in individuals merely at high risk for the development of progressive disease or in preneoplastic developments, only a small proportion of nuclei may exhibit signs of progression, and response to a chemopreventive agent thus is restricted to a small subpopulation of nuclei (6). In these situations, an assessment of response may require identification of the progressed subpopulation first. There may also be situations in which a certain proportion of participants show no response; efficacy assessment then has to be done for responders only, but a criterion for “response” needs to be defined. In assessing efficacy in individual cases, the extent of within-case, biopsy-to-biopsy variance must be considered.
The primary goal of this study was to explore the limits of detection, by karyometric procedures, of chemopreventive efficacy. Multiple biopsy data from the same case are not usually collected in chemoprevention trials. This makes it difficult to discriminate variation due to chemopreventive efficacy from randomness of biopsy sampling. In the current study, therefore, a separate effort was made to derive estimates of feature variability as a function of biopsy sampling.
Materials and Methods
The clinical materials analyzed in this study were collected in the course of skin cancer chemoprevention studies within the Arizona Cancer Center Chemoprevention of Skin Cancer Program Project grant NCI P01 CA027502. Sample processing, digital recording, and diagnostic evaluation procedures within this project have been described elsewhere (3, 4, 7).
For estimates of within-case, biopsy-to-biopsy variance, the material included seven cases from which triplicate skin biopsies were taken, across the forearm, from visually normal-appearing, sun-exposed skin. Nuclei were recorded from the basal cell and the immediately adjacent suprabasal cell layer to a total of 3,072 nuclei. The full complement of 100 nuclei was recorded for 5 of these cases. Also recorded were 2,877 nuclei from 31 cases of actinic keratosis (AK) and 2,152 nuclei from 21 cases of squamous cell carcinoma (SCC). In addition, data from a study of chemopreventive efficacy of vitamin A (3) were used to model a limit of detection situation.
The statistical procedures address two issues, the case-to-case variability and the within-case, biopsy-to-biopsy variance. When matched baseline and endpoint biopsies are available, only the latter was required to be considered. For a determination of the minimum change required to establish efficacy of chemopreventive intervention in an individual case, a bivariate model and the 95% confidence and tolerance regions (8) for two karyometric variables were used. An estimate for the extension of this region to allow for the within-case, biopsy-to-biopsy variance was provided. In all estimates of limits of detection only a one-tailed test of significance was required, because only a change in direction of lesser deviation from normal was of interest.
Results
Reproducibility: Feature Deviations from Overall Mean
Karyometric data processing routinely results in a calculation of ∼100 variables, termed “features,” which are descriptive of the nuclear chromatin pattern. However, a much smaller number actually enter the analysis. Of these, some “key features” allow a tracking of changes in the chromatin pattern. As a first step in the evaluation of limits of detection of change, a survey of the reproducibility of the measurement of such a set of features is of interest. A set of key features is listed in Table 1. For the 5 cases of skin biopsies for which 100 nuclei each had been recorded, the means for the first, second, and third of the triplicate biopsies were computed. The averaged deviations (in percent) of these three values from the overall mean are presented in Table 1.
The mean pixel absorbance is an important feature, as differences in staining have an effect on many of the chromatin texture measures. For this variable, a mean of 70.9, in arbitrary, relative units, was found in nuclei from skin biopsies. Table 2 lists the case means and their differences to the overall mean (in percent) for the mean pixel absorbance, in the first two columns. The next column lists the average difference of the triplicate biopsies from their case mean. The last three columns list the differences (in percent) of the triplicate biopsies from their case mean. The average difference of a biopsy from its case mean is 6.81%.
Limits of Detection Imposed by Total Variance: Power Estimates
The limit of detection and power of the test are a function of the variance of the karyometric measurements. The total variance has several components: case-to-case variance; the within-case, biopsy-to-biopsy variance; the nucleus-to-nucleus variance; and the digital recording variance component. Of these, the case-to-case variance is usually the largest contributor to the total variance. The digital recording variance (variability of repeat measurements of a given feature for the same nucleus) is in most practical situations due to pixel noise and reduced to negligible levels by frame averaging in a videomicrophotometer.
Several features descriptive of the nuclear chromatin pattern follow a monotonic rise from normal, minimally sun-exposed skin to AK and SCC (8). These lend themselves for the plotting of a progression curve. Chemopreventive efficacy would be indicated by a decrease in value of these variables in nuclei at the end of study. Table 3 presents values for six of these variables that had been selected for a discriminant analysis of normal skin versus AK. Added to this list are values observed for the 10% of nuclei most progressed in SCC to represent the upper end of the progression curve.
Derivation of a limit of detection is best shown by several examples. In a clinical trial, nuclei from normal skin and from AK were recorded. The sample for the AK biopsies included 31 cases with 2,877 nuclei recorded. One of the features reflecting progression to AK is the number of dark-stained pixels, indicating chromatin granules with high absorbance. For the nuclei from the AK lesions, the mean (SD) for the dark-stained pixels feature (feature 321) was 453.6 (95.8). Assuming that the SD in the end-of-study biopsies remained very similar to that in the baseline biopsies, it is important to detect the smallest difference that could be established as significant at P < 0.05 with adequate power. For a one-tailed test, the 5% cutoff point is at 450.0. Therefore, a decrease in value of feature 321 significant at a level of P < 0.05 by a certain percentage would lead to the power estimates given in Table 4. In this example, the percent detectable change was expressed in terms of a scale from zero to the feature value recorded for AK.
One might argue that a more conservative estimate should be based on the percentage change along a scale spanning the distance from the mean of normal skin to the mean of nuclei in AK. In this case, the detectable change would involve a larger percentage. For the above example, the nuclei in normal skin had a mean of 300.0 for feature 321. The minimum detectable change would be increased from 1% to 2.96% on the normal/AK scale.
The changes in the nuclear chromatin pattern concomitant with progression to AK are not necessarily expressed in every nucleus of a biopsy but certainly in the nuclei found at the high value end of a distribution (e.g., discriminant function scores). A notable proportion of nuclei in AK lesions may have near-normal and even normal feature values. Yet, a response to chemopreventive intervention could only be expected in those nuclei that deviate from normal. Thus, one may elect to measure detection of chemopreventive efficacy in the subset of nuclei consisting of perhaps 10% of the most progressed nuclei in an AK biopsy.
For the above data set, the mean for the 10% most progressed nuclei in the AK lesions, for feature 321, was 807.0. The SD for this subset was 157.3. There were 288 nuclei available. The 5% cutoff point is at 791.8. For an assumed decrease in mean for feature 321 due to chemopreventive intervention by certain percentage points, the following power estimates apply (Table 5). The 3% detectable decrease with 83% power would increase to a value of 4.8% on a normal/AK scale.
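The cutoff, power, and rescaled percentage quoted for this subset can be checked with a short calculation. The sketch below assumes a one-tailed normal (z) test on the sample mean with an unchanged SD; the function names are hypothetical and the paper's own procedure may differ in detail.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal distribution (mean 0, SD 1)

def one_tailed_cutoff(mean, sd, n, alpha=0.05):
    """Lower cutoff for the sample mean in a one-tailed test at level alpha."""
    se = sd / sqrt(n)  # standard error of the mean
    return mean - STD_NORMAL.inv_cdf(1 - alpha) * se

def power_for_decrease(mean, sd, n, pct_decrease, alpha=0.05):
    """Power to detect a given percent decrease in the mean (SD assumed unchanged)."""
    se = sd / sqrt(n)
    cutoff = one_tailed_cutoff(mean, sd, n, alpha)
    shifted_mean = mean * (1 - pct_decrease / 100)
    return STD_NORMAL.cdf((cutoff - shifted_mean) / se)

def rescale_percent(mean, pct_decrease, normal_mean):
    """Express a percent decrease on the scale from the normal mean to the lesion mean."""
    return 100 * (mean * pct_decrease / 100) / (mean - normal_mean)

# Values from the text: 10% most progressed AK nuclei, feature 321
cutoff = one_tailed_cutoff(807.0, 157.3, 288)
power = power_for_decrease(807.0, 157.3, 288, 3.0)
rescaled = rescale_percent(807.0, 3.0, 300.0)
print(f"cutoff={cutoff:.1f} power={power:.2f} rescaled={rescaled:.1f}%")
```

Under these assumptions the calculation reproduces the cutoff of 791.8, a power of about 83% for a 3% decrease, and about 4.8% on the normal/AK scale.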
The average coefficient of variation of the six variables listed in Table 3 is 40.3% with a range from 35.5% to 49%. To provide an indication of the limits of detection in general, a model based on a mean feature value of 1.00, and a coefficient of variation of 40%, is presented for sample sizes ranging from 100 to 3,000 nuclei. This model is based on values that practically represent those seen for the feature total absorbance (feature 001). The power estimates are provided in Table 6.
The model shows that for sample sizes generally recorded in an early-phase chemoprevention clinical trial (e.g., 10 to 30 cases) with 100 nuclei recorded per biopsy, a change on the order of 5% to 10% can be detected with adequate power even in the face of a rather notable coefficient of variation of 40% due to total variance. No assumption was made here that matched samples from baseline and endpoint samples were analyzed.
The total variance cited above includes biopsy-to-biopsy variance within a case, but most of the total variance is due to case-to-case variability. This is shown by an ANOVA.
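How the detectable change shrinks with the number of nuclei can be illustrated for the 40% coefficient of variation used in the model. The sketch below assumes a one-tailed z test at P < 0.05 and a target power of 80%; the 80% target and the function name are assumptions made for illustration, not values taken from Table 6.

```python
from math import sqrt
from statistics import NormalDist

def min_detectable_pct(cv_pct, n, alpha=0.05, power=0.80):
    """Smallest percent decrease in the mean detectable with the given power.

    Assumes a one-tailed z test on a sample of n nuclei whose coefficient
    of variation (in percent) is the same before and after intervention.
    """
    z = NormalDist()
    return (z.inv_cdf(1 - alpha) + z.inv_cdf(power)) * cv_pct / sqrt(n)

# Sample sizes spanning the range modeled in the text (100 to 3,000 nuclei)
for n in (100, 500, 1000, 3000):
    print(n, round(min_detectable_pct(40.0, n), 2))
```

With these assumptions, about 100 nuclei are needed to detect a decrease near 10%, and 1,000 or more nuclei bring the detectable decrease below 5%.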
ANOVA: Paired Comparison
To further probe the limit of detection in a clinical trial, a simulation study was set up. As baseline data, a set of discriminant function scores from a study of chemopreventive efficacy of vitamin A in the prevention of AK (3) was used. This set included 26 cases. The rationale was that a real-world set of karyometric data would represent the inherent variability more realistically than an exact parametric model. For a simulated endpoint data set, the case mean scores from the baseline data set were systematically reduced by 20%, 10%, 5%, 3%, 2%, and 1% to establish down to what difference in score means a significant result could be obtained.
The ANOVA was based on a paired comparison design. There were two fixed levels A for the baseline and endpoint data, providing a − 1 = 1 df. There were 26 random levels B for the different cases, with b − 1 = 25 df, which leaves 25 df for the error term. For the paired comparison design, significance testing is done A/B at 1 and 25 df and B/E at 25 and 25 df.
For the sample size of 26 pairs of observations, there was, as expected, a highly significant effect of the intervention when a 20% difference was modeled and likewise for the runs at 10%, 5%, and 3%.
For a 2% difference, a one-tailed test resulted in an F ratio of 3.26, whereas a critical value of only 2.92 is needed for a P < 0.05 level of significance. For a 1% difference in the model, no significant effect could be shown. Table 7 presents the ANOVA tables for the 2% and 1% difference models.
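The paired comparison ANOVA can be sketched as a two-way design without replication. The example below is a minimal illustration on made-up scores for four hypothetical cases, not the trial data; it tests the treatment mean square against the error mean square, which is an assumption of this sketch, and the function name is hypothetical. For 26 pairs, the resulting F ratio would be compared with the critical value of 2.92 quoted above for 1 and 25 df.

```python
def paired_anova(baseline, endpoint):
    """Two-way ANOVA without replication for matched baseline/endpoint values."""
    n = len(baseline)
    if n != len(endpoint) or n < 2:
        raise ValueError("need at least two matched pairs")
    grand = (sum(baseline) + sum(endpoint)) / (2 * n)
    # Factor A: treatment (baseline vs. endpoint), 1 df
    level_means = (sum(baseline) / n, sum(endpoint) / n)
    ss_a = n * sum((m - grand) ** 2 for m in level_means)
    # Factor B: cases, n - 1 df
    case_means = [(x + y) / 2 for x, y in zip(baseline, endpoint)]
    ss_b = 2 * sum((m - grand) ** 2 for m in case_means)
    # Error: what remains of the total sum of squares, n - 1 df
    ss_total = sum((v - grand) ** 2 for v in list(baseline) + list(endpoint))
    ss_err = ss_total - ss_a - ss_b
    ms_a, ms_b, ms_err = ss_a, ss_b / (n - 1), ss_err / (n - 1)
    return {"df": (1, n - 1), "F_treatment": ms_a / ms_err, "F_cases": ms_b / ms_err}

# Hypothetical scores for four cases (illustration only)
result = paired_anova([1.0, 1.2, 0.9, 1.1], [0.9, 1.0, 0.9, 1.0])
print(result)
```

For two treatment levels the treatment F ratio equals the square of the paired t statistic, which gives a convenient cross-check.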
The mean square due to chemopreventive intervention in the 2% model amounts to 0.00561 (SD = 0.075). The mean for the set's discriminant function scores had been 1.07211, with a 6.9% coefficient of variation. The 95% confidence limit for a case with 100 recorded nuclei in a one-tailed test thus would be X̄ − (1.64 × 0.0075) or X̄ − 0.0123 = 1.0598. An end-of-study value for a case from this data set would have to have a score of less than this value at least to indicate a significant chemopreventive effect; a correction for within-case, biopsy-to-biopsy variance would still be required. This model assumes a response of the same magnitude in every participant. This might not be expected in most real-world trials.
The above example shows how important it is to obtain an estimate of the SD attributable to the chemopreventive intervention only. The estimate has a notable effect on the power of the test that can be attained. The total sum of squares in the above discriminant analysis had been 8.3747. With 25 df, this results in a SD for the data set of 0.5788 or a coefficient of variation of 53.7%. The consideration of the mean square attributable to the chemopreventive intervention reduces that to 6.9%.
Power Estimates for a Paired Comparison of Cases with Baseline and Endpoint Biopsies
To probe the limits of detection, a model based on several variables is presented. A feature with a mean of 70.0 is assumed, corresponding to the value observed in the mean pixel absorbance. Several SD are modeled (8.0, 10.0, and 12.0), covering the range of values seen in the feature mean pixel absorbance of SD = 8.7. Sample sizes from 10 to 20 and 40 paired biopsies are included. Differences between the baseline and the endpoint biopsies of 20%, 10%, and 5% were modeled with the assumption that the SD in both the baseline and the endpoint samples remained roughly the same. Table 8 shows the power of test results.
The model shows that for SD commonly found for within-case feature variances a 10% decrease in mean due to chemoprevention can safely be detected on a sample size of 20 paired biopsies and that reasonable power is obtained from a sample size of 10 biopsy pairs.
Efficacy in an Individual Case
In most clinical trials, it will be necessary to evaluate the response in individual cases, whether because not all cases respond in the same manner or because some participants show no response at all. One might relate the displacement of the endpoint sample of a participant, downward on the progression curve, to the 95% confidence limit for the baseline sample of 100 nuclei from the same case. Case-to-case variance does not enter and does not need to be considered.
An example may illustrate the procedure. Two karyometric variables often used to plot a progression curve are considered: the relative nuclear area (feature 002) and pixel absorbance nonuniformity (feature 305). Table 9 lists the mean, SD, correlation coefficient, and covariance.
The confidence ellipse for the estimate of the bivariate mean based on a sample of 100 nuclei, for P = 0.05 in a one-tailed test, and 2 and 98 df (5) has coordinate values as derived below. The coordinate of the lower apex of the confidence ellipse is defined for the relative nuclear area and, correspondingly, for the pixel absorbance nonuniformity.
Here, X̄_{1} stands for the mean of the relative nuclear area, X̄_{2} stands for the mean pixel absorbance nonuniformity, C_{α} = 188.34 for the confidence region, λ_{2} = 30.03 for the second eigenvalue of the Gaussian bivariate distribution, and b_{1} = 0.681 for the slope of the first eigenvector.
The mean estimated for the relative nuclear area of the endpoint sample of 100 nuclei would have to be <27.55, and <30.23 for the pixel absorbance nonuniformity, to fall outside of the confidence ellipse as shown schematically in Fig. 1A. This means the decrease has to be at least 7.1% for the relative nuclear area and at least 4.5% for the pixel absorbance nonuniformity to confirm chemopreventive efficacy. These coordinates would allow a correct assessment of the chemopreventive effect under the assumption of no within-case, biopsy-to-biopsy variance.
Comparing baseline and endpoint results from a single case in this manner, however, results in an underestimate of the effective coefficient of variation: it does not allow for the within-case, biopsy-to-biopsy variance. Such variance is, however, always present. It may lead to an overestimate of a chemopreventive effect when, in fact, it is statistically insignificant or an underestimate when the observed coordinates in the endpoint biopsy fall into the upper tail end of its feature distributions.
This is best shown by a graphic example. In Fig. 1A, the 95% confidence ellipse for two karyometric variables as observed for the bivariate mean of the nuclei in a baseline biopsy is shown. Also shown is the mean for the end-of-study sample. As shown, the decrease in feature values from the baseline coordinates to the mean measured at the study endpoint would correctly reflect a chemopreventive effect as significant at P < 0.05 in a one-tailed test.
However, the measured coordinate values for the endpoint biopsy could have been due to a sampling from endpoint biopsies whose mean fell well into the 95% confidence region of the baseline biopsy, but biopsy-to-biopsy variance happened to place this particular endpoint biopsy at the low value end of its tolerance region. This is shown in Fig. 1B. There would be no significant chemopreventive effect, and accepting the measured coordinates would result in an overestimate of chemopreventive efficacy. On the other hand, the measured endpoint coordinates may happen to represent a biopsy (from the same case as the baseline biopsy) that happened to fall into the upper value tail of the distribution of feature values in endpoint biopsies in this case as seen in Fig. 1C. It would lead to an underestimate of chemopreventive effect.
Therefore, to be conservative, one would have to require a displacement downward along the progression curve to at least the 95% confidence ellipse's lower apex (one-tailed test), based on the SD and covariance of the nuclei measured in the baseline biopsy, plus an additional displacement of a distance equal to 1.64 times the SD derived from the within-case, biopsy-to-biopsy variances, and expressed in terms of the karyometric feature values.
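The size of the apex displacement can be sketched from the constants given above. The block below assumes the usual major-axis construction for a confidence ellipse of a bivariate mean, in which the squared semi-major axis is C_α/λ_2 and the apex lies along the first eigenvector with slope b_1; this formula is an assumption rather than a quotation from the text, and the function name is hypothetical. The baseline means of Table 9 are not reproduced here, so only the offsets from the mean are computed.

```python
from math import sqrt

def apex_offsets(c_alpha, lambda_2, b_1):
    """Offsets of the ellipse apex from the bivariate mean along the major axis.

    Assumed construction: semi-major axis length sqrt(c_alpha / lambda_2),
    resolved into components along a line of slope b_1.
    """
    dx = sqrt(c_alpha / (lambda_2 * (1 + b_1 ** 2)))  # relative nuclear area
    dy = b_1 * dx                                     # pixel absorbance nonuniformity
    return dx, dy

# Constants quoted in the text
dx, dy = apex_offsets(188.34, 30.03, 0.681)
print(f"offset in nuclear area: {dx:.2f}, offset in nonuniformity: {dy:.2f}")
```

The lower apex would then sit at (X̄_{1} − dx, X̄_{2} − dy). Under this assumed construction the offsets come out near 2.1 and 1.4 feature units, which is of the size implied by the thresholds of 27.55 and 30.23 and the quoted decreases of 7.1% and 4.5%.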
Within-Case, Biopsy-to-Biopsy Variance Estimation
The above safeguard requires an estimate of the magnitude of within-case, biopsy-to-biopsy variance. Multiple biopsies from the same patient are not usually taken in a clinical trial. However, a first-order estimate of the magnitude of this variance component is offered by the triplicate biopsies recorded in this study. These triplicate biopsies were taken from normal skin, which may for some features have resulted in lower SD than one might expect in an AK lesion. Table 10 presents values for the SD measured in normal skin and biopsies from AK lesions for the karyometric features relative nuclear area and pixel absorbance nonuniformity.
One thus might have to add an allowance to the derivation of the within-case, biopsy-to-biopsy variance estimate given below. There were three (i.e., triplicate) biopsies for each of the seven cases available. The variable of interest is the within-case, biopsy-to-biopsy variance. It finds expression in the percent deviation of each biopsy from its case mean. Table 11 shows these data.
The data in Table 11 include the total variance. In the situation under discussion, efficacy in an individual case, it has the two components of within-case, biopsy-to-biopsy variance and nuclear variance. They can be estimated separately by an ANOVA. The percent differences listed in the last column of Table 11 are, in a first-order approximation, treated as Gaussian variables. Table 12 presents the ANOVA table. The design is a single classification ANOVA with three groups of equal sample sizes (5). There were 2 df for the triplicate biopsies (the within-case, biopsy-to-biopsy variance) and 3(7 − 1) = 18 df for the remainder variance.
The within-case, biopsy-to-biopsy variance component amounts to only 3.2% of the total mean square for the relative nuclear area and 14.4% for the pixel absorbance nonuniformity. It is statistically not significantly greater than the nucleus-to-nucleus variance. The average possible displacement due to biopsy-to-biopsy variance is 4.84% for the relative nuclear area. For the pixel absorbance nonuniformity, it is 6.74%. From the within-case, biopsy-to-biopsy mean squares follow SDs of 0.68 and 1.91 percentage points, respectively. The tolerance limits for a 95% one-tailed test thus would be 4.84 + (1.64 × 0.68) = 5.95 percentage points for the relative nuclear area and 6.74 + (1.64 × 1.91) = 9.87 percentage points for the pixel absorbance nonuniformity. For a conservative claim of statistically significant chemopreventive efficacy in an individual case, the relative nuclear area, in the end-of-study biopsy, would have to be decreased by 7.1% + 5.95% = 13.05%. For the pixel absorbance nonuniformity, a displacement of 4.5% + 9.87% = 14.4% would be required.
The former component is due to the 95% confidence ellipse for the mean of nuclei in the baseline biopsy. It expresses the location of the low apex of that ellipse. The second component is due to the 95% tolerance region of the within-case, biopsy-to-biopsy difference from the case mean.
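The two components of the required decrease can be combined in a few lines. The sketch below simply restates the arithmetic of the preceding paragraph; the function name and argument names are hypothetical, and unrounded intermediate values may differ from the rounded figures in the text in the last digit.

```python
def required_decrease_pct(ellipse_pct, mean_displacement_pct, sd_pct, z=1.64):
    """Conservative percent decrease needed to claim efficacy in one case.

    ellipse_pct: distance to the lower apex of the 95% confidence ellipse
    mean_displacement_pct: average biopsy-to-biopsy displacement
    sd_pct: SD from the within-case, biopsy-to-biopsy mean square
    z: one-tailed 95% normal deviate, as used in the text
    """
    tolerance = mean_displacement_pct + z * sd_pct
    return tolerance, ellipse_pct + tolerance

# Values quoted in the text for the two features
area_tol, area_total = required_decrease_pct(7.1, 4.84, 0.68)
nonunif_tol, nonunif_total = required_decrease_pct(4.5, 6.74, 1.91)
print(f"relative nuclear area: {area_total:.2f}%")
print(f"pixel absorbance nonuniformity: {nonunif_total:.2f}%")
```

Both totals fall in the range of roughly 13% to 14.5%, in line with the 10% to 15% quoted for individual cases in the abstract.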
Discussion
The data on reproducibility of values measured in multiple biopsies from the same case and in different cases show that the deviations from the overall mean are on the order of a few percent in histologic sections of skin. This variability establishes the “background noise” against which any changes due to a chemopreventive intervention must be statistically secured. In trials where baseline and end-of-study biopsies from the same cases are compared, only the within-case, biopsy-to-biopsy variance needs to be considered. This substantially reduces the effective coefficient of variation and allows for adequate power for tests of significance. Only a one-tailed test is needed in these analyses because only an improvement due to an intervention is of interest.
In many situations where chemopreventive intervention is undertaken, the deviations from normal or the differences between baseline and endpoint samples are modest. Thus, the numeric data presented here are close to what would be encountered in a clinical trial and appropriate for the objective to estimate a limit of detection.
The limits are smaller for an evaluation of efficacy in several participants than in an individual case. Generally though, a change due to chemopreventive intervention on the order of 5% to 10% can be statistically secured at adequate power for a data set of 10 to 20 cases. For an individual case, a change on the order of 10% to 20% represents a practical limit of detection for a chemopreventive effect. These estimates include the between-biopsy sampling variability within the same case. The need to establish the order of magnitude of this latter influence had been the rationale for this study in the first place.
Disclosure of Potential Conflicts of Interest
D. Alberts: Holds a patent on the karyometric analysis, as it pertains to a progression curve of nuclear abnormality for intraepithelial neoplasias; however, it is not specific to actinic keratosis or sun damage and there have been no financial gains related to the patent at this point. The other authors disclosed no potential conflicts of interest.
Acknowledgments
The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked advertisement in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.
Footnotes

Grant support: National Cancer Institute grant P01 CA027502.
 Accepted May 3, 2008.
 Received April 11, 2008.