## Abstract

Human papillomavirus (HPV) natural history has several characteristics that, at least from a statistical perspective, are not often encountered elsewhere in infectious disease and cancer research. There are, for example, multiple HPV types, and infection by each HPV type may be considered separate events. Although concurrent infections are common, the prevalence, incidence, and duration/persistence of each individual HPV can be separately measured. However, repeated measures involving the same subject tend to be correlated. The probability of detecting any given HPV type, for example, is greater among individuals who are currently positive for at least one other HPV type. Serial testing for HPV over time represents a second form of repeated measures. Statistical inferences that fail to take these correlations into account would be invalid. However, methods that do not use all the data would be inefficient. Marginal and mixed-effects models can address these issues but are not frequently used in HPV research. The current study provides an overview of these methods and then uses HPV data from a cohort of HIV-positive women to illustrate how they may be applied, and compare their results. The findings show the greater efficiency of these models compared with standard logistic regression and Cox models. Because mixed-effects models estimate subject-specific associations, they sometimes gave much higher effect estimates than marginal models, which estimate population-averaged associations. Overall, the results show that marginal and mixed-effects models are efficient for studying HPV natural history, but also highlight the importance of understanding how these models differ. Cancer Epidemiol Biomakers Prev; 19(1); 159–69

- statistical methods
- cervical neoplasia
- human papillomavirus
- HPV
- HIV
- frailty models
- mixed-effects models
- WLW models

## Introduction

Human papillomavirus (HPV), a sexually transmitted virus, is the central etiologic agent in the development of cervical cancer and precancerous cervical lesions (1). There are more than 40 types of HPV, which commonly infect the cervical epithelium as well as other anogenital tissues. At least 13 of these anogenital HPV types are considered oncogenic (2). The incidence of HPV infection is very high, with 3-year cumulative incidence rates reported to be as much as 43% among sexually active college-aged women, and the oncogenic HPV types cause the majority of these infections (3). Most cervical HPV infections, however, including those with oncogenic HPV types, resolve spontaneously within 2 years (4). Only a minority of oncogenic HPV infections persists and leads to clinically significant cervical disease.

Although much has been learned about the natural history of HPV, there is currently a growing focus on type-specific HPV associations rather than data grouped by broad categories (e.g., any oncogenic HPV type), which was common in the past. Randomized clinical trials, for example, are using the incident development of a persistent infection with a vaccine-related HPV type as an intermediate end point (5). Observational studies are also increasingly studying the natural history of HPV on a type-specific basis, whether to compare vaccinated versus unvaccinated women (in nonclinical trial settings) or to investigate other exposures while accounting for possible type-specific differences in effect estimates. The standard logistic regression and Cox model approaches commonly used in HPV research up until now may not always be optimal for such analyses. More efficient statistical methods may need to be adopted.

The choice of statistical method(s) must address several aspects of HPV natural history that are not often encountered elsewhere in infectious disease and cancer research. For example, although coinfection by more than one HPV type is common (3, 6, 7, 8, 9), prior studies suggest that they infect separate cells, causing separate foci of infection (8). Therefore, the prevalence, incidence, clearance/persistence, and progression of each HPV type–specific infection can be examined as distinct events/outcomes. On the other hand, because of shared risk factors, these separate HPV infections may be correlated. The probability of detecting any given HPV type, for example, is greater among individuals who are currently positive for at least one other HPV type (3, 6–9). Another type of correlation in HPV natural history data relates to repeated (serial) testing of the same women over time. If these two kinds of correlations are ignored, it can affect *P* values and confidence intervals (CI), leading to an incorrect statistical inference. On the other hand, statistical models that do not use all the available data (e.g., across all the visits and HPV types examined) may be inefficient.

The current study provides an overview of selected statistical methods that can be used to address these issues. We then illustrate how these methods may be applied to HPV natural history data and compare their results. The example data set is from the Women's Interagency HIV Study (WIHS), a large prospective cohort of HIV-infected and HIV-uninfected women. It is worth noting that these analyses involve more than twice the total person-visits of data available at the time of an earlier report about HPV type–specific infection in HIV-positive women based in the WIHS (8) and represent an update of those earlier findings.

## Materials and Methods

### Subjects

Data were obtained from the WIHS cohort. Details of this cohort have been previously reported (6, 7, 8). Briefly, between October 1994 and November 1995, 2,058 HIV-seropositive and 568 HIV-seronegative women older than 13 y were enrolled in the WIHS from similar clinical and outreach sources in Brooklyn/Manhattan/Bronx, NY; Chicago, IL; Los Angeles and San Francisco, CA; and Washington, DC. In 2002, an additional 738 HIV-seropositive and 406 HIV-seronegative women were similarly enrolled. On a semiannual basis in this ongoing cohort, subjects undergo a structured interview and a physical examination that includes a gynecologic examination. Following a Pap smear, a cervicovaginal lavage is collected for HPV DNA testing, as are blood samples. At the time of this analysis, there was a median follow-up of 14 visits among all subjects, with 9,475 person-years of observation among HIV-positive women and 2,708 person-years among HIV-negative women.

### Laboratory Testing/Data

As described in detail elsewhere (6, 7, 8, 9), HPV testing was conducted at every visit using an MY09/MY11/HMB01 PCR assay. Amplification of a 268-bp cellular β-globin DNA fragment was included in each assay as an internal control. Following amplification, the presence of HPV DNA was assessed using filters hybridized with biotinylated oligonucleotide probes for specific HPV types, as well as a general probe mixture able to detect most anogenital HPV. For this analysis, oncogenic HPV types were defined as types 16/18/31/33/35/39/45/51/52/56/58/59/66.

Blood specimens were tested for CD4^{+} T-cell count and HIV RNA levels in laboratories participating in the AIDS Clinical Trials Quality Assurance Program. CD4^{+} count was categorized into three strata (i.e., >500, 200-500, and <200 CD4^{+} cells/mm^{3}), and plasma HIV RNA level into four strata (i.e., <4,000; 4,000-20,000; 20,001-100,000; and >100,000 copies/mL), a total of 13 combined CD4^{+}/HIV RNA strata that were found to be relevant in prior studies (4, 6), with HIV-negative women as the reference group.

## Overview of Statistical Methods

### Analysis of HPV Prevalence

#### Standard Logistic Regression

It is common in HPV studies to use multivariable logistic regression to cross-sectionally analyze the factors associated with “any HPV,” “any oncogenic HPV type,” or an individual HPV type at a given study visit. In such analyses, each woman contributes a single outcome. Notationally, let *Y _{i}* = 1 represent the detection of HPV16 infection for the

*i*th person at the selected study visit, and 0 otherwise. We model

*P*=

_{i}*P*(

*Y*= 1) by

_{i}*Z*is our exposure variable of interest (e.g., a vector of indicators of 12 separate CD4

_{i}^{+}/HIV RNA strata) and

*W*is the vector of other adjusted variables, including age, race, and lifetime of number of male sex for the

_{i}*i*th person. The primary parameter of interest is

*β*, defined as the log odds ratio (OR) of HPV16 infection between women in a certain CD4

^{+}/HIV RNA stratum and HIV-negative women.

##### Repeated Observations over Time

However, if HPV16 infection were assessed repeatedly over several visits, it would not be appropriate to use model (1.1) to incorporate all the available data: model (1.1) would naively treat repeated observations of each individual over time as if they are independent (i.e., as if they were from different subjects), in essence amplifying the sample size. This can result in underestimation of the variation in the estimate of *β* and its *P* value, as well as a CI that is too narrow (10). To avoid these problems, a generalized estimating equation (GEE; ref. 11) can be incorporated in the logistic regression model.

#### GEE Models

Let *Y _{ij}* represent the HPV16 infection status (0/1) for the

*i*th person at

*j*th semiannual visit, we model

*P*=

_{ij}*P*(

*Y*= 1) by

_{ij}*Z*

_{ij}and

*W*

_{ij}are defined similarly as in model (1.1). The difference with standard logistic regression is in how the parameters are estimated. Specifically, GEE uses an estimating equation to estimate

*β*in which a particular correlation structure (referred to as the working correlation) is assumed between repeated measures of

*Y*. Several choices of working correlations are available in most statistical packages, including “independent” (suitable when a small correlation between data at different visits is assumed) and “exchangeable correlation” (suitable when the correlation between all pairs of observations can be assumed to be similar; e.g., visits 1 and 2 have the same correlation as visits 3 and 14), as well as other working correlations. Often, we do not know how the repeated HPV outcomes are correlated with each other, but a useful feature of the GEE approach is that even if the working correlation is misspecified, the estimate of

*β*is still unbiased. Further, because a robust variance is used, the CI for

*β*is correct for large sample size. The main benefit of properly specifying the correlation between repeated observations is that if the working correlation chosen is close to the actual one, the model has greater efficiency (11, 12). In practice, it is recommended that several working correlations be assessed. The one that produces the smallest SE for the estimate of

*β*is generally accepted to be the one that is most consistent to the true correlation. In many cases, though, the results will be similar across different correlation assumptions. Time-dependent covariates can be incorporated.

In model (1.2), *β* has the same interpretation as that in the logistic regression model [model (1.1)]. Because *β* describes the average risk difference between two groups/populations, it is often referred to as the “population-averaged effect.”

##### Multiple HPV types

GEE models, furthermore, can be used to incorporate data across multiple types of HPV. In a single GEE logistic regression model, we can treat each of the several different oncogenic HPV types as separate end points and then estimate the exposure effect on oncogenic HPV as a whole. We note that this common effect can also be examined with any oncogenic HPV as a single binary outcome. However, greater efficiency is obtained for grouped HPV data because each HPV type separately contributes data to the analysis, and the final result is a weighted average across HPV types. This source of efficiency is in addition to that discussed above, of being able to assess repeated visits over time.

We can also assess HPV type–specific relationships with the exposure variables using GEE models. Compared with conducting separate standard logistic regression models, GEE models still obtain greater efficiency because we reduce the number of covariate coefficients by assuming a common effect for “nuisance variables.” For example, only one “lifetime number of sexual partners” coefficient is required if all individual oncogenic HPVs are studied in the same GEE model, whereas the sexual partners variable would require a separate coefficient in each separate standard logistic regression for each HPV type. Further, GEE models allow us to statistically evaluate if the exposure-disease association is common across types.

To illustrate how GEE models can be used to model multiple types of HPV measured across multiple visits, let *Y _{ijk}* represent the infection of HPV type

*k*for the

*i*th person at

*j*th semiannual visit. To assess an overall effect estimate across oncogenic HPV types, a type-specific baseline prevalence is assumed for each type of infection but a common OR is assumed across the types, the infection rate of

*Y*, is modeled by

_{ijk}, P_{ijk}When sufficient events are available to study each HPV type individually, we can further allow the association with HIV and CD4^{+} to be different across HPV types. In this case, model (1.3) becomes

In this model, the association with CD4^{+}/HIV RNA stratum is modeled by *β _{k}* so that it can vary by HPV type

*k*. Model (1.4) also allows us to test if

*β*s are the same across HPV types.

_{k}Note that two types of correlations exist in the above GEE models: correlation between different types of HPV infection and correlation between the same types of HPV infections at different visits. Standard GEE methods do not easily enable construction of different correlation structures for these two types of correlation (13, 14): only one working correlation structure can usually be used in a single model. Based on prior experience (8), we recommend using either an independent or exchangeable working correlation as an approximation for both correlations.

ORs are a good approximation of relative risk when prevalence rates are low (e.g., <10%; ref. 15), but when prevalence rates are high, ORs overestimate relative risk. Therefore, an additional advantage of the GEE logistic regression approach proposed under model (1.3) is that because it estimates associations “across oncogenic HPV types” (the weighted average of the effects for individual types), the OR is likely to be a good estimate of relative risk (i.e., because the prevalence of oncogenic HPV on a type-specific basis is usually <10% even in high-risk populations). In contrast, any oncogenic HPV as defined under model (1.1) is binary (detection of one or more oncogenic HPV types), which can involve high prevalence in high-risk populations. When a situation arises in which the prevalence of HPV is substantially >10%, direct estimation of prevalence ratios may be preferred to the use of an OR. This can be accomplished in GEE (or mixed-effects models; see below) using log link instead of the logit link used in logistic regression (i.e., ORs), but log link is more likely than logit link to have problems with nonconvergence (16, 17).

Note that there are several links available for studying binary HPV outcomes, such as probit, complementary log-log link, or identity links. Alternative links provide different measures of exposure-disease associations; for example, an identity link provides an estimate of the difference in prevalence (i.e., attributable risk). These other links can sometimes give different results than logit or log links. The decision about which link functions are correct depends on the underlying biological relationship and usually cannot be determined solely on a statistical basis. However, the logit link (standard logistic regression) is the most common link function used in HPV research.

All the above GEE models can be implemented using PROC GENMOD in SAS. Details to implementing model (1.4) are given in Appendix.

#### Mixed-Effects Models

A mixed-effects model (18, 19) is an alternative approach for incorporating repeated HPV prevalence data across HPV types and across visits involving the same woman. A mixed-effects model for repeated assessments of HPV16 infection can be expressed as follows:*μ* + *α _{i}*, where

*μ*is the population average baseline rate of infection and

*α*represents the

_{i}*i*th subject's departure from the population average and is modeled as a random effect. The random effect is unobserved and characterized by a distribution function (typically a normal distribution centered at 0 with variation

*σ*

^{2}), whereas

*Z*and

*W*are observed covariates and referred to as fixed effects. Model (1.5) is therefore called a mixed-effects model. The parameter

*δ*has a different interpretation from

*β*in GEE: it is the log OR of HPV16 infection in a woman in a certain CD4

^{+}/HIV RNA stratum compared with her risk if she were HIV negative. Because

*δ*describes the change in risk within a person were her risk factor level to change, it is often referred to as the “subject-specific effect” (20, 21). In addition to

*δ*and

*η*, the variation in the random effect

*σ*can also be estimated from the model. The variation in the random effect is a measure of the amount of heterogeneity (i.e., the level of intrasubject correlation) in the population; if

*σ*= 0, the model reduces to a fixed-effects model. Under a normal assumption, the variation in the random effect can be interpreted as follows: 95% of the subjects in the population will not depart from the population mean

*μ*by >1.96

*σ*.

Because repeated observations of HPV16 infection share a common *α _{i}*, they are correlated. The simple mixed-effects model (1.5) above specifically assumes an exchangeable correlation structure, analogous to that described for GEE (above); that is, it assumes that any pair of repeated observations of HPV within the same woman (e.g., at visits 1 and 5 versus at visits 3 and 14) has the same correlation. However, other correlation structures can be assumed in mixed-effects models, and in general, mixed-effects models attempt to more accurately model the correlations rather than rely on the use of robust variance as in the GEE models. If the correlation is correctly specified, a mixed-effects model is more efficient than a GEE model. However, if it is not correctly specified, the results can be biased (18).

To incorporate repeated observations across multiple HPV types in addition to repeated observations across multiple visits, a mixed-effects model of *P _{ijk}* (which assumes a common OR across the types) can be defined as

*γ*, like

_{k}*α*, is assumed to be a random effect. This model assumes that correlations among repeated infections of the same type are higher than correlations between different types of HPV, consistent with empirical observations. We can further expand model (1.6) to allow HPV type–specific ORs (details are not provided here). Mixed-effects models for binary outcome are in general more computationally intensive than GEE and can sometimes have difficulty in convergence (i.e., the models cannot be fit and no result is obtained) especially with small data sets.

_{i}The mixed-effects model can be implemented using PROC NLMIXED in SAS.

##### Comparison between GEE and mixed-effects models

Numerically, compared with *β* in the GEE model, *δ* in the mixed-effects model is in general larger in magnitude (20), but because the SE for the estimate of *δ* is also larger, their significance levels (*P* value) are similar. The interpretations of these estimators also differ, and mixed-effects models may be of greater clinical relevance. For example, if a doctor wants to describe to an individual patient how much her risk of HPV16 infection will change if her CD4^{+} count changes, the subject-specific estimator *δ* more directly addresses this. The population-averaged estimator given by the GEE is of greater interest from a public health perspective because it describes on average how two groups of women with different CD4^{+} levels will differ in prevalence of HPV infection. For a more detailed discussion on the comparison of *β* and *δ*, see Hu et al. (22).

Furthermore, on a practical basis, certain types of analyses that can be conducted using mixed-effects models cannot be conducted using GEE. For example, a mixed-effects model but not GEE can be used to contrast rates of HPV infection before compared with after the initiation of highly active antiretroviral therapy using women as their own comparison group, an instance when it is the within-individual comparison that is of interest.

##### Power and sample size

Because GEE and mixed-effects models make use of all available data, they generally produce a more efficient estimate and have greater statistical power than standard logistic regression, which involves a single study visit and a single HPV end point. It is, however, often difficult to determine an exact power estimate for GEE and mixed-effects models. Statistical simulations are sometimes necessary to estimate power for such studies. Here, we discuss a simplified approach, which may be used as a first-order approximation (23, 24). To obtain a “ball park” estimate of power for correlated binary outcomes, we first assume an average correlation (*ρ*) among repeated observations within a subject. If on average each subject contributes *J* observations and there are a total of *n* subjects, then the “effective size” (i.e., the equivalent size if each subject only contributes one observation) is *nJ*/[1 + (*J* − 1)*ρ*]. For example, if *ρ* = 0.3 and there are *J* = 5 multiple observations per subject and 100 subjects, the power based on these 100 subjects is equivalent to the power based on 100 * 5/[1 + (5 − 1) * 0.3] = 227 subjects with only one observation per subject. Standard software packages for power/sample size calculation, for example, PASS (25), can then be used to calculate the power for the latter case (see Results).

### Analysis of Incident Detection and/or Persistence of HPV

#### Standard Cox Models

A standard approach for analyzing time to first incident detection of HPV of a particular type, such as HPV16, is to use a Cox proportional hazards model. The Cox model is defined as follows:*λ*_{0} (*t*) is the baseline hazard function and exp(*β*) is the hazard ratio (HR) of HPV16 infection between women in a certain CD4^{+}/HIV RNA stratum and HIV-negative women while holding all other factors constant. Both the primary covariate *Z* and adjusting covariates *W* can be time dependent (i.e., the values of *Z* and *W* are allowed to change during the follow-up).

However, model (1.7) cannot simultaneously analyze time to first incident detection of several different types of HPV because it does not address possible correlations between incident HPV infections. Instead, it is often more efficient to use methods that can simultaneously assess multiple types of HPV as separate end points in a single model. There are two main statistical methods appropriate for this purpose, as follows.

#### Wei-Lin-Weissfeld Method

Wei, Lin, and Weissfeld (26) developed an approach that can be used to simultaneously analyze time to first incident detection of several types of HPV either in the same or different clinical visits, taking into account possible correlations between the types. Although each HPV type is allowed to have its own baseline hazard function, a common (overall) exposure effect can be assumed in the following Wei-Lin-Weissfeld (WLW) model:*λ*_{0k} (*t*) is the baseline hazard function for the *k*th type of event and exp(*β*) is the common HR across oncogenic HPV types. Similar to the parameter estimate in GEE, *β* has a population-averaged interpretation. Model (1.8) is in essence a Cox model stratified by HPV type and it is estimated under an independent working correlation, with a variance estimator that is robust to possible correlations between events.

Alternatively, if there are an adequate number of end points to study each HPV type separately, we can calculate type-specific HRs using the following WLW model:*β _{k}*) is the HR for incident detection of HPV type

*k*for subjects in a certain CD4

^{+}/HIV RNA stratum relative to HIV negatives.

Model (1.8) has greater power than a standard Cox model because in WLW women contribute multiple time to event end points (one for each HPV type) that are incorporated into the overall effect estimate. Greater efficiency is also obtained in the type-specific analyses using WLW in model (1.9), relative to using separate Cox regression models for each type, because we reduce the number of covariate coefficients by assuming a common effect (i.e., *γ*) for nuisance variables. Further, model (1.9) allows testing of equality in exposure effects across HPV types.

The WLW model can be implemented using the SAS PHREG procedure, selecting the STRATA option to allow different baseline hazards function for each HPV type. A robust variance is requested by choosing the option COVS(AGGREGATE).

#### Frailty Models

An alternative to the WLW method is to use frailty models to analyze multiple incident HPV infections (27). A frailty model introduces a frailty term, *ν*_{i}, to the Cox model as follows:*ν _{i}*, treated as a random effect, represents the

*i*th woman's unobserved individual susceptibility to develop an oncogenic HPV infection relative to the average level in the population. The coefficient

*δ*is interpreted as the log of the HR for any incident oncogenic HPV detection in a woman at a certain CD4

^{+}/HIV RNA stratum compared with her risk if she were HIV negative while holding all other factors constant. Frailty models are a form of mixed-effects model, and the parameter estimate

*δ*has a subject-specific interpretation. Model (1.10) can also be extended to assess HPV type–specific associations (data not shown).

By incorporating *e ^{θk}* in the model, frailty models assume that the baseline hazard functions are proportional to one another (i.e., the hazard functions have similar shape). If instead the baseline hazard function for one HPV type increases with age, whereas the baseline hazard function for another decreases with age, this assumption would be violated. WLW, in contrast, is more flexible in that the baseline hazard function for each infection can be entirely different from each other. In addition, as with other mixed-effects models, frailty models (

*a*) explicitly model the correlation between multiple events and, therefore, can be more efficient than WLW (a marginal model) if the correlation is correctly specified but (

*b*) tend to be more computationally intensive than marginal models, and obtaining model convergence can be problematic, especially with small data sets.

Frailty models can be implemented using a SAS macro %gamfrail.^{9}

##### Clearance/persistence of HPV

HPV persistence can be measured as the time to clearance of a HPV type following its initial detection, with clearance defined as the first subsequent negative result (or preferably two sequential negative results) for that HPV type (6, 8). The WLW and frailty models described above can be used to assess time to clearance of HPV on a type-specific basis (6, 8).

##### Power and sample size

Power and sample size can be estimated for WLW and frailty models using the formula described for GEE and mixed-effects models {i.e., *nJ*/[1 + (*J* − 1)*ρ*], where *J* is the number of HPV types; ref. 28}.

SAS programs for implementing some of the statistical models (above) as well as cautions needed to use these programs are provided on the web.^{10}

## Results

WIHS women have been followed on a semiannual basis. Midinterval (i.e., the midpoint calendar data between two consecutive visits) was used to estimate the time of each incident event. Women were censored for missed visits, and we assumed missing at random. Because our statistical models adjusted for CD4^{+} count and HIV viral load as well as other risk factors, the analyses addressed the main factors that might cause informative censoring in the WIHS cohort. Incident detection of HPV incident detection of type-specific HPV was defined as a positive HPV DNA test result for a specific type in a participant who was negative for that HPV type in all earlier visits (prevalent infections are excluded).

Below we discussed the results for the prevalence data and incidence data.

### HPV Prevalence

We assessed the prevalent detection of HPV16 and other oncogenic HPV types (across all visits) and their relationships with host immune status (i.e., CD4/HIV RNA stratum, defined as a time-dependent covariate) using GEE. The prevalence of HPV16 infection varied by host immune status. For example, HPV16 prevalence [model (1.2)] was significantly greater among HIV-positive women with CD4^{+} count <200 cells/mm^{3} and HIV viral load >100,000 copies/mL compared with HIV-negative women (

GEE model (1.4) was, therefore, used to assess HPV type–specific differences in the effects of host immune status. HPV16 was the referent HPV type, and the ratio of the type-specific ORs was estimated. As shown in Table 1, each of the other oncogenic HPV types had stronger associations with host immune status than did HPV16. For example, the OR for HPV31 was 2.62-fold

We then compared these GEE results with those obtained using a mixed-effects model [model (1.5)]. As expected, the mixed-effects model gave much higher estimates of association, with wider CIs than did GEE (Fig. 1) but similar *P* values (*P* < 0.001). For example, the OR [i.e., ^{+} count <200 cells/mm^{3} and HIV viral load >100,000 copies/mL (the most immunosuppressed stratum) relative to being HIV negative was 9.22 (95% CI, 4.84-17.6). This estimate indicates that if an individual HIV-negative woman were to become HIV positive and her immune status dropped to the most immunosuppressed stratum, her risk of having HPV16 infection would increase ∼9-fold. In contrast, the OR [i.e.,

#### Statistical power

We use data from Strickler et al. (8) to illustrate the gain in power with GEE compared with standard logistic regression. Based on 535 women with a CD4^{+} count <200 cells/mm^{3} and 458 women with ≥500 cells/mm^{3}, Fig. 2 shows the power to detect OR = 2.0 for the association of HPV prevalence and CD4^{+} count with (*a*) only one observation (e.g., one visit with one HPV type), *J* = 1 (i.e., standard logistic regression), versus (*b*) *J* = 5 or *J* = 10 (based on GEE) per subject under various assumptions about the prevalence rate of HPV infection. In our example, the correlation between repeated observations across HPV types and visits was assumed to be either 0.3 (moderate) or 0.6 (high). As shown in Fig. 2, the power when *J* = 5 is substantially larger than the power without repeated observations (*J* = 1). The power when *J* = 10 is only slightly higher than the power with *J* = 5, indicating that although there is a benefit to statistical power with a certain number of repeated observations, there is a diminishing effect on power as the number of repeats increases. Further, the increase in power is greater when the correlation is lower (e.g., *ρ* = 0.3 versus *ρ* = 0.6). This is as expected because the level of correlation is inversely related to effective sample size. Intuitively, when the correlation between repeated observations is low, having additional repeats should provide substantial new information, whereas having a high correlation implies that the repeated observations are not greatly independent of one another and therefore the repeats provide limited additional information. If the correlation is unknown, a conservative estimate should be used to calculate power/sample size.

### Incident HPV Detection

A WLW model of incident oncogenic HPV detection by type [model (1.8)] showed that women with a CD4^{+} count <200 cells/mm^{3} and HIV RNA >100,000 copies/mL had significantly increased risk of incident oncogenic HPV detection (^{+} count than HPV16.

Lastly, we compared WLW and frailty model [model (1.10)] results. Figure 3 shows the findings of an analysis involving the incident detection of seven common oncogenic HPV (i.e., HPV16/18/31/33/35/45/58) selected because the frailty model had problem converging when less common HPV types were analyzed (most likely because of insufficient occurrences to define their baseline hazards). Interestingly, unlike the comparison of mixed-effects and GEE models, effect estimates obtained using the frailty model were only slightly larger than those obtained using WLW. Frailty models to assess type-specific associations failed to converge.

## Discussion

This study shows that marginal and mixed-effects models are appropriate, efficient methods for the analysis of HPV natural history data. Although it has been common to use standard logistic regression for the analysis of the prevalent (cross-sectional) detection of HPV, and standard Cox models for time to incident detection (or clearance) of HPV, these common approaches do not make use of all the available data. That is, standard logistic and Cox regression models can only consider a single HPV type, and logistic regression can also only consider a single visit or a single time period. Marginal and mixed-effects models, in contrast, can simultaneously address multiple oncogenic HPV types as well as multiple visits over time. As shown in study, they consequently have greater statistical power in the analysis of HPV natural history data. In addition, these models allow comparison of exposure-disease associations between HPV types. Thus, marginal and mixed-effects models could have many useful applications to HPV research both in randomized clinical trials and in epidemiologic studies.

There are, however, important differences in the estimates obtained from marginal and mixed-effects models. Our study examined the major current forms of these models relevant to the analysis of HPV data. The marginal models were GEE for the analysis of HPV prevalence and WLW for the incident detection or clearance of HPV. Marginal models estimate the average relative difference in risk between two groups of subjects with different levels of risk factor exposure. This is often referred to as the population-averaged effect. The mixed-effects models examined were frailty models for analyzing the incident detection (or clearance) of HPV and mixed-effects models for analyzing HPV prevalence. Mixed-effects models estimate the change in risk that would occur within a person were risk factor exposure to change. This is often referred to as the subject-specific effect.

The effect estimates obtained using mixed-effects models tend to be considerably larger than those obtained using marginal models but, because the SEs in mixed-effect models are also larger, the significance levels obtained are often similar in both types of models. Interestingly, in our data set, we found that the results of GEE and mixed-effects models had these expected differences, but the WLW and frailty models provided very similar effect estimates. Close agreement between WLW and frailty models in our data may reflect small correlation between the incident detections of different types of HPV, whereas the large differences observed between the GEE and mixed-effects model estimates likely reflect strong correlations between the cross-sectional detection of the same HPV types over serial semiannual visits (i.e., the high frequency of persistent and recurrent infections).

As an important practical matter, mixed-effects models are much more computationally intensive than the marginal models and are more likely to have problems with convergence (i.e., mixed-effects models require more extensive data). For this reason, marginal models may be seen as having an advantage over mixed-effects models in many clinical trials and epidemiologic studies, or whenever data are limited. As mentioned, though, certain types of analysis require mixed-effects models. In general, we recommend a biostatistician be involved when implementing and interpreting results from mixed-effects models.

Although not our main focus, this study also served to further show that the prevalent and incident detection of HPV16 (the HPV type that accounts for half of all cervical cancers) is more weakly associated with host immune status than other oncogenic HPV types, as we and others have previously reported (8). The relative independence of HPV16 infection from host immune status has been interpreted as evidence that HPV16 may be better able to avoid the effects of immune surveillance than other HPV types and that this ability might help partly explain the predominance of HPV16 in cervical disease. The current study represents an update of the prior data involving more than double the person-years of observation available at the time of our earlier report, and approximately a third of the women were new to the analysis.

In summary, although the statistical methods discussed in this study are well established and highly efficient, their application to HPV research has been limited (29). To our knowledge, this is the first formal discussion of these methods in the context of HPV natural history data. However, even if the greater use of these methods were to represent an advance, the analysis of the natural history of HPV will continue to be a challenge. It is hoped, therefore, that this study will not only spur interest in current marginal and mixed-effects models but will additionally interest biostatisticians in the development of new and better methods for the analysis of HPV natural history data.

## Appendix

Model (1.4)_{ijk} for *k* = 1, …, *K*) for each of the other oncogenic HPV types (with HPV16 as a reference) and their interactions with CD4^{+}/HIV RNA stratum in the model. The model becomes*β* represents the relationship of HPV16 detection with host immune status and *ϕ _{k}* represents the difference between HPV type

*k*and HPV16 in relation to host immune status (i.e., the log ratio of the ORs). Type-specific analyses for other models (mixed-effects, WLW, frailty), such as model (1.9), are implemented using the same approach.

## Disclosure of Potential Conflicts of Interest

The authors do not have a commercial or other association that might pose a conflict of interest.

## Acknowledgments

**Grant Support:** HPV DNA testing is funded through R01-CA-085178. All specimens and other data in this study were collected by the WIHS Collaborative Study Group with centers (principal investigators) at New York City/Bronx Consortium (Kathryn Anastos, M.D.); Brooklyn, NY (Howard Minkoff, M.D.); Washington, DC, Metropolitan Consortium (Mary Young, M.D.); The Connie Wofsy Study Consortium of Northern California (Ruth Greenblatt, M.D.); Los Angeles County/Southern California Consortium (Alexandra M. Levine, M.D.); Chicago Consortium (Mardge Cohen, M.D.); and Data Coordinating Center (Stephen J. Gange, Ph.D.). The WIHS is funded by the National Institute of Allergy and Infectious Diseases with supplemental funding from the National Cancer Institute and the National Institute on Drug Abuse (U01-AI-35004, U01-AI-31834, U01-AI-34994, U01-AI-34989, U01-AI-34993, and U01-AI-42590). Funding is also provided by the National Institute of Child Health and Human Development (U01-HD-32632) and the National Center for Research Resources (M01-RR-00071, M01-RR-00079, and M01-RR-00083).

The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked *advertisement* in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.

## Footnotes

↵

^{9}Available from: http://www.biostat.mcw.edu/software/SoftMenu.html.- Received June 10, 2009.
- Revision received October 27, 2009.
- Accepted November 3, 2009.