Abstract
The longitudinal stability of the urea breath test (UBT), which measures urease as a biomarker for infection with Helicobacter pylori (a major risk factor for gastric cancer), was evaluated in the environs of Tsukuba, Japan. ^{13}C-UBT measurements were monitored at four time points in 46 free-living, H. pylori–infected, asymptomatic volunteers over a period of 7 weeks. Subjects were asked to refrain from eating cruciferous vegetables, which might confound interpretation of results. Their compliance was monitored using both dietary records and direct biochemical testing of overnight urine. There was large between-subject UBT variation in this population (log UBT mean, 3.34; SD, 0.67). Within-subject (longitudinal) UBT values were remarkably stable in about one-quarter of the subjects (coefficients of variation for these individuals were <21%), whereas coefficients of variation in the highest quartile of variability ranged from 40% to 80%. About half of the sequential UBTs (63 of 138 such measurement pairs) changed >10‰ “delta over baseline” between measurements. This study provides the elements to optimize the design of a clinical trial in this population to examine the efficacy of a dietary intervention to reduce H. pylori infection. The number of subjects required to detect a 30% difference in average UBT value is highly dependent on the baseline stability of UBT measurements. For the least variable quartile, as few as 12 subjects would be needed in each arm; for the most variable quartile, at least 147 subjects would be required in each arm.
Introduction
Helicobacter pylori is recognized by the WHO as a class I carcinogen and is a causative agent for gastric carcinogenesis. Gastric cancer is the second leading cause of worldwide cancer mortality, with >875,000 cases in 2000 and ∼650,000 deaths (1). More than half of the world's population is infected with H. pylori, and although the vast majority of infected individuals never develop gastric cancer, for those who do, the risk attributable to H. pylori infection is high. Recent estimates are that ∼0.5 million new cases of gastric cancer yearly (about 55% overall) are directly attributable to infection with H. pylori (2). Japan has a substantially higher incidence of gastric cancer than any other region of the world (3). Thus, the ability to lower the prevalence of H. pylori infection and the consequent risk of cancer is of compelling importance.
The recent demonstrations of both potent in vitro and in vivo antibiotic activities of sulforaphane (4-methylsulfinylbutyl isothiocyanate) against H. pylori (4, 5) suggest the possibility of ameliorating the effects of this organism by treatment with a sulforaphane-rich dietary supplement. Although in prior studies the efficacy of diet-based therapies was verified by invasive techniques such as gastric biopsy (e.g., ref. 6), validation in larger clinical trials requires biomarkers obtained through noninvasive testing. The urea breath test (UBT) described below is such a biomarker. Several studies that depended solely on this test to estimate the degree of H. pylori colonization have produced inconsistent results. Whether the variation in repeated measurements over time was due to inherent biological differences or to imprecision in the test was unclear.
The UBT is often considered the “gold standard” of noninvasive methods for diagnosis of H. pylori infections. It measures the activity of urease, an enzyme that is abundant in H. pylori but is absent from human tissues. Urease hydrolyzes urea, an end product of mammalian nitrogen metabolism, giving rise to CO_{2} and NH_{3}. By neutralizing gastric acidity, NH_{3} is believed to allow H. pylori to survive on the gastric mucosa. Hydrolysis of a dose of administered ^{13}C-urea by the infective organism gives rise to ^{13}CO_{2} in the breath, and the difference in the ratio of ^{13}CO_{2} to ^{12}CO_{2} after a standard dose of ^{13}C-urea provides a measure of the urease activity and, according to some reports, the severity of infection. The measurements are usually expressed as the changes in ^{13}C/^{12}C ratios in the exhaled air (in parts per thousand or ‰) from the baseline levels to a specific time point after dosing. These changes are customarily expressed as “delta over baseline” (DOB). A typical dose containing 75 to 100 mg of >99% ^{13}C-urea is administered to fasting subjects in 100 mL of orange juice or citric acid, which ensure gastric acidity and inhibit gastric emptying. Although some oral bacteria also exhibit urease activity, most common gastric bacteria do not (7).
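The DOB quantity described above can be sketched in a few lines. This is a minimal illustration that assumes the conventional isotope "delta" notation (deviation of a sample's ^{13}C/^{12}C ratio from a reference standard, in ‰); the reference ratio and the function names are assumptions introduced here, not values or procedures taken from this study or from the instrument used.

```python
# Commonly cited 13C/12C ratio of the PDB reference standard.
# ASSUMPTION: not stated in the text; instruments may be calibrated differently.
R_STANDARD = 0.0112372


def delta_per_mil(ratio, r_standard=R_STANDARD):
    """Deviation of a 13C/12C ratio from the reference standard, in per mil."""
    return (ratio / r_standard - 1.0) * 1000.0


def delta_over_baseline(ratio_baseline, ratio_post_dose, r_standard=R_STANDARD):
    """DOB: post-dose delta minus baseline delta, in per mil."""
    return (delta_per_mil(ratio_post_dose, r_standard)
            - delta_per_mil(ratio_baseline, r_standard))
```

Because DOB is a difference of two deltas measured against the same standard, it depends only on the change in the breath ratio relative to that standard, which is why a baseline sample is collected before the labeled urea is given.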
The UBT has been used extensively to diagnose H. pylori infection, identifying infected individuals but not quantifying their level of bacterial colonization. There is well-documented correspondence between UBT values and prevalence of infection on a population basis. Several studies have shown good correlation of UBT values with H. pylori colonization based on histology scores (8–10) and bacterial culture and serology (11, 12). Whereas inflammation did not correlate with UBT score (13), there was a significant relationship between UBT and endoscopic findings of (a) magnitude and extent of H. pylori colonization in both corpus and antrum as determined by histology, (b) neutrophil accumulation in the antrum, (c) atrophy of the corpus, and (d) intestinal metaplasia of the corpus (n = 169 subjects; ref. 14). In addition, there was excellent correlation between UBT values and the number of H. pylori genomes per picogram of DNA in paired antrum and corpus biopsies from human subjects (n = 88 subjects; ref. 15).
The reliability of the test as a biomarker of colonization density at the individual subject level remains uncertain. For studies that follow the longitudinal effects of treatment over time, it is essential to know whether measurements remain stable and reproducible. This is particularly important in determining the effectiveness of a treatment in which only a modest antibiotic effect is expected, for example, from a dietary intervention. Normal physiologic or measurement variation might mask this effect. Therefore, data collected in the present study were used to determine the variability of UBT in a Japanese population in which there is a high incidence of H. pylori infection to plan the most appropriate trial for treatment of this widespread risk factor. We have thus evaluated the variance components of UBT measurements within and between individuals. We have assessed the factors associated with the different components of variance and present evidence to guide the design of clinical trials by selection of individuals whose within-individuals variance falls within a predetermined range.
Materials and Methods
Subjects
The subjects in this investigation comprised the control population of a larger study of the effect of dietary intervention in subjects infected with H. pylori. Forty-six volunteers were studied in whom H. pylori colonization was verified by a urinary H. pylori antigen test (Rapirun, Otsuka Pharmaceutical Co., Ltd., Tokyo, Japan; ref. 16) or, for five subjects, by UBT. Subjects were followed for 7 weeks according to the protocol described in Fig. 1, and had a total of four UBTs, on days 0 (UBT1), 7 (UBT2), 21 (UBT3), and 49 (UBT4). The subjects were not hospitalized, and compliance with the dietary restrictions of the protocol was assessed by questionnaires and by biochemical verification as described below. Demographic information obtained for each participant was supplemented by queries on smoking and alcohol use, self-reported gastrointestinal symptoms, and prior endoscopy. Data on smoking and drinking status presented in Table 1 pool respondents classifying themselves as light (1–20 cigarettes per day) or heavy (≥21 cigarettes per day) smokers and likewise pool light (≤2 drinks per week) and heavy (≥3 drinks per week) alcohol drinkers.
Urea Breath Test
The original test developed in 1987 used mass spectrometry for measurement of ^{13}C/^{12}C ratios (17). Isotope-selective IR spectroscopy has made the test available more widely in doctors' offices with the use of portable IR spectrophotometers (18–20). For this study, a UBiT-IR300 was used, which had a limit of detection of 0.3‰ DOB. Fasting subjects provided baseline breath samples in 300 mL aluminized plastic collection bags (Otsuka Electronics Co., Ltd., Tokyo, Japan) and then quickly swallowed a 100 mg tablet of ^{13}C-urea (Yubitto). A second breath sample was taken 20 minutes after tablet ingestion and measurements were made in strict accordance with the manufacturer's instructions.
Test Protocol
Subjects provided a total of four UBTs (on days 0, 7, 21, and 49 of the study; see Fig. 1). They were enrolled on a rolling basis between January and March 2003, starting with a 3-day run-in period during which no cruciferous vegetables were consumed. On days 1 to 21, subjects were given 30 g of raw alfalfa sprouts three times daily as a noncruciferous vegetable supplement to be eaten in place of their normal vegetable-rich diet; no supplements were given on days 22 to 49. To assess adherence to the prescribed diet, questionnaires recorded compliance at each of the 64 meals on days 1 to 21 (3 × 21 = 63, plus one extra meal), with a maximum possible score of 192 points [3 points were awarded for 100% compliance, 2 points for consumption of between 50% and 100% of the assigned vegetable, and 1 point for consumption of less than half of it].
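The questionnaire scoring described above can be sketched as follows. This is only an illustration of the point scheme as stated in the text; the function names are hypothetical, and the treatment of a meal at exactly 50% consumption (scored here as 2 points) is an assumption, since the text does not specify that boundary.

```python
MEALS = 64            # 3 meals x 21 days, plus one extra meal (from the text)
MAX_SCORE = MEALS * 3  # 192 points


def meal_points(fraction_eaten):
    """Points for one meal, given the fraction of the assigned sprouts eaten."""
    if fraction_eaten >= 1.0:
        return 3  # 100% compliance
    if fraction_eaten >= 0.5:
        return 2  # between 50% and 100% (ASSUMPTION: exactly 50% scores 2)
    return 1      # less than half


def compliance_score(fractions):
    """Total questionnaire score over all recorded meals."""
    return sum(meal_points(f) for f in fractions)
```

Under this scheme a subject who ate every serving in full scores 192, and every recorded meal contributes at least 1 point.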
Verification of Compliance
Overnight urine samples were collected immediately before each of the UBTs and aliquots were sent to Baltimore for biochemical verification that subjects were maintaining a crucifer-free diet by using a cyclocondensation assay to detect isothiocyanates and their metabolites (21–25). Briefly, 500 μL aliquots of urine were added to 500 μL of 500 mmol/L sodium borate buffer (pH 9.25) and 1.0 mL of 20 mmol/L 1,2-benzenedithiol in acetonitrile and incubated for 2 hours at 65°C. After cooling to room temperature and low-speed centrifugation to sediment any precipitates, a 200 μL aliquot of the supernatant fluid was injected automatically (Waters Autosampler, Model 717 Plus, Waters Co., Milford, MA) onto a Partisil 10 ODS-2, 4.6 × 250 mm, reverse-phase HPLC column (Whatman, Clifton, NJ) and eluted with 80% methanol/20% water (v/v) at a rate of 2 mL/min. The 1,3-benzodithiole-2-thione formed in this reaction was eluted at ∼5 minutes, and the peak was detected by a photodiode array detector set at 365 nm (PDA Model 996, Waters) as described by Ye et al. (23).
Statistical Analysis
Graphical depiction of the 184 (46 × 4) UBT values (Fig. 2) suggested a lognormal distribution as a reasonable probability model for the observed data. Let Y_{it} denote the logarithm of the UBT measurement taken at time t (for t = 0, 7, 21, and 49 days) in the ith individual (i = 1, 2, …, 46). We denote the within-individuals mean and SD by m_{i} and S_{i}, respectively; these were the mean and SD of the four measurements in the ith individual. To graphically depict the components of variance, we plot m_{i} versus S_{i}, so that the overall average of the m_{i} represents the location of the population, the variability of the m_{i} represents the between-individuals SD (B), and the average of the S_{i} represents the estimate of the within-individuals SD (W).
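The graphical summary described above can be sketched in code. This is a minimal illustration of the quantities as defined in the text (m_{i}, S_{i}, and the descriptive estimates of location, B, and W); it is not the ANOVA actually used for the formal variance components, and the function name and input layout are hypothetical. Natural logarithms are assumed.

```python
import math
from statistics import mean, stdev


def variance_summary(ubt_by_subject):
    """Summarize log-UBT values.

    ubt_by_subject: dict mapping a subject id to that subject's list of raw
    UBT values (DOB, per mil), e.g. the four measurements on days 0, 7, 21, 49.
    """
    m = {}  # within-individuals means of log UBT (m_i)
    s = {}  # within-individuals SDs of log UBT (S_i)
    for subject, values in ubt_by_subject.items():
        logs = [math.log(v) for v in values]  # ASSUMPTION: natural log
        m[subject] = mean(logs)
        s[subject] = stdev(logs)
    return {
        "location": mean(list(m.values())),  # overall average of the m_i
        "B": stdev(list(m.values())),        # variability of the m_i
        "W": mean(list(s.values())),         # average of the S_i
        "m": m,
        "S": s,
    }
```

Plotting `m` against `S` for each subject would then correspond to the kind of display the text describes, with stable subjects lying near the bottom of the S axis.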
To characterize the components of variance of the four UBT measurements in the 46 individuals comprising the study population, we used standard methods for two-way ANOVA. As a summary of the variance components, we provided the ratio of the within SD to between SD (W/B) and its corresponding within-individuals correlation 1/(1 + (W/B)^{2}) = B^{2}/(B^{2} + W^{2}). To test whether age, sex, smoking, and alcohol consumption had effects on the components of variance and the mean of the UBT measurements, we used likelihood ratio tests to compare the two groups by sex, smoking (yes/no), alcohol consumption (yes/no), and age (older/younger than 50 years).
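For reference, the equivalence of the two forms of the within-individuals correlation follows in one step by dividing numerator and denominator by B^{2}. The symbol ρ is used here for that correlation, as in the sample size derivation:

```latex
\rho \;=\; \frac{B^{2}}{B^{2} + W^{2}}
     \;=\; \frac{1}{1 + W^{2}/B^{2}}
     \;=\; \frac{1}{1 + (W/B)^{2}}
```

A ratio W/B of 1 therefore corresponds to ρ = 0.5, and ρ approaches 1 as the within-individuals SD becomes small relative to the between-individuals SD.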
The estimates of W and B provided by our study are key inputs for the planning of clinical trials. In the simplest example of a treatment-plus-placebo clinical trial with one premeasurement and one postmeasurement, let X_{0i} and Y_{0i} denote the pre- and post-values of the ith individual in the placebo group and let X_{1j} and Y_{1j} represent the corresponding values for the jth individual in the treatment group. If the analysis of the trial compares the means of the Y's, the sample size required for each group to have 80% power to detect the difference between the mean of Y_{1j} (i.e., μ_{1}) and the mean of Y_{0i} (i.e., μ_{0}) at the 95% confidence level is given by n = 2(B^{2} + W^{2})(1.96 + 0.841)^{2}/(μ_{1} − μ_{0})^{2}. However, the strength of the design is to be able to use each individual as his/her own control by comparing the differences D_{0i} = Y_{0i} − X_{0i} and D_{1j} = Y_{1j} − X_{1j}. In this case, the sample size formula reduces to n = 2(2W^{2})(1.96 + 0.841)^{2}/(Δ_{1} − Δ_{0})^{2}, where Δ_{1} and Δ_{0} are the means of the D_{1}'s and D_{0}'s, respectively. This formula follows from the fact that Var D_{0i} = Var D_{1j} = (B^{2} + W^{2}) + (B^{2} + W^{2}) − 2ρ(B^{2} + W^{2}) = 2W^{2}, since ρ = B^{2}/(B^{2} + W^{2}).
To guide the planning of a clinical trial, sample size calculations were thus based on the equation: n = 4(Z_{1 − α/2} + Z_{1 − β})^{2}(W)^{2}/D^{2}; W was assumed to be the same for both groups (e.g., placebo and treatment) being compared and D is defined as the detectable difference between those two groups' measures.
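The equation above translates directly into a small calculator. The sketch below assumes a two-sided α of 0.05 and 80% power as defaults (matching the 1.96 and 0.841 quoted in the text) and rounds up to whole subjects; the function name and the rounding choice are assumptions for illustration.

```python
import math
from statistics import NormalDist


def subjects_per_arm(W, D, alpha=0.05, power=0.80):
    """Sample size per arm from n = 4 (Z_{1-alpha/2} + Z_{1-beta})^2 W^2 / D^2.

    W: within-individuals SD on the log scale (assumed equal in both arms).
    D: detectable difference between the two groups on the log scale.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    z_beta = NormalDist().inv_cdf(power)           # about 0.84 for 80% power
    n = 4 * (z_alpha + z_beta) ** 2 * W ** 2 / D ** 2
    return math.ceil(n)  # ASSUMPTION: round up to a whole number of subjects


# Example call with a 30% difference on the log scale and an illustrative
# (hypothetical) within-individuals SD; W here is not a value from Table 3.
n_example = subjects_per_arm(W=0.38, D=math.log(1.30))
```

Because n scales with W^{2}, halving the within-individuals SD cuts the required number of subjects per arm roughly fourfold, which is the quantitative basis for prescreening on baseline stability.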
Statistical analyses were done using both SAS (version 8, 1999; SAS Institute, Inc., Cary, NC) and Stata (version 7, 2001; Stata Corp., College Station, TX). Specific comparisons are described in Results.
The conduct of this trial was approved by the appropriate institutional review boards at both Tsukuba University and Johns Hopkins University.
Results
We assessed the within-subject and between-subject variance in 46 individuals not taking antibiotics, in a target population for a dietary intervention. Four measurements of UBT were made over a period of 7 weeks (called UBT1, UBT2, UBT3, and UBT4, respectively). Subjects were administered supplementary green vegetables (alfalfa sprouts) which they consumed for 21 days, beginning after UBT1 and ending at the time of UBT3 (Fig. 1). Average age of the 46 subjects was 52 years (range, 22–77 years) and 27 subjects (59%) were female. Descriptive statistics of their UBT and urine measurements (an index of dietary compliance) are given in Table 1. Raw UBT scores ranged from 4.3‰ to 161.7‰, with an overall mean of 36.5‰. Mean scores for UBTs 1 to 4 were 39.6‰, 35.4‰, 33.5‰, and 37.6‰ DOB, with coefficients of variation of 77.6%, 63.7%, 79.1%, and 65.2%, respectively. In addition, Table 1 presents the descriptive statistics of the UBT measurements in the log scale. This transformation was suggested by the shape of the distribution of scores for the pooled data from UBT1 to UBT4, which was consistent with a lognormal distribution (Fig. 2); thus, the statistical analysis was conducted on log-transformed data.
Table 1 also shows the compliance data. Compliance of subjects with the study protocol (abstention from cruciferous vegetables) was excellent: levels of dithiocarbamates measured in overnight urine collections (biochemical markers of consumption of cruciferous vegetables) averaged between 1 and 2 μmol per person. These values were entirely consistent with self-reported consumption, in which scores ranged between 165 and 192 out of 192 possible points (the maximum reflecting complete adherence to the dietary protocol over all 64 meals). There were no significant differences overall between any of the four UBT scores when evaluated by paired t tests on log-transformed data, direct comparisons, quartile-parsed data, the sums of differences in UBT values, the sum of the absolute values of the differences, or the sum of absolute values of differences as a fraction of the mean.
Figure 3 depicts the within-individuals means and SDs. The overall mean (depicted with a large arrow on the x axis) was 3.34 and the SD of the individual means (i.e., between-individuals SD) was 0.67, which is depicted by two arrows located at +1 SD and −1 SD from the overall mean. Notably, there was a striking difference among subjects in the stability of their UBT measurements as indicated by the location of the open circles along the y axis. About half (63 of 138) of the pairs of sequential UBT measurements across all subjects and all times changed by >10‰ between measurements. Partitioning the study population according to the quartiles of the within-individuals SDs (i.e., quartiles of the S_{i} for 1 ≤ i ≤ 46), Fig. 4 illustrates the trajectories of UBT (log scale). Figure 4A provides the trajectories of the 11 individuals in the first quartile who had a remarkably stable UBT profile over four measurements spaced across 7 weeks. Coefficients of variation for these subjects were <21% (Fig. 4A) compared with coefficients of variation between 40% and 80% for the highest quartile (Fig. 4D).
Table 2 shows the means and variance components for the overall group as well as for two strata according to age, sex, smoking, and alcohol consumption. Calculation of omnibus P's showed that, although there was no difference in means between male and female subjects (P for homogeneity of means = 0.240), the variance components for female subjects were significantly higher than those of males (P for homogeneity of variances = 0.018; Table 2). Likewise, there was no difference in means between older and younger subjects (P for homogeneity of means = 0.660) but a marginally increased between-individuals variance in older subjects (P for homogeneity of variance = 0.086). Neither smoking nor alcohol consumption had any impact on mean values or variance components.
Outcomes obtained in the present trial (Tsukuba) are contrasted in Table 3 with those in three previously published trials. Using the data from the 46 subjects in the present trial as the basis for projection, we calculate that, to have 80% power to detect a difference (D) of 30% (D = log 1.30 = 0.26) from the mean between treated and control groups, a sample size of n = 66 in each arm would be needed to conclude that there was a difference at the 95% confidence level (Table 3).
This sample size falls within the range that one can calculate based on some of the best test/retest data available in the literature (Table 3), but our data are unique in that they permit assessment of the variability for UBT measurements within individuals. Published test/retest data have used two or at best three repeated measurements on an individual. Computation of required sample size based on the lower compared with the upper quartile of variability in our subjects suggests that as few as 12 or as many as 147 subjects, respectively, per treatment arm, would be required for similar power to detect a difference (Table 3). Thus, prescreening of subjects for stable UBT baseline has great bearing on the sample size required.
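As a rough cross-check, the sample size equation from the Statistical Analysis section can be inverted to see what within-individuals SD each reported sample size implies. The values of W below are back-calculated here from the reported n and D = 0.26; they are approximations introduced for illustration (n is a rounded whole number) and are not taken from Table 3.

```latex
W \;\approx\; \frac{D\,\sqrt{n}}{2\,\bigl(Z_{1-\alpha/2} + Z_{1-\beta}\bigr)},
\qquad D = 0.26,\quad Z_{1-\alpha/2} + Z_{1-\beta} = 1.96 + 0.841 = 2.801

\begin{aligned}
n = 12  &:\quad W \approx \frac{0.26 \times 3.46}{5.602} \approx 0.16 \\
n = 66  &:\quad W \approx \frac{0.26 \times 8.12}{5.602} \approx 0.38 \\
n = 147 &:\quad W \approx \frac{0.26 \times 12.12}{5.602} \approx 0.56
\end{aligned}
```

The roughly 3.5-fold spread in implied W between the least and most variable quartiles accounts for the roughly 12-fold spread in required sample size, since n grows with W^{2}.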
Discussion
The present study provided data to determine the reproducibility of the UBT over time in a population known to be infected with H. pylori, with a view to testing the effects of dietary intervention on reducing colonization with these organisms. Although Blaser and Berg (26) have highlighted reasons why universal eradication may not be indicated, it is clear that H. pylori eradication in certain high-risk groups will reduce the risk of gastric cancer. Very few dietary interventions have monitored the level of H. pylori by the UBT. Notably, those studies comprised very small numbers of subjects and, perhaps not surprisingly, reported lack of effect. For example, the following numbers of subjects were studied: (a) 15 treated subjects and 8 untreated controls with single before and after treatment UBTs (27); (b) 5 treated subjects with two baseline and one or two follow-up UBTs (28); and (c) 5 subjects with only single before and after treatment UBTs (29). Examination of the data in light of the variability we present here leads to the conclusion that their experiments did not have the statistical power to identify an effect. Such underpowering has been highlighted as a critical inadequacy in the conduct of randomized trials (30, 31), even when pilot samples are used in sample size determination (32). Indeed, one of the developers of the UBT assay (P.D. Klein) recently cautioned (19): “The simplicity of the breath test method is seductive if one could distinguish between two groups or two populations on the basis of a difference in the recovery of labeled CO_{2}, valuable diagnostic information should have been generated. Unfortunately, this belief is often unsupported by external comparisons…Good statistical design considerations point to a study size of at least 60 individuals in each category or a total of 120 to 150 comparisons to the standard.”
This problem was also recently highlighted by Marshall et al., who questioned whether the UBT was able to assess the intragastric bacterial load even semiquantitatively. His group concluded no effect based on the individual data of nine subjects given UBTs before, during, and after lactoferrin therapy (33).
Our studies showed that there is variability in longitudinal measurements in some subjects. With only a small change in colonization, even in the subjects with the most stable baseline measurements, the number of subjects required would be large. Indeed, in studies of the most commonly examined dietary approach to H. pylori treatment, formulations containing Lactobacillus sp., demonstration of a significant effect of intervention required a much larger sample size than has been used in most such trials. With a sample size of 326 asymptomatic children, Cruchet et al. (34) have shown a significant effect of daily ingestion of Lactobacillus johnsonii LA1 for 4 weeks. Two other studies serve to illustrate the point that development of robust baseline information on the subject pool is critical to conducting a conclusive intervention. In the first study, Sakamoto et al. (35) followed 29 subjects with three UBTs spread over 19 weeks—the first two designed to establish baseline values and the third designed to reveal the effects of Lactobacillus gasseri intervention. Although their mean values were virtually identical, close examination of individual subjects' values reveals dramatic fluctuations in many of them during the baseline phase of the study. In the second study, Lactobacillus casei treatment of 14 H. pylori–colonized subjects (and six controls) was followed with single before and after treatment UBTs (36). Neither of these studies showed convincing results due to low subject number and high interindividual variability.
To be meaningful from a public health perspective, the ability to discriminate between quartiles or quintiles of H. pylori density in an infected population is a reasonable objective. The large differences in variance that we observed would make detecting such a difference in the sample sizes used in most dietary eradication studies problematic. There are several potential explanations for these large differences, including the underlying pathology of H. pylori infection in which there may be physiologic perturbations that are detected by the biomarker (UBT) yet not reported as a symptom by the patient (e.g., mild gastritis) or detected on examination. Colonization with multiple H. pylori strains may result in competition for niches within the stomach or in evolution to colonize those niches more efficiently. None of the additional data collected in this study (e.g., smoking status, age, and sex) could account for the variation we saw.
We have thus presented a longitudinal series of four UBT measurements made over a 7-week period in 46 free-living, asymptomatic, H. pylori–infected individuals and have generated statistical descriptors for that population. We are not aware that such test/retest data are publicly available. This information now permits the design of dietary interventions in this population in which absolute eradication of H. pylori is not a required end point. Although evidence indicates that it would be feasible to preselect individuals with stable baseline UBT measurements by performing three or four temporally spaced UBTs and thus reduce the required sample size, much caution is required. Such preselection may well carry with it a bias toward tolerance to H. pylori colonization, so that the expected change due to an intervention in these individuals may be of diminished magnitude, reducing the perceived advantage of such an approach.
Acknowledgments
We thank Dr. Masayuki Yamamoto for facilitating this study and for providing seed funds to assist in its startup; Murakami Noen (Hiroshima, Japan) for producing and delivering all of the sprouts prescribed for ingestion in this study, for donating two UBiT-IR300 machines to the Tsukuba University Medical School, and for providing all of the stable isotope (^{13}C-urea) that was required for this study; the AICR and the Brassica Foundation for funding; Naoko Itagaki and Mamoru Amagai for helping with the collection of urine samples and for assisting with UBT and data collection; Katherine Stephenson for assisting with laboratory analyses; Shintaro Kamo for growing and distributing the sprouts; Lisa Jacobson for providing initial statistical consultation; Pamela Talalay for providing editorial assistance; the members of Citizen Oriented Health and Medicine Network, many of whom volunteered to participate in this study; and the numerous volunteers whose interest in helping with H. pylori research motivated them to participate in this study.
Footnotes

Grant support: World Cancer Research Fund, American Institute for Cancer Research, Barbara Lubin Goldsmith Foundation, Brassica Foundation, and Lewis B. and Dorothy Cullman Foundation.

The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked advertisement in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.

Note: J.W. Fahey and A. Yanaka contributed equally to this study. Two of the authors (J.W. Fahey and A. Yanaka) and Johns Hopkins University are founders of Brassica Protection Products, a company with a mission to develop chemoprotective food products and that sells broccoli sprouts. They (and Johns Hopkins University) are also stockholders as well as scientific consultants to Brassica Protection Products. Their stock is subject to certain restrictions under university policy. The terms of this arrangement are being managed by Johns Hopkins University in accordance with its conflict of interest policies.
 Accepted May 3, 2004.
 Received February 19, 2004.
 Revision received April 22, 2004.