To the Editors: We have read with interest the ongoing correspondence mainly between Kristal and Potter on the one hand and Willett and Hu on the other. Because the correspondence has touched on several of our publications, we would like to clarify some points raised by Willett and Hu, particularly in their most recent letter (1).
Willett and Hu discuss the virtues of food records as reference measurements to validate food frequency questionnaires (FFQ): because the cognitive processes involved in completing a food record (recording in real time) and a FFQ (remembering past intake) are very different, “correlated errors” are said to be minimized. There are no published data indicating that a food record is superior to other self-reported reference instruments, such as multiple 24-h dietary recalls (which rely on memory), for validating FFQs. Moreover, even if Willett and Hu are correct in assuming that there is less cognitively related correlated error between a FFQ and a food record, a more overwhelming type of correlated error (i.e., correlated person-specific bias), arising from the bias present in all self-reported dietary assessment methods, may come to bear in FFQ validations. For example, individuals who tend to represent their diets as healthier than they actually are on a FFQ are also more likely to do so on a record or a recall. The Observing Protein and Energy Nutrition study showed substantial positive correlations between person-specific biases in the FFQ and two 24-h recalls; these correlations would tend to give a falsely high estimate of the correlation between the FFQ and true intake (2). We doubt that such biases would be substantially less for food records, given that records are reactive by nature and cause individuals to modify intake on reporting days.
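The effect of correlated person-specific bias on a validation study can be illustrated with a small simulation. The sketch below is only an illustration under assumed conditions: the additive error model, the variances, the bias correlation of 0.6, and all function and variable names are hypothetical and are not taken from the Observing Protein and Energy Nutrition study. It compares the correlation between the FFQ and true intake with the estimate that would be obtained from the FFQ and two repeated reference measurements if their errors were (wrongly) assumed to be independent of the FFQ errors.

```python
import math
import random


def pearson(x, y):
    """Plain Pearson correlation between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def simulate(n=20000, bias_corr=0.6, seed=1):
    """Return (true validity, naive validity estimate) for a simulated FFQ.

    Hypothetical model (all variances are illustrative assumptions):
      FFQ       = truth + person-specific FFQ bias       + random error
      reference = truth + person-specific reference bias + random error
    The two person-specific biases have correlation `bias_corr`.
    """
    rng = random.Random(seed)
    truth, ffq, ref1, ref2 = [], [], [], []
    for _ in range(n):
        t = rng.gauss(0, 1)
        z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1)
        bias_ffq = z1  # variance 1
        # variance 0.5, correlated with bias_ffq
        bias_ref = math.sqrt(0.5) * (
            bias_corr * z1 + math.sqrt(1 - bias_corr ** 2) * z2
        )
        truth.append(t)
        ffq.append(t + bias_ffq + rng.gauss(0, 1))
        # The reference bias is the same on both occasions for a given person.
        ref1.append(t + bias_ref + rng.gauss(0, 1))
        ref2.append(t + bias_ref + rng.gauss(0, 1))

    true_validity = pearson(ffq, truth)
    # Naive estimate: FFQ-reference correlation, de-attenuated by the
    # reference instrument's repeatability, which is valid only if the
    # reference errors are independent of truth and of the FFQ errors.
    naive_validity = pearson(ffq, ref1) / math.sqrt(pearson(ref1, ref2))
    return true_validity, naive_validity


if __name__ == "__main__":
    true_v, naive_v = simulate()
    print(f"correlation of FFQ with true intake: {true_v:.2f}")
    print(f"naive estimate from reference data:  {naive_v:.2f}")
```

Under these assumed parameters the naive estimate exceeds the true correlation, because the shared bias is counted as if it were agreement on true intake. Other parameter choices can change the size, and even the direction, of the discrepancy.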
Willett and Hu criticize the “conservative assumption of 0.7” for within-person correlation of the doubly labeled water biomarker made by Kristal and Potter in their counterpoint (3). They state that “results presented by Neuhouser et al. (4) based on findings from the Women's Health Initiative indicated that the within-person correlation for energy adjusted or absolute protein intake assessed by repeated urinary nitrogen and doubly labeled water measurements was less than that of 0.4.” In fact, the within-person correlations presented at that conference were 0.43 for urinary nitrogen and 0.68 for doubly labeled water (see footnote 5).
In the Discussion section of our article on the analysis of breast cancer risk in the Women's Health Initiative diet trial control group (5), we showed good agreement between the reduction in breast cancer risk predicted from our analysis (8.8%) and the observed reduction in the intervention group (9.1%). Willett and Hu claim that if our prediction had been adjusted for attenuation resulting from the use of just 4 days of diet records, then it would have been seen to be much higher than 8.8% and consequently inconsistent with the trial results. However, they do not account for the difference in underreporting between the intervention and control groups reported in the same lecture by Neuhouser et al. (4) mentioned above. Such differential misreporting could easily have exaggerated the reported gap in fat intake between the intervention and control groups, on which our predicted reduction of 8.8% was based, in which case the prediction would have to be adjusted downward. Furthermore, our prediction was based on a multivariate model, and the exact level of attenuation that would occur is quite unclear in the absence of exact knowledge about the measurement error model for reported fat. In summary, our results are indeed quite consistent with the trial results, within the bounds of our current knowledge.
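The opposing directions of these two adjustments can be seen in a simplified calculation. The sketch below is not the multivariate model used in our article (5); it assumes a univariate classical measurement error model, and every name and number in it is hypothetical and chosen only for illustration.

```python
import math


def attenuation_factor(variance_ratio, n_days):
    """Attenuation factor for the mean of n_days records.

    Assumes a classical error model, where variance_ratio is the
    within-person variance divided by the between-person variance.
    """
    return 1.0 / (1.0 + variance_ratio / n_days)


def predicted_reduction(beta_observed, reported_gap, variance_ratio,
                        n_days, gap_inflation=1.0):
    """Predicted proportional risk reduction for a given intake gap.

    beta_observed: observed log relative risk per unit of reported intake.
    reported_gap:  reported intake difference between the two groups.
    gap_inflation: factor by which differential misreporting is assumed
                   to exaggerate the reported gap (1.0 means none).
    """
    beta_corrected = beta_observed / attenuation_factor(variance_ratio, n_days)
    true_gap = reported_gap / gap_inflation
    return 1.0 - math.exp(-beta_corrected * true_gap)


if __name__ == "__main__":
    # Hypothetical inputs, not estimates from any study.
    kwargs = dict(beta_observed=0.01, reported_gap=10.0,
                  variance_ratio=2.0, n_days=4)
    print(predicted_reduction(**kwargs))                     # attenuation only
    print(predicted_reduction(gap_inflation=1.5, **kwargs))  # both adjustments
```

In this simplified setting, correcting for attenuation raises the predicted reduction, whereas allowing for an exaggerated intake gap lowers it; with the hypothetical values shown, the two adjustments cancel. The net direction therefore depends on quantities that are not known with any precision.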
Willett and Hu claim that the estimated relative risks for breast cancer that we reported (5) for total fat intake were not very different when using the FFQ (1.71; 95% confidence interval, 0.70-4.18) compared with using the food record (2.09; 95% confidence interval, 1.31-3.61). In our article, we discussed in detail the nature of the evidence for our claim that the food record seemed to be more powerful than the FFQ for detecting the breast cancer relationship. We stated “The statistical significance of the relative risks for the food record together with the non-significance of the selection-adjusted relative risks for the FFQ seen in Table 5 does not in itself allow the conclusion that the food record is the more powerful instrument. More relevant to the direct comparison of the two instruments is the relative size of the relative risks in Table 5 (as opposed to their significance) and also the comparison of the standardized log relative risks in Table 6. In this respect we see a uniform pattern where the values are larger for the food record than for the FFQ. The increase in standardized log relative risk for the food record over that of the FFQ in Table 6 is close to conventional statistical significance for total and polyunsaturated fats. In an alternative analysis for handling missing data, this increase reached statistical significance for all types of fat considered.” In fact, that alternative analysis yielded the following estimated relative risks in the fourth and fifth quintile of reported total fat intake: for the food record, 1.86 and 2.54, respectively, and for the FFQ, 1.06 and 1.24, respectively. Thus, our conclusions are based on several different analyses, not just one part of the evidence.
Willett and Hu claim that the results of our Women's Health Initiative study (5) are inconsistent with those of Bingham et al. (6) because we found a significant effect for unsaturated fats, whereas Bingham et al. found a significant effect for saturated fat. Both studies, however, found a significant effect for total fat when using a food record. In our analyses, we observed statistically significant increased risks for polyunsaturated and monounsaturated fat intakes, whereas the relative risks for saturated fat were elevated but did not reach statistical significance. As stated in our Discussion section (5), we do not view these findings as discordant with those of the Bingham group. The relative risks for saturated fat were in the same direction as for unsaturated fat and were not qualitatively different from them. Moreover, in the alternative analysis that accounted for missing data, described at the end of our Results section, statistically significant effects were seen also for saturated fat.
Surprisingly, Willett and Hu write that “animal studies of breast carcinogenesis designed to distinguish between the effects of energy balance and dietary fat have not supported an effect of dietary fat,” citing an experiment reported by Beth et al. (7) and published in 1987. They do not cite a meta-analysis of 100 animal experiments (including Beth et al.'s report), studying the same question, which came to the opposite conclusion (8).
We do not conclude, however, that FFQ-based findings should be ignored or that the FFQ should be “abandoned.” Even in the face of substantial measurement error “noise,” true diet-cancer “signals” may be detected in FFQ-based studies, especially those characterized by a relatively wide intake range [e.g., see recent results on dietary fat and breast cancer from the NIH-American Association of Retired Persons Diet and Health Study (9)]. Moreover, the ability of the signal to penetrate the noise may be relatively strong for some dietary exposures and rather weak for others. We are now at a point where we can incorporate, relatively inexpensively, individual-level alternatives to the FFQ in prospective cohort studies. Such alternatives, either available or under development, include automated multiple 24-h recalls self-administered via the internet and food records or recalls completed at the beginning of a study and analyzed subsequently in nested case-control fashion. These methods will likely have their own limitations and need to be evaluated rigorously, ideally with biomarker-based calibration studies built into cohorts. Furthermore, recalls or records at the individual level can be combined with FFQs and biomarker data. It is possible that various combinations (for example, FFQ plus recall, or FFQ plus recall plus serum nutrient levels) will improve our means of minimizing exposure misclassification. It would be a mistake to let the FFQ abandonment debate deter us from moving these alternative assessment tools into epidemiologic studies. Regardless of the final answers, we will want to know whether the diet-cancer results based on use of these alternatives look very much like those derived from the FFQ or differ substantially. Our common aim should be to move the field forward to obtain better estimates of dietary intakes and nutritional status and thus clarify their relationships with disease.
Footnotes
5 M.L. Neuhouser, personal communication.