On the formation and use of calibration equations in nutritional epidemiology – Discussion of the Paper by R. L. Prentice and Y. Huang

Laurence S. Freedman & Pamela A. Shaw
Pages 11-13 | Received 29 May 2018, Accepted 23 Jun 2018, Published online: 19 Jul 2018

Prentice and Huang (2018) provide an interesting review of the current and future challenges facing the field of nutritional epidemiology. We fully concur with their emphasis on the importance of statistical concepts and methods in meeting these challenges. We welcome the opportunity to expand in two directions on the approaches suggested in their review: first, to show that implementing their approach may require extra statistical steps; and second, to highlight some dangers of misusing the estimator, x̂(t), of true dietary intake that is central to their approach.

In their development, the authors postulate that the following information is available in their cohort study: a self-report assessment of the dietary intake of interest, q(t); personal characteristics, v(t), relevant to the self-reported dietary intake and the disease outcome; and, in a sub-sample, biomarker measurements, w(t), that measure the true dietary intake, x(t), in an unbiased manner with independent errors. From these elements, they construct a variable, x̂(t), that measures x(t) with Berkson error and that may consequently be used to relate dietary intake to disease outcome in a Cox regression model.

The question we raise is ‘what is the nature of the biomarker measurements w(t)?’. One may consider three classes of such biomarkers.

The first class contains the recovery biomarkers, biological products that derive from a relatively simple metabolic process that leads to a direct and very close correspondence to the dietary intake. There are only a few such biomarkers available and these measure energy, protein, potassium, and sodium intakes, but their use in Prentice and Huang’s approach is straightforward.

The second class contains the biomarkers (e.g., metabolites) that are developed from human feeding studies, in which volunteers are fed specified diets over a period of several weeks and markers, denoted by the vector m, are measured in biological samples from the volunteers. These markers are then related to the known intakes of specific foods or nutrients in an equation x = f(m) + e, where x is the known intake of the food or nutrient of interest, f is some mathematical function of the vector m, and e is error independent of f(m). The new problem that arises is that f(m) does not have the desirable statistical property enjoyed by Prentice and Huang’s w(t), namely that w(t) = x(t) + e(t), where e(t) is independent of x(t). Therefore, a further statistical step is required to find a function of m that does have this property. If x and f(m) were jointly normally distributed, this step would be achieved by the simple device of inverting the regression to be of the form f(m) = α0 + α1x + e*, where e* is error independent of x, and then using {f(m) − α0}/α1 as the biomarker measure for x, since {f(m) − α0}/α1 = x + e*/α1. In practice, one may need to work with mathematically transformed values of x to approximate the joint normality assumption. See Tasevska et al. (2011) for a similar approach to creating a biomarker measure with the desired statistical properties.
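The inversion step can be illustrated with a small simulation. This is only a sketch under assumed conditions: f(m) and e are drawn as independent normal variables, all numerical values are invented for illustration, and the names alpha0 and alpha1 are hypothetical labels for the intercept and slope of the inverted regression.

```python
import random


def cov(a, b):
    """Sample covariance of two equal-length lists."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    return sum((u - ma) * (v - mb) for u, v in zip(a, b)) / (n - 1)


rng = random.Random(1)
n = 20000

# Assumed feeding-study setup: x = f(m) + e, with e independent of f(m).
f = [rng.gauss(50.0, 10.0) for _ in range(n)]
x = [fi + rng.gauss(0.0, 5.0) for fi in f]

# The "error" f(m) - x is correlated with x, so f(m) is not of the
# form x + (error independent of x).
cov_raw = cov([fi - xi for fi, xi in zip(f, x)], x)

# Invert the regression: fit f(m) = alpha0 + alpha1 * x + e*.
alpha1 = cov(f, x) / cov(x, x)
alpha0 = sum(f) / n - alpha1 * sum(x) / n

# Rescaled biomarker measure: (f(m) - alpha0) / alpha1 = x + e*/alpha1.
w = [(fi - alpha0) / alpha1 for fi in f]
cov_rescaled = cov([wi - xi for wi, xi in zip(w, x)], x)

print("cov(f(m) - x, x):", round(cov_raw, 2))
print("alpha1:", round(alpha1, 3))
print("cov(w - x, x):", round(cov_rescaled, 8))
```

In this sketch the error of the rescaled measure is uncorrelated with x by construction of the least-squares fit; under joint normality, zero correlation also implies independence.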

A third class comprises biomarker signatures for specific dietary patterns. A metabolite profile is one example of this type of biomarker. Biomarkers of dietary patterns offer unique statistical challenges (Carroll, 2014), and very little methodologic work has yet been done in this area. Guasch-Ferré, Bhupathiraju, and Hu (2018) review efforts to develop such signatures from metabolomics profile data for vegetarian, Western, Mediterranean, and Nordic dietary patterns, among others. Prentice and Huang rightly acknowledge both the promise of such approaches and the need for innovations in statistical methodology to better handle the complex measurement properties of these data. Several issues need to be considered: the intra- and inter-person variability of these markers (Ala-Korpela, 2018; Guasch-Ferré et al., 2018; Johnson & Gonzalez, 2012); the many host and environmental factors that contribute to such variability, including age, genetic factors, disease, drugs, diet, and lifestyle (Guasch-Ferré et al., 2018; Johnson & Gonzalez, 2012); and the technological details of the assay, such as compartment (e.g., urine or plasma), assay platform, sample processing, and data preprocessing steps (e.g., normalization) (Ala-Korpela, 2018; Guasch-Ferré et al., 2018; Johnson & Gonzalez, 2012). It is perhaps not surprising that, to date, efforts to create signatures have suffered from lack of reproducibility (Ala-Korpela, 2018). Statistical innovation, and transparent reporting of all steps of the sample processing and data analysis, following such standards as TRIPOD (Moons et al., 2015), will likely be needed for these biomarkers to succeed as reliable and reproducible measures of dietary intake.

Prentice and Huang’s idea of constructing a biomarker-calibrated estimate of x(t), the variable they denote by x̂(t), is very attractive operationally, because it may be used in Cox regression analyses to estimate without bias the association of dietary intake with disease outcome. It is also conceptually attractive, because it may be thought of as an estimate of an individual’s ‘usual’ intake of the food or nutrient of interest. However, this conceptual attractiveness is partially illusory and can lure the unwary into any of several traps. We mention a few such traps here. As mentioned earlier, x̂(t) measures x(t) with Berkson error. Because Berkson error in explanatory variables does not cause bias in the estimation of the coefficients of a regression model, epidemiologists and statisticians alike have come to think of this type of error as ‘harmless’. Consequently, it has become quite common to encounter investigators treating x̂(t) as if it were really x(t) in contexts outside of the regression problem for which x̂(t) was constructed.
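The sense in which Berkson error is ‘harmless’ can be sketched in a simulation. For simplicity the sketch uses ordinary linear regression rather than the Cox model discussed here, and every distribution and parameter value (including the true slope of 2) is an assumption chosen only for illustration.

```python
import random


def ols_slope(x, y):
    """Least-squares slope from regressing y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


rng = random.Random(2)
n = 20000
beta = 2.0  # assumed true slope of the outcome on true intake

# Berkson error: true intake = calibrated value + error independent of it.
xhat = [rng.gauss(0.0, 1.0) for _ in range(n)]
x = [h + rng.gauss(0.0, 1.0) for h in xhat]

# Outcome depends on the true intake x.
y = [beta * xi + rng.gauss(0.0, 1.0) for xi in x]

# Classical error, for contrast: self-report = true intake + independent error.
q = [xi + rng.gauss(0.0, 2.0 ** 0.5) for xi in x]

slope_berkson = ols_slope(xhat, y)   # close to beta
slope_classical = ols_slope(q, y)    # attenuated towards zero

print("slope using xhat (Berkson):", round(slope_berkson, 2))
print("slope using q (classical):", round(slope_classical, 2))
```

With these assumed variances the classical-error slope is attenuated by about one half, whereas the slope on the Berkson-error variable stays close to the true value. The examples that follow concern uses of x̂(t) for which this protection does not hold.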

A case in point is Prentice and Huang’s example of simply categorising x̂(t) and entering the dummy variables for the categories as explanatory variables in a Cox regression. They show in a simulation that this yields biased estimates of the relative risks between categories of x(t). This has also been reported by Keogh, Strawbridge, and White (2012), who observe that, in the simplest case of linear regression calibration and no covariates, x̂(t) is a simple linear function of q(t), so that the ordered categories of x̂(t) and q(t) are identical, leading to identical biased estimates of relative risk between the categories.
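The observation about identical categories can be checked directly. The sketch below assumes linear regression calibration with no covariates, fitted for simplicity in the whole sample rather than a sub-sample; the helper names and all numerical values are hypothetical.

```python
import random


def ols(x, y):
    """Return (intercept, slope) of the least-squares line of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sxy / sxx
    return my - slope * mx, slope


def quantile_category(values, k):
    """Assign each value to one of k equal-sized ordered categories by rank."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    cats = [0] * n
    for rank, i in enumerate(order):
        cats[i] = rank * k // n
    return cats


rng = random.Random(3)
n = 5000
x = [rng.gauss(0.0, 1.0) for _ in range(n)]   # true intake
q = [xi + rng.gauss(0.0, 1.0) for xi in x]    # self-report
w = [xi + rng.gauss(0.0, 0.5) for xi in x]    # unbiased biomarker

# Linear calibration with no covariates: xhat = a + b * q.
a, b = ols(q, w)
xhat = [a + b * qi for qi in q]

cat_q = quantile_category(q, 5)
cat_xhat = quantile_category(xhat, 5)
cat_x = quantile_category(x, 5)

same_as_q = cat_xhat == cat_q
misclassified = sum(cx != ch for cx, ch in zip(cat_x, cat_xhat)) / n

print("xhat quintiles identical to q quintiles:", same_as_q)
print("fraction in a different quintile than true x:", round(misclassified, 2))
```

Because x̂(t) is an increasing linear function of q(t) in this setup, calibration changes no one's category, so the misclassification relative to the categories of x(t) is exactly that of the self-report.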

To give another example, x̂(t) has been used to estimate the population distribution of x(t) without adjustment for its Berkson error. Distributions calculated in this way severely underestimate the spread of the true distribution, and thus lead to biased estimation of the percentiles, especially in the tails.
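A simulation sketch of this underestimation follows. It assumes normally distributed intake with invented values (mean 80, standard deviation 20) and a linear calibration equation with no covariates; none of these figures comes from Prentice and Huang.

```python
import random


def ols(x, y):
    """Return (intercept, slope) of the least-squares line of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sxy / sxx
    return my - slope * mx, slope


def variance(values):
    """Sample variance."""
    n = len(values)
    m = sum(values) / n
    return sum((v - m) ** 2 for v in values) / (n - 1)


def percentile(values, p):
    """Simple rank-based percentile for 0 <= p < 1."""
    ordered = sorted(values)
    return ordered[min(len(ordered) - 1, int(p * len(ordered)))]


rng = random.Random(4)
n = 20000
x = [rng.gauss(80.0, 20.0) for _ in range(n)]   # true intake (assumed units)
q = [xi + rng.gauss(0.0, 20.0) for xi in x]     # self-report
w = [xi + rng.gauss(0.0, 10.0) for xi in x]     # unbiased biomarker

a, b = ols(q, w)
xhat = [a + b * qi for qi in q]

var_ratio = variance(xhat) / variance(x)
p95_x = percentile(x, 0.95)
p95_xhat = percentile(xhat, 0.95)

print("var(xhat) / var(x):", round(var_ratio, 2))
print("95th percentile of x:", round(p95_x, 1))
print("95th percentile of xhat:", round(p95_xhat, 1))
```

Under a Berkson model with independent error, the variance of x(t) equals the variance of x̂(t) plus the variance of the Berkson error, which is why the unadjusted distribution of x̂(t) is too narrow.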

In other examples, x̂(t) has been used as the outcome variable in a new regression model. Imagine that one has constructed a calibration equation for usual protein intake, and one now wishes to know which personal characteristics are associated with high protein intake. Regressing x̂(t) on personal characteristics yields biased estimates of the regression coefficients that would be obtained from a regression of x(t) on those personal characteristics, the bias being caused by the Berkson error in x̂(t) (Hyslop & Imbens, 2001).
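One way this bias can arise is sketched below. The sketch assumes a single standardised personal characteristic z that was left out of the calibration equation, so that the Berkson error in x̂(t) is correlated with z; the variable names and all parameter values are hypothetical.

```python
import random


def ols(x, y):
    """Return (intercept, slope) of the least-squares line of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sxy / sxx
    return my - slope * mx, slope


rng = random.Random(5)
n = 20000

# Assumed truth: protein intake rises by 5 units per unit of z.
z = [rng.gauss(0.0, 1.0) for _ in range(n)]
x = [80.0 + 5.0 * zi + rng.gauss(0.0, 15.0) for zi in z]

q = [xi + rng.gauss(0.0, 250.0 ** 0.5) for xi in x]   # self-report
w = [xi + rng.gauss(0.0, 10.0) for xi in x]           # unbiased biomarker

# Calibration equation that omits z: xhat = a + b * q.
a, b = ols(q, w)
xhat = [a + b * qi for qi in q]

_, slope_true = ols(z, x)      # target: regression of x on z
_, slope_xhat = ols(z, xhat)   # what one gets using xhat as the outcome

print("slope of x on z:", round(slope_true, 2))
print("slope of xhat on z:", round(slope_xhat, 2))
```

In this setup the coefficient obtained with x̂(t) as the outcome is roughly half the target value. If z were instead included as a covariate in this linear calibration equation, the least-squares fit would reproduce the slope of the biomarker on z, so the problem depends on how x̂(t) was built relative to the new analysis.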

It is, therefore, important that statisticians involved in the construction of such calibration or prediction equations provide clear advice to other researchers on the proper use of such equations and adequate warning regarding their misuse. Prentice and Huang have described here a powerful approach to improving the methodology of nutritional epidemiology studies, for which they should be thanked. However, as with all powerful approaches, we need to ensure that it is used appropriately.

Disclosure statement

No potential conflict of interest was reported by the authors.

Additional information

Notes on contributors

Laurence S. Freedman

Laurence S. Freedman, a professor of biostatistics, has worked for more than 45 years on applying statistics to the study of public health and medicine. His principal areas of application have been nutrition and chronic disease and clinical trials methodology. Recently, he has headed the Validation Studies Pooling Project and the Task Group on Measurement Error and Misclassification of the STRATOS initiative.

Pamela A. Shaw

Pamela A. Shaw is an Associate Professor of Biostatistics at the University of Pennsylvania. Her statistical research interests include clinical trial design and methodology to address covariate and outcome measurement error. She has a particular interest in infectious disease, behavioral intervention studies, and the use of biomarker studies to calibrate self-reported nutritional intake and physical activity.

References

  • Ala-Korpela, M. (2018). Objective metabolomics research. Clinical Chemistry, 64(1), 30–33. doi: 10.1373/clinchem.2017.274852
  • Carroll, R. J. (2014). Estimating the distribution of dietary consumption patterns. Statistical Science, 29(1), 2–8. doi: 10.1214/12-STS413
  • Guasch-Ferré, M., Bhupathiraju, S. N., & Hu, F. B. (2018). Use of metabolomics in improving assessment of dietary intake. Clinical Chemistry, 64(1), 82–98. doi: 10.1373/clinchem.2017.272344
  • Hyslop, D., & Imbens, G. (2001). Bias from classical and other forms of measurement error. Journal of Business & Economic Statistics, 19, 475–481. doi: 10.1198/07350010152596727
  • Johnson, C. H., & Gonzalez, F. J. (2012). Challenges and opportunities of metabolomics. Journal of Cellular Physiology, 227(8), 2975–2981. doi: 10.1002/jcp.24002
  • Keogh, R. H., Strawbridge, A. D., & White, I. (2012). Correcting for bias due to misclassification when error-prone continuous exposures are misclassified. Epidemiologic Methods, 1(1), Article 9.
  • Moons, K. G. M., Altman, D. G., Reitsma, J. B., Ioannidis, J. P. A., Macaskill, P., Steyerberg, E. W., … Collins, G. S. (2015). Transparent reporting of a multivariable prediction model for individual prognosis or diagnosis (TRIPOD): explanation and elaboration. Annals of Internal Medicine, 162(1), W1–73. doi: 10.7326/M14-0698
  • Prentice, R. L., & Huang, Y. (2018). Nutritional epidemiology methods and related statistical challenges and opportunities. Statistical Theory and Related Fields. Advance online publication. doi: 10.1080/24754269.2018.1466098
  • Tasevska, N., Midthune, D., Potischman, N., Subar, A. F., Cross, A. J., Bingham, S. A., … Kipnis, V. (2011). Use of the predictive sugars biomarker to evaluate self-reported total sugars intake in the observing protein and energy nutrition (OPEN) study. Cancer Epidemiology Biomarkers and Prevention, 20(3), 490–500. doi: 10.1158/1055-9965.EPI-10-0820
