ABSTRACT
Background: Many exposures in epidemiological studies have nonlinear effects, and the challenge is to choose an appropriate functional form for the relationship between such exposures and the outcome. One common approach is to investigate several parametric transformations of the covariate of interest and to select a posteriori the function that best fits the data. However, such an approach may result in an inflated Type I error rate. Methods: In a simulation study, we generated data from Cox models with different transformations of a single continuous covariate. We investigated the Type I error rate and the power of the likelihood ratio test (LRT) for three different procedures that considered the same set of parametric dose-response functions. The first, unconditional approach did not involve any model selection, while the second, conditional approach was based on a posteriori selection of the parametric function. The proposed third approach was similar to the second, except that it used a corrected critical value for the LRT to ensure a correct Type I error rate. Results: The Type I error rate of the second approach was two times higher than the nominal size. For simple monotone dose-response relationships, the corrected test had power similar to that of the unconditional approach, while for non-monotone dose-response relationships it had higher power. A real-life application that focused on the effect of body mass index on the risk of coronary heart disease death illustrated the advantage of the proposed approach. Conclusion: Our results confirm that selecting the functional form of the dose-response relationship a posteriori inflates the Type I error rate. The corrected procedure, which can be applied in a wide range of situations, may provide a good trade-off between Type I error and power.
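The idea behind the corrected critical value can be illustrated with a minimal simulation sketch. It is not the authors' implementation: the three candidate transformations, the uniform covariate distribution, the exponential event and censoring times, and all function names (`cox_lrt`, `simulate_null_lrts`, `corrected_critical_value`) are assumptions made for illustration only. The sketch fits a one-covariate Cox model by Newton-Raphson for each candidate transformation, records the largest LRT per null sample (the conditional, best-fitting model), and takes the empirical 95th percentile of those maxima as the corrected critical value.

```python
import numpy as np

# Hypothetical candidate transformations; the paper's seven functions are not listed here.
TRANSFORMS = {
    "linear": lambda x: x,
    "log": np.log,
    "square": lambda x: x ** 2,
}

def cox_lrt(time, event, z, n_iter=25):
    """LRT statistic for H0: beta = 0 in a one-covariate Cox model (assumes no tied times)."""
    order = np.argsort(-time)             # descending time, so each risk set is a prefix
    z = z[order]
    d = event[order].astype(bool)
    z = (z - z.mean()) / z.std()          # the LRT is unchanged by affine rescaling of z

    def parts(beta):
        w = np.exp(beta * z)
        s0, s1, s2 = np.cumsum(w), np.cumsum(w * z), np.cumsum(w * z * z)
        loglik = np.sum(beta * z[d] - np.log(s0[d]))
        score = np.sum(z[d] - s1[d] / s0[d])
        info = np.sum(s2[d] / s0[d] - (s1[d] / s0[d]) ** 2)
        return loglik, score, info

    loglik0 = parts(0.0)[0]
    beta = 0.0
    for _ in range(n_iter):               # plain Newton-Raphson on the partial likelihood
        _, score, info = parts(beta)
        if info <= 0:
            break
        step = score / info
        beta += step
        if abs(step) < 1e-8:
            break
    return max(0.0, 2.0 * (parts(beta)[0] - loglik0))

def simulate_null_lrts(n, censor_rate, n_sim, seed=0):
    """LRTs of every candidate model on samples generated with no covariate effect."""
    rng = np.random.default_rng(seed)
    lrts = np.empty((n_sim, len(TRANSFORMS)))
    for s in range(n_sim):
        x = rng.uniform(1.0, 5.0, n)      # assumed covariate distribution
        t = rng.exponential(1.0, n)       # event times independent of x (null hypothesis)
        c = rng.exponential((1.0 - censor_rate) / censor_rate, n)  # yields the target censoring rate
        time, event = np.minimum(t, c), t <= c
        lrts[s] = [cox_lrt(time, event, f(x)) for f in TRANSFORMS.values()]
    return lrts

def corrected_critical_value(lrts, alpha=0.05):
    """Empirical (1 - alpha) quantile of the per-sample maximum LRT."""
    return float(np.quantile(lrts.max(axis=1), 1.0 - alpha))

if __name__ == "__main__":
    lrts = simulate_null_lrts(n=100, censor_rate=0.5, n_sim=500)
    best = lrts.max(axis=1)               # LRT of the a posteriori selected model
    crit = corrected_critical_value(lrts)
    print("corrected critical value:", crit)
    print("Type I error with the chi-square cutoff 3.841:", np.mean(best > 3.841))
    print("Type I error with the corrected cutoff:", np.mean(best > crit))
```

In this sketch the corrected cutoff depends on the sample size, the censoring rate, and the set of candidate models, so it would need to be recomputed for each setting rather than reused across studies.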
Acknowledgments
This research was supported in part by a grant from the Natural Sciences and Engineering Research Council (NSERC) of Canada awarded to Dr. Michal Abrahamowicz. Mamun Mahmud was supported by the McGill Major Fellowship offered by McGill University of Canada. Dr. Michal Abrahamowicz is a James McGill Professor at McGill University. Dr. Karen Leffondré is the recipient of a salary award from the Fonds de la recherche en santé du Québec (FRSQ). Dr. Y. P. Chaubey is partly supported by NSERC.
Notes
aThe overall rate for the unconditional approach is pooled across 70,000 LRT statistics (7 models for each of the 10,000 samples), while for the conditional approach it is pooled across 10,000 LRTs based on sample-specific best-fitting models.
bCI, confidence interval.
cProportion of samples where the candidate model was selected. Accordingly, for each of the last two columns, the sum of the seven rows (each corresponding to a specific candidate model) equals one.
aThe overall rate for the unconditional approach is pooled across 70,000 LRT statistics (7 models for each of the 10,000 samples), while for the conditional approach it is pooled across 10,000 LRTs based on sample-specific best-fitting models.
bCI, confidence interval.
cProportion of samples where the candidate model was selected. Accordingly, for each scenario (corresponding to a given sample size n and a given censoring rate c.r.), the sum of the seven rows (each corresponding to a specific candidate model) equals one.
aAIC, Akaike Information Criterion.
bLRT, Likelihood ratio test statistic for testing the null hypothesis of no association between BMI and the risk of CHD death.
cCorrected critical value corresponding to n = 300 and 70% censoring.