Original Articles

Accounting for Data-Dependent Degrees of Freedom Selection When Testing the Effect of a Continuous Covariate in Generalized Additive Models

Pages 1115-1135 | Received 25 Jan 2008, Accepted 03 Feb 2009, Published online: 17 Mar 2009
 

Abstract

Often in generalized additive models (GAMs), the amount of smoothing is chosen to optimize some data-dependent criterion (e.g., AIC). Through simulations, we estimated the type I error of the GAM-based tests of (i) no association and (ii) linearity when this approach is used. Overall, type I error rates were much higher than nominal levels. We proposed new critical values, which resulted in correct overall type I error. We also compared the power of GAMs, under several df-selection strategies, with that of conventional parametric models to detect nonlinearities. To illustrate our approach, we re-analyzed the association between body mass index and coronary heart disease mortality.

Acknowledgments

This work was supported by grants from the Canadian Institutes of Health Research (CIHR #6391) and from the Natural Sciences and Engineering Research Council of Canada (NSERC). Dr. Michal Abrahamowicz is a James McGill Professor. Dr. Goldberg gratefully acknowledges support from the CIHR. This work was part of Dr. Andrea Benedetti's doctoral thesis in the Department of Epidemiology and Biostatistics at McGill University. The authors thank Dr. Karen Leffondré for helpful discussions.

Notes

i N = sample size.

ii Percent of all samples for which that degree of freedom was chosen by the AIC selection procedure.

iii Type I error is estimated as the proportion of the samples for which that degree of freedom was chosen in which H0 was rejected at α = 0.05.

iv Corrected critical value representing the 95th percentile of the empirical distribution of the test statistic, when the independent variable was generated from a lognormal distribution.

v “Generic” corrected critical value representing the largest of three 95th percentiles of the empirical distribution of the test statistic, where the independent variable was generated from a uniform, normal, or lognormal distribution, respectively.

vi The overall type I error rate, estimated as the total number of samples with p < 0.05 for the test conditional on AIC-optimal df divided by 10,000, and corresponding exact 95% confidence interval.

vii The mean of the 10,000 AIC-optimal df.

viii The estimated variance of the 10,000 AIC-optimal df.
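The corrected critical values described in notes iv and v can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are hypothetical, the quantile rule (the smallest order statistic with at least 95% of the simulated statistics at or below it) is one common convention and may differ from the one used in the article, and the input numbers are placeholders rather than simulation results.

```python
import math

def empirical_critical_value(null_stats, level=0.95):
    """Empirical `level` quantile of test statistics simulated under H0."""
    if not null_stats:
        raise ValueError("need at least one simulated statistic")
    ordered = sorted(null_stats)
    # Smallest order statistic with at least `level` of the sample at or below it
    # (an assumed quantile convention, not taken from the article).
    idx = max(0, math.ceil(level * len(ordered)) - 1)
    return ordered[idx]

def generic_critical_value(stats_by_distribution, level=0.95):
    """Largest of the per-distribution critical values (cf. note v)."""
    return max(
        empirical_critical_value(stats, level)
        for stats in stats_by_distribution.values()
    )

# Placeholder statistics for illustration only; NOT results from the article.
null_stats = {
    "uniform": list(range(1, 101)),
    "normal": list(range(2, 102)),
    "lognormal": list(range(0, 100)),
}
lognormal_cv = empirical_critical_value(null_stats["lognormal"])  # cf. note iv
generic_cv = generic_critical_value(null_stats)                   # cf. note v
```

In practice, each list would hold the test statistics from the simulated null samples for which a given df was selected, so that one critical value is obtained per df.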

i Overall type I error rate is calculated as the average of the degree-of-freedom-specific type I error rates, weighted by the proportion of samples in which each degree of freedom was chosen.

ii Degree of freedom was chosen as the model with the lowest AIC from nine models fitted with k = 2 to 10 df when testing against a linear H0, or from among ten models fitted with k = 1 to 10 df when testing against a null H0.

iii Degree of freedom was chosen as the model with the lowest AIC from three models fitted with k = 2, 4, or 8 df when testing against a linear H0, or from among four models fitted with k = 1, 2, 4, or 8 df when testing against a null H0.
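The AIC-based df selection and the weighted overall type I error rate described in the notes above can be illustrated as follows. This is a minimal sketch: the function names are hypothetical, and the AIC values, selection proportions, and df-specific error rates are made-up inputs chosen for easy arithmetic, not values from the article.

```python
def select_df_by_aic(aic_by_df):
    """Return the candidate df whose fitted model has the lowest AIC."""
    return min(aic_by_df, key=aic_by_df.get)

def overall_type1_error(proportion_chosen, type1_by_df):
    """Average of df-specific type I error rates, weighted by how often each df was chosen."""
    return sum(proportion_chosen[k] * type1_by_df[k] for k in proportion_chosen)

# Hypothetical AIC values for candidate models with k = 2, 4, or 8 df.
chosen_df = select_df_by_aic({2: 410.2, 4: 405.7, 8: 407.9})

# Hypothetical selection proportions and df-specific type I error rates.
proportion_chosen = {2: 0.50, 4: 0.25, 8: 0.25}
type1_by_df = {2: 0.04, 4: 0.08, 8: 0.12}
overall = overall_type1_error(proportion_chosen, type1_by_df)
# 0.50 * 0.04 + 0.25 * 0.08 + 0.25 * 0.12 = 0.07
```

Because the weights are the proportions of samples in which each df was selected, this weighted average equals the total number of rejections divided by the total number of samples.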

i Corrected critical value representing the 95th percentile of the empirical distribution of the test statistic.

ii ‘–’ indicates that the nominal value was used as there was little evidence that the empirical type I error diverged from 0.05.

i See Tables and for the 95th percentile of the empirical distribution of the test statistics corresponding to different dfs that were used as “corrected” critical values.

ii True association between X and the logit of the outcome.

iii Distribution of the independent variable.

i When choosing the model with the lowest AIC from models with 1, 2, 4, or 8 df; ii Degree of freedom chosen a priori as 4; iii Proportion of samples in which the linear model was statistically significant; iv Degree of freedom that was chosen most frequently, and % of samples in which it was chosen; v Mean degree of freedom that was chosen across all generated samples; vi Generated data sets were of size 300; and vii Generated data sets were of size 1,000.

i When choosing the model with the lowest AIC from models with 1, 2, 4, or 8 df; ii Degree of freedom chosen a priori as 4; iii A quadratic model (i.e., the sum of a linear and a quadratic term) was fit to the data. The model was deemed significant if the quadratic term was statistically significant; iv Generated data sets were of size 300; and v Generated data sets were of size 1,000.
