Methods in Addiction Research

The clinical consequences of variable selection in multiple regression models: a case study of the Norwegian Opioid Maintenance Treatment program

Pages 13-21 | Received 23 Jan 2019, Accepted 19 Jul 2019, Published online: 11 Oct 2019
 

ABSTRACT

Background: Selecting which variables to include in multiple regression models is a pervasive problem in medical research.

Objectives: Based on questionnaire data (n = 18538, 69.9% men) from the Norwegian Opioid Maintenance Treatment Program, this study aims to compare the performance of different variable selection methods and the potential clinical consequences of the choice of method. The effect of missing data is also explored.
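When there are many candidate covariates, missing data can matter a great deal because complete case analysis drops every respondent with a missing value on any of them. The sketch below is hypothetical. It assumes each of 29 covariates is missing independently with probability 2% (a simplifying assumption, not a figure from the study), applies listwise deletion, and compares the surviving fraction with the analytic value (1 - p)^29. The function names are illustrative only.

```python
import random

def complete_cases(rows):
    """Listwise deletion: keep only rows with no missing (None) values."""
    return [row for row in rows if all(value is not None for value in row)]

def expected_complete_fraction(n_covariates, p_missing):
    """Expected share of complete rows if each covariate is missing
    independently with probability p_missing (simplifying assumption)."""
    return (1.0 - p_missing) ** n_covariates

# Hypothetical simulation: 29 covariates, each missing independently with 2% probability.
random.seed(1)
n_rows, n_cov, p = 10_000, 29, 0.02
rows = [[None if random.random() < p else 0.0 for _ in range(n_cov)] for _ in range(n_rows)]
kept = complete_cases(rows)

print(f"expected complete fraction:  {expected_complete_fraction(n_cov, p):.3f}")
print(f"simulated complete fraction: {len(kept) / n_rows:.3f}")
```

Under these assumptions, the expected complete fraction is about 0.557. A complete case analysis would therefore discard roughly 44% of respondents, even though each individual item is 98% complete. Multiple imputation avoids that loss by retaining every respondent.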

Methods: The dependent variable was engagement in criminal behavior while in treatment. Twenty-nine potential covariates covering demographics, psychosocial factors and drug use were tested for inclusion in a multiple logistic regression model. Both complete case and multiply imputed data were considered. We compared the results from variable selection methods ranging from expert-based and purposeful variable selection, through stepwise methods, to the more recently developed penalized regression using the Least Absolute Shrinkage and Selection Operator (LASSO).
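LASSO selects variables by adding a penalty λ·Σ|β_j| to the logistic log-loss, which shrinks weak coefficients to exactly zero. The following is a minimal sketch of this idea using proximal gradient descent (ISTA) in pure Python on simulated data. The abstract does not say which software the authors used. The simulated data, the penalty value `lam=0.08`, and all function names are hypothetical. In practice λ is usually chosen by cross-validation with an established package.

```python
import math
import random

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def soft_threshold(v, t):
    # Proximal operator of t*|v|: shrinks v toward zero, and returns exactly 0 if |v| <= t.
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0

def lasso_logistic(X, y, lam, n_iter=1000):
    """Minimize mean log-loss + lam * sum(|beta_j|) by proximal gradient (ISTA).
    The intercept b0 is not penalized."""
    n, p = len(X), len(X[0])
    # Step size 1/L, where L = 0.25 * mean(1 + ||x_i||^2) upper-bounds the
    # Lipschitz constant of the gradient of the mean log-loss.
    L = 0.25 * sum(1.0 + sum(v * v for v in row) for row in X) / n
    step = 1.0 / L
    b0, beta = 0.0, [0.0] * p
    for _ in range(n_iter):
        g0, g = 0.0, [0.0] * p
        for row, yi in zip(X, y):
            r = sigmoid(b0 + sum(bj * xj for bj, xj in zip(beta, row))) - yi
            g0 += r
            for j in range(p):
                g[j] += r * row[j]
        b0 -= step * g0 / n  # plain gradient step for the unpenalized intercept
        beta = [soft_threshold(beta[j] - step * g[j] / n, step * lam) for j in range(p)]
    return b0, beta

# Hypothetical data: 5 standardized covariates, and only the first two affect the outcome.
random.seed(0)
true_beta = [2.0, -1.5, 0.0, 0.0, 0.0]
X = [[random.gauss(0, 1) for _ in range(5)] for _ in range(400)]
y = [1 if random.random() < sigmoid(-0.5 + sum(b * x for b, x in zip(true_beta, row))) else 0
     for row in X]

b0, beta = lasso_logistic(X, y, lam=0.08)
selected = [j for j, bj in enumerate(beta) if bj != 0.0]
print("coefficients:", [round(bj, 3) for bj in beta])
print("selected covariates:", selected)
```

The soft-thresholding step is what performs the selection. A coefficient stays at zero whenever its gradient is smaller in magnitude than λ, so increasing `lam` yields smaller models. The retained coefficients are also shrunk toward zero. This is one reason LASSO effect sizes can differ from those produced by other selection methods.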

Results: The various variable selection methods produced regression models with between 9 and 22 covariates. The stepwise selection procedures generated the models with the most covariates. The choice of variable selection method directly affected the estimated regression coefficients, in both effect size and statistical significance. For several variables, the expert-based approach disagreed with all data-driven methods.
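One way to make such disagreements explicit is to compare the covariate sets chosen by each method. The following sketch flags covariates on which the expert-based choice differs from every data-driven method. A covariate is flagged if the expert included it and no data-driven method did, or if the expert excluded it and every data-driven method included it. The method labels, variable names and selections are invented for illustration and are not the study's results. Every key other than `"expert"` is treated as data-driven, which is an assumption.

```python
def expert_disagreements(selections, expert="expert"):
    """Return covariates on which the expert-based choice differs from
    every data-driven method (all other keys are treated as data-driven)."""
    expert_set = selections[expert]
    data_driven = [s for name, s in selections.items() if name != expert]
    all_vars = set().union(*selections.values())
    return sorted(
        v for v in all_vars
        if all((v in s) != (v in expert_set) for s in data_driven)
    )

# Hypothetical selections; names and sets are illustrative only.
selections = {
    "expert":   {"age", "sex", "heroin_use", "housing"},
    "stepwise": {"age", "sex", "cannabis_use", "benzo_use", "education"},
    "lasso":    {"age", "cannabis_use", "benzo_use"},
}
print("model sizes:", {name: len(s) for name, s in selections.items()})
print(expert_disagreements(selections))
# → ['benzo_use', 'cannabis_use', 'heroin_use', 'housing']
```

Listing the flagged covariates next to their estimates from each model is a simple way to see which clinical conclusions depend on the choice of selection method.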

Conclusions: The choice of variable selection method may strongly affect the resulting regression model, along with the accompanying effect sizes and confidence intervals. This may in turn affect clinical conclusions. Variable selection should therefore be given careful consideration during model building. We recommend combining expert knowledge with a data-driven variable selection method to assess the robustness of the resulting models.

Disclosure statement

The authors report no relevant financial conflicts.

Additional information

Funding

This research received no specific grant from any funding agency in the public, commercial, or not-for-profit sectors.
