Research Articles

Mining big data to support decision making in healthcare

Pages 141-154 | Published online: 01 Nov 2016
 

Abstract

This study demonstrates an application of data mining to the analysis of big healthcare data. Given the significant impact of obesity on ever-rising healthcare costs in the United States, the study identifies key demographic and lifestyle characteristics associated with adult obesity. The sample was drawn from the Behavioral Risk Factor Surveillance System (BRFSS) database of the Centers for Disease Control and Prevention. Using SAS Enterprise Miner, two predictive models are built to profile the adult population group at risk of being obese. The models support early intervention strategies and policymaking decisions by healthcare administrators and professionals.

Acknowledgments

The author acknowledges the contribution of Dr. Margil Funtanilla to this study. The contribution of Dr. Prasad Padmanabhan towards improving the manuscript considerably is also appreciated.

Funding

The author acknowledges the financial assistance from the Greehey School of Business at St. Mary’s University in San Antonio, Texas.

Notes

1 This study examines only adult obesity, not childhood obesity. Most past research has examined childhood obesity in detail, whereas adult obesity has received comparatively little attention.

2 No inference is made about causality between the variable of interest and the set of explanatory variables selected.

3 Persons with a BMI of 25 to 29.9 are considered overweight, whereas individuals with a BMI of 30 or more are considered obese.
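
The cut-offs in Note 3 can be applied mechanically once BMI is computed as weight in kilograms divided by the square of height in metres. The following is a minimal sketch; the function names are hypothetical, and labelling everything below 25 as "not overweight" is an assumption made here for simplicity, since the note defines only the overweight and obese ranges.

```python
def bmi(weight_kg: float, height_m: float) -> float:
    """Body mass index in kg/m^2 (weight divided by height squared)."""
    if height_m <= 0:
        raise ValueError("height must be positive")
    return weight_kg / height_m ** 2


def bmi_category(value: float) -> str:
    """Classify a BMI value using the thresholds in Note 3.

    Assumption: values below 25 are simply labelled "not overweight".
    """
    if value >= 30:
        return "obese"
    if value >= 25:
        return "overweight"  # covers 25 to 29.9
    return "not overweight"


# A 1.75 m adult weighing 95 kg has a BMI of about 31.0.
print(bmi_category(bmi(95, 1.75)))
```

In a derived target variable such as the one modelled in this study, the obese/not-obese split would correspond to the `value >= 30` branch.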

4 Sociodemographic factors varied across clusters.

5 However, the generalizability of those results is questionable, since that study included only a limited number of input variables.

6 These factors may well be interrelated, particularly in adolescents, and a complex set of factors could combine to produce obesity.

7 Only one adult per household is interviewed. The BRFSS data are directly weighted for the probability of selection of a telephone number, the number of adults in a household, and the number of telephones in a household. A final post-stratification adjustment is made for non-response and non-coverage of households without telephones. The weights for each relevant factor are multiplied together to get a final weight. For more details on BRFSS procedures, see http://www.cdc.gov/brfss/questionnaires/index.htm (accessed on 24 April 2016).
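
The multiplicative weighting described in Note 7 can be sketched as follows. This is a simplified illustration, not the CDC's procedure: the function and argument names are hypothetical rather than BRFSS variable names, and the actual weighting involves more steps than shown. The direction of each factor follows the usual survey logic: an inverse selection probability, a multiplier for the number of adults (because only one is interviewed), and a divisor for the number of telephones (because multi-line households are more likely to be reached).

```python
from math import prod


def final_weight(selection_prob: float, n_adults: int, n_phones: int,
                 poststrat_factor: float) -> float:
    """Multiply simplified design factors into one respondent weight.

    Hypothetical factors (illustrative only, not CDC variable names):
      - 1 / selection_prob: inverse probability the phone number was drawn
      - n_adults: one adult is interviewed per household, so the
        respondent stands in for every adult in it
      - 1 / n_phones: households with several lines are more likely to be
        dialled, so their weight is scaled down
      - poststrat_factor: adjustment for non-response and non-coverage
    """
    if selection_prob <= 0 or n_adults < 1 or n_phones < 1:
        raise ValueError("invalid design inputs")
    factors = [1 / selection_prob, n_adults, 1 / n_phones, poststrat_factor]
    return prod(factors)


# A two-adult, one-phone household drawn with probability 1/1000.
print(final_weight(0.001, n_adults=2, n_phones=1, poststrat_factor=1.1))
```

Because the factors are multiplied, doubling the number of telephones halves the weight, while doubling the number of adults doubles it.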

8 The default statistics used to impute missing values for interval and categorical variables are, respectively, the mean and the mode of the variable's non-missing values. When values are imputed, a new variable (prefixed with IMP_) is created for each variable whose missing values are imputed. Several missing values in the data were imputed before the models were built.
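
The default imputation rule in Note 8 (mean for interval variables, mode for categorical ones, results stored under an IMP_ name) can be mimicked in a few lines. This is a minimal sketch in plain Python, not SAS Enterprise Miner itself; the function name, the use of `None` to mark missing values, and the `kinds` mapping are assumptions for illustration.

```python
from collections import Counter
from statistics import mean
from typing import Dict, List


def impute(columns: Dict[str, List], kinds: Dict[str, str]) -> Dict[str, List]:
    """Return IMP_-prefixed copies of columns that contain missing values.

    Missing values are represented as None (an assumption of this sketch).
    kinds maps each column to "interval" (fill with the mean of the
    non-missing values) or "categorical" (fill with their mode).
    Columns with no missing values get no IMP_ copy.
    """
    imputed = {}
    for name, values in columns.items():
        present = [v for v in values if v is not None]
        if len(present) == len(values):
            continue  # nothing to impute for this column
        if not present:
            raise ValueError(f"{name}: no non-missing values to impute from")
        if kinds[name] == "interval":
            fill = mean(present)
        else:
            fill = Counter(present).most_common(1)[0][0]  # mode
        imputed["IMP_" + name] = [fill if v is None else v for v in values]
    return imputed


data = {
    "age": [30, None, 50, 40],
    "smoker": ["no", "yes", None, "no"],
    "sex": ["F", "M", "F", "M"],
}
print(impute(data, {"age": "interval", "smoker": "categorical", "sex": "categorical"}))
```

Keeping the original column alongside its IMP_ copy, as the note describes, lets an analyst compare imputed and raw inputs later.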

9 Continuous, ordinal, and binary variables can be used as the target, and both continuous and discrete variables can be used as inputs. The node supports the stepwise, forward, and backward selection methods.
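
Of the three selection methods named in Note 9, forward selection is the simplest to illustrate: start with no inputs and greedily add whichever input improves the model most, stopping when no addition helps enough. The sketch below shows only that greedy loop; the `score` callback (for example, a validation fit statistic for a model trained on a given subset of inputs) and the `min_gain` stopping rule are assumptions, and the node's actual entry criteria are not reproduced here.

```python
from typing import Callable, FrozenSet, Iterable, List


def forward_select(candidates: Iterable[str],
                   score: Callable[[FrozenSet[str]], float],
                   min_gain: float = 0.0) -> List[str]:
    """Greedy forward selection over candidate input variables.

    score is a hypothetical callback returning a fit measure (higher is
    better) for a model built on the given subset of inputs.
    """
    remaining = list(candidates)
    selected: List[str] = []
    current = score(frozenset())
    while remaining:
        # Try each remaining input and keep the one that helps most.
        best_var = max(remaining, key=lambda v: score(frozenset(selected + [v])))
        best_score = score(frozenset(selected + [best_var]))
        if best_score - current <= min_gain:
            break  # no remaining input improves the fit enough
        selected.append(best_var)
        remaining.remove(best_var)
        current = best_score
    return selected


# Toy additive score: each hypothetical input contributes a fixed gain.
gains = {"smoking": 0.05, "age": 0.10, "exercise": 0.08, "noise": 0.0}
print(forward_select(gains, lambda s: sum(gains[v] for v in s), min_gain=0.01))
```

Backward selection runs the same idea in reverse (start with all inputs and drop the least useful), and stepwise selection alternates between adding and removing.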

10 Because the dataset is large and most of its variables are class (categorical) variables, means and medians of these variables are not meaningful. Researchers working with big datasets generally do not report descriptive statistics for input variables.
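
The point in Note 10 can be made concrete: for a nominal variable, the numeric codes attached to each category are arbitrary, so the "mean" changes whenever the coding changes, while category frequencies stay the same. The responses and codings below are hypothetical and serve only to illustrate the argument.

```python
from collections import Counter
from statistics import mean

# Hypothetical responses to a nominal survey item (marital status).
answers = ["married", "single", "married", "divorced", "single", "married"]

# Two equally arbitrary numeric codings of the same categories.
coding_a = {"married": 1, "single": 2, "divorced": 3}
coding_b = {"married": 3, "single": 1, "divorced": 2}

mean_a = mean(coding_a[x] for x in answers)
mean_b = mean(coding_b[x] for x in answers)
print(mean_a, mean_b)  # the "average" depends entirely on the coding

# Category frequencies do not depend on any coding.
freq = Counter(answers)
print(freq.most_common())
```

For such variables, a frequency table (or the mode) is the natural summary, which is also why mode imputation is used for categorical inputs.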

11 Conventional statistical methods assess statistical significance using relatively small samples, whereas big data samples are massive and represent a majority of, if not the entire, population. Consequently, the notion of statistical significance is less relevant to big data. Furthermore, many conventional methods designed for small samples are not computationally efficient enough to scale up to big data (Gandomi & Haider, 2015).
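
One way to see why significance matters less at this scale: with a fixed, practically negligible effect, the standard error shrinks in proportion to $1/\sqrt{n}$, so the p-value eventually falls below any threshold. The sketch below uses a two-sample z-test under simplifying assumptions chosen for illustration (equal group sizes, a known common standard deviation, and an observed difference of 0.01 standard deviations).

```python
from math import erfc, sqrt


def two_sample_p_value(effect: float, sd: float, n_per_group: int) -> float:
    """Two-sided p-value of a z-test for a difference in means.

    Assumes equal group sizes, a known common standard deviation sd,
    and an observed mean difference equal to effect.
    """
    se = sd * sqrt(2 / n_per_group)  # standard error of the difference
    z = abs(effect) / se
    return erfc(z / sqrt(2))         # equals 2 * (1 - Phi(|z|))


# The same tiny difference (0.01 standard deviations) at growing sample sizes.
for n in (100, 10_000, 1_000_000):
    print(n, two_sample_p_value(effect=0.01, sd=1.0, n_per_group=n))
```

At a million cases per group the negligible difference is "highly significant", which says more about the sample size than about the practical importance of the effect.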

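12 Model accuracy refers to the proportion of cases a model classifies correctly; measured on data not used to build the model, it estimates how accurately the model will classify future data.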
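
As a worked example of Note 12, accuracy is simply the number of correct classifications divided by the number of cases scored. The holdout labels below are hypothetical (1 = obese, 0 = not obese) and the function name is illustrative.

```python
from typing import Sequence


def accuracy(actual: Sequence[int], predicted: Sequence[int]) -> float:
    """Share of cases whose predicted class matches the actual class."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("need two non-empty sequences of equal length")
    correct = sum(a == p for a, p in zip(actual, predicted))
    return correct / len(actual)


# Hypothetical holdout cases: 1 = obese, 0 = not obese.
actual = [1, 0, 0, 1, 0, 1, 0, 0]
predicted = [1, 0, 1, 1, 0, 0, 0, 0]
print(accuracy(actual, predicted))  # 6 of 8 correct -> 0.75
```

Scoring a holdout set rather than the training data is what makes this figure an estimate of performance on future cases.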

Additional information

Notes on contributors

Ajaya K. Swain

Ajaya K. Swain is an Assistant Professor of Quantitative Management in the Finance and Quantitative Management Department of the Greehey School of Business at St. Mary’s University. He received his master’s degree in Industrial Management Systems Engineering from the University of Nebraska-Lincoln, and an MBA in Business Statistics and a PhD in Operations Management from Texas Tech University. He has taught and developed courses related to technology and data analytics. His research interests lie in predictive and social media analytics, operations and supply chain management, and corporate sustainability. His research has been published in the European Journal of Operational Research, Journal of Manufacturing Processes, IEEE Computer Society Journal, and Journal of Information Technology Case and Application Research, among others.
