Mining big data to support decision making in healthcare

Abstract

This study demonstrates an application of data mining to the analysis of big healthcare data. Given the significant impact of obesity on ever-rising healthcare costs in the United States, this study identifies key demographic and lifestyle characteristics associated with adult obesity. The sample was drawn from the Behavioral Risk Factor Surveillance System (BRFSS) database of the Centers for Disease Control and Prevention. Using SAS Enterprise Miner, two predictive models are built to profile the adult population group at risk of obesity. The models support early intervention strategies and policymaking decisions by healthcare administrators and professionals.

Acknowledgments

The author acknowledges the contribution of Dr. Margil Funtanilla to this study and also thanks Dr. Prasad Padmanabhan, whose contribution considerably improved the manuscript.

Funding

The author acknowledges the financial assistance from the Greehey School of Business at St. Mary’s University in San Antonio, Texas.

Notes

1 This study examines only adult obesity, not childhood obesity. Much past research has examined childhood obesity in detail, whereas adult obesity has received less attention.

2 No inference is made about causality between the variable of interest and the set of explanatory variables selected.

3 Persons with a BMI of 25 to 29.9 are considered overweight, whereas individuals with a BMI of 30 or more are considered obese.
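As a minimal sketch, the thresholds in this note can be applied to BMI, computed in the standard way as weight in kilograms divided by the square of height in meters. The function names are hypothetical and the code is illustrative only; values below 25 are simply labeled as falling below the overweight range, since the note defines only the two upper categories.

```python
def bmi(weight_kg: float, height_m: float) -> float:
    """Body mass index: weight (kg) divided by height (m) squared."""
    if height_m <= 0:
        raise ValueError("height must be positive")
    return weight_kg / height_m ** 2


def bmi_category(value: float) -> str:
    """Classify a BMI value using the cut-offs in note 3.

    30 or more -> obese; 25 up to (but not including) 30 -> overweight.
    Values below 25 fall outside both categories.
    """
    if value >= 30:
        return "obese"
    if value >= 25:
        return "overweight"
    return "below overweight range"


# Example: 95 kg at 1.75 m gives a BMI of about 31.0.
value = bmi(95, 1.75)
print(round(value, 1), bmi_category(value))  # 31.0 obese
```

Using `>= 25` and `>= 30` rather than the literal "25 to 29.9" range avoids leaving a gap for values such as 29.95.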

4 Sociodemographic factors varied across clusters.

5 However, the generalizability of these results is questionable, since only a limited number of input variables were included in the analysis.

6 These factors could well be interrelated, particularly in adolescents, and a complex set of factors may combine to produce obesity.

7 Only one adult per household is interviewed. The BRFSS data are directly weighted for the probability of selection of a telephone number, the number of adults in a household, and the number of telephones in a household. A final post-stratification adjustment is made for non-response and non-coverage of households without telephones. The weights for each relevant factor are multiplied together to get a final weight. For more details on BRFSS procedures, see http://www.cdc.gov/brfss/questionnaires/index.htm (accessed on 24 April 2016).
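The note describes the final weight as the product of several adjustment factors. The sketch below shows only that multiplication under simplifying assumptions: the factor definitions (inverse probability that the telephone number was selected, number of adults, inverse of the number of telephones, and a single post-stratification factor) and the function name are hypothetical illustrations, not the CDC's exact procedure, which is documented at the BRFSS link in the note.

```python
from math import prod


def final_weight(p_number_selected: float, n_adults: int, n_phones: int,
                 poststrat_factor: float) -> float:
    """Hypothetical, simplified survey weight built from multiplied factors.

    - 1 / p_number_selected: inverse probability the phone number was sampled.
    - n_adults: only one adult is interviewed per household, so each
      respondent stands in for n_adults adults.
    - 1 / n_phones: households with more lines are more likely to be reached.
    - poststrat_factor: assumed single adjustment for non-response and
      non-coverage.
    """
    if not 0 < p_number_selected <= 1:
        raise ValueError("selection probability must be in (0, 1]")
    if n_adults < 1 or n_phones < 1:
        raise ValueError("household needs at least one adult and one phone")
    factors = [1 / p_number_selected, n_adults, 1 / n_phones, poststrat_factor]
    return prod(factors)


# Number sampled with probability 1/500, three adults, two phones, and a
# post-stratification factor of 1.2: 500 * 3 * 0.5 * 1.2, roughly 900.
print(final_weight(1 / 500, 3, 2, 1.2))
```

Weighting up by the number of adults and down by the number of telephones offsets the unequal chances that a given adult ends up being interviewed.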

8 The default statistics used to impute missing values are the mean of the non-missing values for interval variables and the mode of the non-missing values for categorical variables. For each variable whose missing values are imputed, a new variable (prefixed with IMP_) is created. Several missing values in the data were imputed before the models were built.
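The default behavior described in this note can be approximated in plain Python: the mean for interval variables, the mode for categorical variables, and a new IMP_-prefixed copy of each variable that had missing values. This is an illustrative sketch, not SAS Enterprise Miner's implementation; the function name, the dict-of-lists data layout, and the tie-breaking rule for the mode (first value encountered) are assumptions.

```python
from collections import Counter
from statistics import mean


def impute(columns, interval_vars):
    """Return a copy of `columns` with IMP_ variables added.

    columns: dict mapping variable name -> list of values (None = missing).
    interval_vars: names treated as interval; all others are categorical.
    """
    result = dict(columns)
    for name, values in columns.items():
        observed = [v for v in values if v is not None]
        if len(observed) == len(values) or not observed:
            continue  # nothing to impute, or no data to impute from
        if name in interval_vars:
            fill = mean(observed)
        else:
            # Counter.most_common orders equal counts by first occurrence.
            fill = Counter(observed).most_common(1)[0][0]
        result["IMP_" + name] = [fill if v is None else v for v in values]
    return result


data = {
    "age": [30, None, 50, 40],
    "smoker": ["no", "yes", None, "no"],
    "sex": ["F", "M", "F", "M"],  # complete, so no IMP_sex is created
}
imputed = impute(data, interval_vars={"age"})
print(imputed["IMP_age"])     # [30, 40, 50, 40]
print(imputed["IMP_smoker"])  # ['no', 'yes', 'no', 'no']
```

The original variables are kept alongside their IMP_ versions so that imputed and observed values remain distinguishable.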

9 Continuous, ordinal, and binary variables can be used as targets, and both continuous and discrete variables can be used as inputs. The node supports stepwise, forward, and backward selection methods.
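Of the three selection methods listed in this note, forward selection is the simplest to sketch: start from an empty model, repeatedly add the input that most improves a fit criterion, and stop when no remaining input improves it. The sketch is generic and hypothetical; `score` stands in for whatever criterion a tool applies (SAS Enterprise Miner's own entry criteria are not reproduced here), and the toy scoring function exists only to make the example runnable.

```python
from typing import Callable, FrozenSet, Iterable, List


def forward_selection(candidates: Iterable[str],
                      score: Callable[[FrozenSet[str]], float],
                      min_gain: float = 0.0) -> List[str]:
    """Greedy forward selection; a higher score means a better model."""
    remaining = set(candidates)
    selected: List[str] = []
    current = score(frozenset())
    while remaining:
        # Try adding each remaining input; sorted() makes ties deterministic.
        best_var = max(sorted(remaining),
                       key=lambda v: score(frozenset(selected + [v])))
        best_score = score(frozenset(selected + [best_var]))
        if best_score - current <= min_gain:
            break  # no candidate improves the model enough
        selected.append(best_var)
        remaining.remove(best_var)
        current = best_score
    return selected


# Hypothetical toy criterion: each input has a fixed contribution and every
# added input costs a penalty of 0.1, so weak inputs never enter the model.
contribution = {"exercise": 0.30, "age": 0.20, "income": 0.05, "noise": 0.0}


def toy_score(subset: FrozenSet[str]) -> float:
    return sum(contribution[v] for v in subset) - 0.1 * len(subset)


print(forward_selection(contribution, toy_score))  # ['exercise', 'age']
```

Backward selection runs the same loop in reverse (start with all inputs and drop the least useful), and stepwise selection alternates between adding and removing.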

10 Because the dataset is large and most of its variables are class (categorical) variables, means and medians of these variables are not meaningful. Researchers using big datasets generally do not report descriptive statistics for input variables.

11 Generally, whereas conventional statistical methods establish statistical significance from relatively small samples, big data samples are massive and represent most, if not all, of the population. Consequently, the notion of statistical significance is less relevant to big data. Furthermore, many conventional methods designed for small samples are not computationally efficient enough to scale up to big data (Gandomi & Haider, 2015).
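To illustrate why significance tests say little at big-data scale, the sketch below applies a standard two-sided, two-proportion z-test (pooled normal approximation) to the same small difference, 30.0% versus 30.2%, at two sample sizes. The proportions and sample sizes are made-up values chosen only for illustration: with a thousand observations per group the difference is nowhere near significant, while with ten million per group it is overwhelmingly "significant" even though its practical size has not changed.

```python
from math import erfc, sqrt


def two_proportion_p_value(p1: float, p2: float, n: int) -> float:
    """Two-sided p-value for H0: p1 == p2, with n observations per group.

    Uses the pooled-variance normal approximation.
    """
    pooled = (p1 + p2) / 2  # equal group sizes
    se = sqrt(pooled * (1 - pooled) * (2 / n))
    z = abs(p1 - p2) / se
    # For a standard normal Z, P(|Z| >= z) = erfc(z / sqrt(2)).
    return erfc(z / sqrt(2))


for n in (1_000, 10_000_000):
    print(n, two_proportion_p_value(0.300, 0.302, n))
```

Because the standard error shrinks in proportion to 1/sqrt(n), almost any nonzero difference becomes significant once n is large enough, which is why effect size matters more than the p-value at this scale.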

12 Model accuracy refers to how accurately a model classifies new (future) data, typically estimated as the proportion of holdout observations it classifies correctly.
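Measured on a holdout sample, classification accuracy is usually computed as the share of observations whose predicted class matches the observed class. A minimal sketch, assuming binary obese/not-obese labels coded 1 and 0; the labels, predictions, and function name are hypothetical:

```python
def accuracy(actual, predicted):
    """Proportion of observations classified correctly."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and the same length")
    correct = sum(a == p for a, p in zip(actual, predicted))
    return correct / len(actual)


# Hypothetical holdout labels (1 = obese, 0 = not obese) and predictions.
actual = [1, 0, 0, 1, 1, 0, 0, 0]
predicted = [1, 0, 1, 1, 0, 0, 0, 0]
print(accuracy(actual, predicted))  # 0.75; misclassification rate is 0.25
```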

Additional information

Notes on contributors

Ajaya K. Swain

Ajaya K. Swain is an Assistant Professor of Quantitative Management in the Finance and Quantitative Management Department of the Greehey School of Business at St. Mary’s University. He received a master’s degree in Industrial Management Systems Engineering from the University of Nebraska-Lincoln, and an MBA in Business Statistics and a PhD in Operations Management from Texas Tech University. He has both taught and developed courses in technology and data analytics. His research interests lie in predictive and social media analytics, operations and supply chain, and corporate sustainability. His research has been published in the European Journal of Operational Research, the Journal of Manufacturing Processes, IEEE Computer Society Journal, and the Journal of Information Technology Case and Application Research, among others.