Mining big data to support decision making in healthcare: Journal of Information Technology Case and Application Research: Vol 18, No 3

Abstract

This study demonstrates an application of data mining to the analysis of big healthcare data. Considering the significant impact of obesity on ever-rising healthcare costs in the United States, the study identifies key demographic and lifestyle characteristics associated with adult obesity. The sample was drawn from the Behavioral Risk Factor Surveillance System database of the Centers for Disease Control and Prevention. Using SAS Enterprise Miner, two predictive models are built to profile the adult population group at risk of being obese. The models provide support for early intervention strategies and policymaking decisions by healthcare administrators and professionals.

Acknowledgments

The author acknowledges the contribution of Dr. Margil Funtanilla to this study. Dr. Prasad Padmanabhan's considerable help in improving the manuscript is also appreciated.

Funding

The author acknowledges the financial assistance from the Greehey School of Business at St. Mary’s University in San Antonio, Texas.

Notes

¹ This study examines only adult obesity, not childhood obesity. Most past research has examined childhood obesity in detail, whereas adult obesity has received less attention.

² No inference is made about causality between the variable of interest and the set of explanatory variables selected.

³ Persons with a BMI of 25 to 29.9 are considered overweight, whereas individuals with a BMI of 30 or more are considered obese.
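The cutoffs in note 3 can be illustrated with a minimal Python sketch. The function names and the label for values below 25 are assumptions made for illustration, and the BMI formula (weight in kilograms divided by height in meters squared) is the standard definition rather than something stated in the note. Values between 29.9 and 30 are treated here as overweight, which is one reasonable reading of the stated ranges.

```python
def compute_bmi(weight_kg: float, height_m: float) -> float:
    """Standard BMI definition: weight (kg) divided by height (m) squared."""
    if weight_kg <= 0 or height_m <= 0:
        raise ValueError("weight and height must be positive")
    return weight_kg / (height_m ** 2)


def bmi_category(bmi: float) -> str:
    """Classify a BMI value using the cutoffs given in note 3."""
    if bmi >= 30:
        return "obese"
    if bmi >= 25:
        return "overweight"
    # Note 3 names no category below 25, so this label is an assumption.
    return "not overweight or obese"


if __name__ == "__main__":
    # Hypothetical respondent, not taken from the BRFSS sample.
    bmi = compute_bmi(weight_kg=95.0, height_m=1.75)
    print(round(bmi, 1), bmi_category(bmi))
```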

⁴ Sociodemographic factors varied across clusters.

⁵ However, the generalizability of these results is suspect since it only introduced a limited number of input variables for analysis.

⁶ Obviously, these factors could be interrelated, particularly in adolescents. A complex set of factors could combine to produce obesity.

⁷ Only one adult per household is interviewed. The BRFSS data are directly weighted for the probability of selection of a telephone number, the number of adults in a household, and the number of telephones in a household. A final post-stratification adjustment is made for non-response and non-coverage of households without telephones. The weights for each relevant factor are multiplied together to get a final weight. For more details on BRFSS procedures, see http://www.cdc.gov/brfss/questionnaires/index.htm (accessed on 24 April 2016).
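The statement in note 7 that the weights for each factor are multiplied together can be sketched as follows. This is only an arithmetic illustration: the factor names and values are hypothetical, and the sketch does not reproduce the actual BRFSS weighting formulas, which are documented by the CDC.

```python
from math import prod


def final_weight(factor_weights):
    """Multiply the individual factor weights into one final weight."""
    weights = list(factor_weights)
    if not weights:
        raise ValueError("at least one factor weight is required")
    if any(w <= 0 for w in weights):
        raise ValueError("factor weights must be positive")
    return prod(weights)


if __name__ == "__main__":
    # Hypothetical factor weights for one respondent (illustrative values only).
    factors = {
        "telephone_selection": 120.0,  # assumed weight for selection of the number
        "adults_in_household": 2.0,    # assumed adjustment for number of adults
        "telephones_in_household": 0.5,  # assumed adjustment for number of phones
        "post_stratification": 1.1,    # assumed non-response/non-coverage adjustment
    }
    print(final_weight(factors.values()))
```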

⁸ The default statistics to impute missing values for interval and categorical variables are the mean and the mode of the non-missing values for the variables, respectively. When imputed, a new variable (prefaced with IMP_) is created for each variable for which missing values are imputed. Several missing values in the data were imputed before creating the models.
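The default imputation rule described in note 8 can be approximated in plain Python. This is a sketch of the idea only, not SAS Enterprise Miner's implementation: the function name, the use of `None` to mark missing values, and the column names `age` and `sex` are all assumptions. Interval variables are filled with the mean of the non-missing values, class variables with the mode, and an `IMP_`-prefixed copy is created only for variables that actually had missing values.

```python
from statistics import mean, mode


def impute(rows, interval_cols, class_cols):
    """Return copies of rows with IMP_-prefixed columns for imputed variables."""
    fill = {}
    for col in list(interval_cols) + list(class_cols):
        observed = [r[col] for r in rows if r[col] is not None]
        if len(observed) == len(rows):
            continue  # nothing missing, so no IMP_ column is created
        if not observed:
            raise ValueError(f"column {col!r} has no non-missing values")
        # Mean for interval variables, mode for class variables.
        fill[col] = mean(observed) if col in interval_cols else mode(observed)

    result = []
    for r in rows:
        new_row = dict(r)  # keep the original columns untouched
        for col, value in fill.items():
            new_row["IMP_" + col] = value if r[col] is None else r[col]
        result.append(new_row)
    return result


if __name__ == "__main__":
    # Hypothetical records; None marks a missing value.
    sample = [
        {"age": 30, "sex": "F"},
        {"age": None, "sex": "M"},
        {"age": 50, "sex": None},
        {"age": 40, "sex": "M"},
    ]
    for row in impute(sample, interval_cols=["age"], class_cols=["sex"]):
        print(row)
```

Keeping the original columns alongside the `IMP_` copies mirrors the note's description and makes it easy to check which values were filled in.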

⁹ Continuous, ordinal, and binary variables can be used as target, and both continuous and discrete variables can be used as input. The node supports the stepwise, forward, and backward selection methods.

¹⁰ Given that the dataset is large with most variables being either class or categorical, means and medians on these variables are meaningless. Researchers using big data sets generally do not report descriptive statistics of input variables.

¹¹ Generally, whereas conventional statistical methods develop statistical significance using small samples, big data samples are massive and represent the majority of (if not the entire) population. Consequently, the notion of statistical significance is less relevant to big data. Furthermore, in terms of computational efficiency, many conventional methods applicable to small samples do not scale up to big data (Gandomi & Haider, 2015).

¹² Model accuracy refers to the accuracy with which a model classifies future data.
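One common way to estimate the accuracy defined in note 12 is to score the model on held-out records that stand in for future data and compute the share classified correctly. The sketch below assumes that approach; the function name and the example labels are hypothetical and are not results from the study.

```python
def classification_accuracy(actual, predicted):
    """Fraction of records whose predicted class matches the actual class."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    if not actual:
        raise ValueError("at least one record is required")
    correct = sum(a == p for a, p in zip(actual, predicted))
    return correct / len(actual)


if __name__ == "__main__":
    # Hypothetical held-out labels: 1 = obese, 0 = not obese.
    actual = [1, 0, 1, 1, 0, 0, 1, 0]
    predicted = [1, 0, 0, 1, 0, 1, 1, 0]
    print(classification_accuracy(actual, predicted))
```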

Notes on contributors

Ajaya K. Swain

Ajaya K. Swain is an Assistant Professor of Quantitative Management in the Finance and Quantitative Management Department in the Greehey School of Business at St. Mary’s University. He received his master’s degree in Industrial Management Systems Engineering from the University of Nebraska-Lincoln, and an MBA in Business Statistics and a PhD in Operations Management from Texas Tech University. He has been involved in both teaching and developing courses related to technology and data analytics. His research interests lie in predictive and social media analytics, operations and supply chain, and corporate sustainability. His research has been published in European Journal of Operational Research, Journal of Manufacturing Processes, IEEE Computer Society Journal, and Journal of Information Technology Case and Application Research, among others.
