Journal of Quality Technology
A Quarterly Journal of Methods, Applications and Related Topics
Volume 48, 2016 - Issue 3
Case Studies

A Study of Missing Data Imputation in Predictive Modeling of a Wood-Composite Manufacturing Process

Yan Zeng, Timothy M. Young, David J. Edwards, Frank M. Guess & Chung-Hao Chen
 

Abstract

Problem: Real-time process data and destructive test data from a wood-composite manufacturer in the southeastern US were collected and merged to develop real-time predictive models for the strength properties of manufactured particleboard. Sensor malfunctions and other real-time data problems lead to null fields in the company's data warehouse, resulting in information loss. Many manufacturers attempt to build accurate predictive models either by excluding entire records with null fields or by substituting a summary statistic, such as the average or median, for each null field. However, information loss may raise predictive-model errors in validation, and the resulting predictions may misguide the production process.
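The two common workarounds described above, dropping incomplete records and substituting a summary statistic, can be sketched in a few lines. The following is a minimal illustration only: the field names (`press_temp`, `resin_pct`) and all values are hypothetical, `None` stands in for a null field, and nothing here comes from the manufacturer's data or code.

```python
from statistics import mean, median

# Hypothetical process records; None marks a null field from a sensor fault.
records = [
    {"press_temp": 182.0, "resin_pct": 9.1},
    {"press_temp": None,  "resin_pct": 9.4},
    {"press_temp": 185.0, "resin_pct": None},
    {"press_temp": 179.0, "resin_pct": 8.8},
    {"press_temp": 190.0, "resin_pct": 9.0},
]

def drop_incomplete(rows):
    """Listwise deletion: keep only records with no null fields."""
    return [r for r in rows if all(v is not None for v in r.values())]

def fill_with_summary(rows, summary=mean):
    """Replace each null with a summary statistic of that field's observed values."""
    fields = list(rows[0].keys())
    fill = {f: summary([r[f] for r in rows if r[f] is not None]) for f in fields}
    return [{f: (r[f] if r[f] is not None else fill[f]) for f in fields} for r in rows]

complete = drop_incomplete(records)
mean_filled = fill_with_summary(records, mean)
median_filled = fill_with_summary(records, median)

print(len(records), len(complete))     # record counts before and after deletion
print(mean_filled[1]["press_temp"])    # null replaced by the observed mean
print(median_filled[1]["press_temp"])  # null replaced by the observed median
```

In this toy set, deletion discards two of the five records even though each is missing only one field, which is the information loss at issue. Substitution keeps every record but assigns the same value to every null in a field, which understates that field's variability.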

Approach: This paper summarizes an application of missing-data imputation methods in predictive modeling of a wood-composite manufacturing process. Variable selection was applied prior to imputing missing data. Two missing-data imputation methods were selected after six candidate methods were compared. Predictive models were developed from the imputed data using partial least squares regression (PLSR) and were compared with models developed from nonimputed data.

Results: Maximum-likelihood-based imputation using the expectation-maximization (EM) algorithm and multiple imputation (MI) using Markov Chain Monte Carlo (MCMC) simulation achieved lower root mean-square error of prediction (RMSEP) than imputation based on mean/median substitution, last-observation-carried-forward (LOCF), or a “hot-deck” method using single imputation. Predictive models based on the imputed dataset generated more precise predictions than models based on nonimputed datasets. Outcomes of the study included avoiding rework and scrap when model predictions warned of an imminent strength failure, and minor reductions in resin input set points. Senior management of the company indicated that the study produced savings from lower resin usage (resin being the second-highest cost component of the manufactured product).
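To make the contrast between EM-based imputation and LOCF concrete, the sketch below works through a deliberately small case. It rests on several assumptions that are not from the study: only two variables, with `x` fully observed and `y` partly missing; a bivariate normal model; hypothetical function names (`em_impute`, `locf`, `rmsep`); and made-up data in which `y` is exactly linear in `x`. The root mean-square error is computed here on the held-back values themselves, a simplification of the study's comparison, which scored the predictions of PLSR models.

```python
import math

def locf(series):
    """Last observation carried forward: fill each None with the latest observed value."""
    filled, last = [], None
    for v in series:
        if v is not None:
            last = v
        filled.append(last)  # leading Nones stay None: nothing to carry yet
    return filled

def em_impute(x, y, iterations=200):
    """EM for a bivariate normal with x complete and y partly missing.
    Returns y with each None replaced by its conditional mean given x."""
    n = len(x)
    obs = [i for i in range(n) if y[i] is not None]
    mu_x = sum(x) / n
    s_xx = sum((v - mu_x) ** 2 for v in x) / n
    # Starting values from the complete cases, with zero covariance.
    mu_y = sum(y[i] for i in obs) / len(obs)
    s_yy = sum((y[i] - mu_y) ** 2 for i in obs) / len(obs)
    s_xy = 0.0
    for _ in range(iterations):
        # E-step: expected y and y^2 for missing entries under current estimates.
        beta = s_xy / s_xx
        resid_var = s_yy - beta * s_xy
        ey = [y[i] if y[i] is not None else mu_y + beta * (x[i] - mu_x) for i in range(n)]
        ey2 = [y[i] ** 2 if y[i] is not None else ey[i] ** 2 + resid_var for i in range(n)]
        # M-step: update the moments that involve y.
        mu_y = sum(ey) / n
        s_xy = sum(x[i] * ey[i] for i in range(n)) / n - mu_x * mu_y
        s_yy = sum(ey2) / n - mu_y ** 2
    beta = s_xy / s_xx
    return [y[i] if y[i] is not None else mu_y + beta * (x[i] - mu_x) for i in range(n)]

def rmsep(actual, predicted):
    """Root mean-square error over paired values."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))

# Hypothetical toy data: y = 2x + 1, with two y readings lost.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y_true = [3.0, 5.0, 7.0, 9.0, 11.0, 13.0]
y_obs = [3.0, 5.0, None, 9.0, None, 13.0]
missing = [i for i, v in enumerate(y_obs) if v is None]

y_em = em_impute(x, y_obs)
y_locf = locf(y_obs)
held_back = [y_true[i] for i in missing]

print(rmsep(held_back, [y_em[i] for i in missing]))    # error of EM-based fill
print(rmsep(held_back, [y_locf[i] for i in missing]))  # error of LOCF fill
```

Because the toy relationship is exactly linear, the EM-based fill recovers the held-back values almost perfectly while LOCF repeats stale readings. That outcome is built into the example and should not be read as a reproduction of the study's results; real process data are noisy and multivariate, and the study's MI/MCMC method is not sketched here.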

Additional information

Notes on contributors

Yan Zeng

Mr. Zeng is a Senior Risk Manager at Lending Club. His email address is [email protected].

Timothy M. Young

Dr. Young is a Professor in the Center for Renewable Carbon. His email address is [email protected].

David J. Edwards

Dr. Edwards is an Associate Professor in the Department of Statistical Sciences & Operations Research. His email address is [email protected]. He is the corresponding author.

Frank M. Guess

Dr. Guess is a Professor in the Department of Business Analytics & Statistics. His email address is [email protected].

Chung-Hao Chen

Dr. Chen is an Assistant Professor in the Department of Electrical & Computer Engineering. His email address is [email protected].
