Missing Data Analysis in Regression

C. G. Marcelino, Institute of Computing, Federal University of Rio de Janeiro (UFRJ), Rio de Janeiro, Brazil. Correspondence: [email protected]

https://orcid.org/0000-0002-7595-8227

G. M. C. Leite, Systems Engineering and Computer Science Program, Federal University of Rio de Janeiro (UFRJ), Rio de Janeiro, Brazil

P. Celes, Systems Engineering and Computer Science Program, Federal University of Rio de Janeiro (UFRJ), Rio de Janeiro, Brazil

C. E. Pedreira, Systems Engineering and Computer Science Program, Federal University of Rio de Janeiro (UFRJ), Rio de Janeiro, Brazil

ABSTRACT

Many datasets in real-world applications are incomplete. In this paper, we address the effects of, and possible solutions to, incomplete databases in regression, aiming to bridge the gap between theoretically effective algorithms and their application to real-world problems. We investigated the actual effects of missing data on regression by analyzing its impact on several publicly available databases, using popular algorithms such as Decision Trees, Random Forests, AdaBoost, K-Nearest Neighbors, Support Vector Machines, and Neural Networks. Our goal is to offer a systematic view of how missing data may affect regression results. After exhaustive simulations on eight public datasets from UCI and KEEL (Abalone, Airfoil, Bike, California, Compactiv, Mortgage, Wankara, and Wine), we conclude that the effect of missing data may be significant. The results show that K-Nearest Neighbors performs better than the other algorithms when regressing on data with missing values.
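One common way to run an experiment of the kind summarized above is to start from complete data, remove feature values at a controlled rate, fill the gaps, and measure how a regressor's test error changes. The sketch below illustrates that loop under explicit assumptions that are not taken from the paper: values are removed completely at random (MCAR) from the training features only, gaps are filled by per-column mean imputation, the regressor is a hand-written K-Nearest Neighbors model with k = 5, and the data are synthetic rather than one of the UCI/KEEL datasets. All function names (make_data, inject_mcar, column_means, mean_impute, knn_predict, run) are hypothetical; the paper's actual missingness mechanisms, imputation methods, and evaluation protocol may differ.

```python
import math
import random

def make_data(n, seed=0):
    """Hypothetical synthetic regression data: y = 2*x0 - x1 + Gaussian noise."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        x0, x1 = rng.uniform(0, 1), rng.uniform(0, 1)
        X.append([x0, x1])
        y.append(2 * x0 - x1 + rng.gauss(0, 0.05))
    return X, y

def inject_mcar(X, rate, seed=1):
    """Replace each feature value with None independently with probability `rate` (MCAR)."""
    rng = random.Random(seed)
    return [[None if rng.random() < rate else v for v in row] for row in X]

def column_means(X):
    """Per-column mean over the observed (non-None) entries; 0.0 if a column is fully missing."""
    means = []
    for j in range(len(X[0])):
        observed = [row[j] for row in X if row[j] is not None]
        means.append(sum(observed) / len(observed) if observed else 0.0)
    return means

def mean_impute(X, means):
    """Fill every missing entry with the mean of its column."""
    return [[means[j] if v is None else v for j, v in enumerate(row)] for row in X]

def knn_predict(X_train, y_train, x, k=5):
    """Average target of the k training rows closest to x in Euclidean distance."""
    nearest = sorted((math.dist(row, x), t) for row, t in zip(X_train, y_train))[:k]
    return sum(t for _, t in nearest) / len(nearest)

def mse(y_true, y_pred):
    """Mean squared error."""
    return sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true)

def run(rate, n_train=300, n_test=100, k=5):
    """Test MSE of mean-imputed KNN when `rate` of the training features are missing."""
    X, y = make_data(n_train + n_test)
    X_train, y_train = X[:n_train], y[:n_train]
    X_test, y_test = X[n_train:], y[n_train:]
    X_train_missing = inject_mcar(X_train, rate)
    means = column_means(X_train_missing)          # statistics from training data only
    X_train_filled = mean_impute(X_train_missing, means)
    preds = [knn_predict(X_train_filled, y_train, x, k) for x in X_test]
    return mse(y_test, preds)

if __name__ == "__main__":
    for rate in (0.0, 0.1, 0.3, 0.5):
        print(f"missing rate {rate:.0%}: test MSE = {run(rate):.4f}")
```

Running the script prints one test error per missing rate. With mean imputation one would typically expect the error to grow as the rate increases, because imputed rows are pulled toward the column means and become misleading neighbors; the exact values depend on the seeds and the synthetic data, and swapping in another imputer or regressor is a one-function change.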

Acknowledgments

The authors acknowledge financial support from the following Brazilian research agencies: CAPES, CNPq, and FAPERJ. We thank Susan Hussey ([email protected]) for the native-English review.

Disclosure Statement

No potential conflict of interest was reported by the author(s).