ABSTRACT
Classification systems play an important role in medical decision support because they automate and accelerate data analysis. However, their quality depends on that of the training dataset from which the classification models are built. Each training example is usually labeled by domain experts or by automatic systems. When examples are assigned wrong class labels, the training process and, consequently, the classification performance may be negatively affected. This problem is formally known as class label noise. One of the most widely used techniques for reducing the harmful consequences of mislabeled examples is noise filtering, which removes noisy examples from the training data. This article analyzes the usefulness of such methods in the context of medical data classification. Experiments carried out on several real-world datasets show the importance of noise filtering when class noise affects the data.
Funding
José A. Sáez was supported by EC under FP7, Coordination and Support Action, Grant Agreement Number 316097, ENGINE European Research Centre of Network Intelligence for Innovation Enhancement (http://engine.pwr.wroc.pl). Bartosz Krawczyk and Michał Woźniak were supported by the Polish National Science Centre under the grant no. DEC-2013/09/B/ST6/02264.