Compression measures used in inductive learners, such as measures based on the minimum description length principle, provide a basis for grading candidate hypotheses. Compression-based induction is also well suited to handling noisy data. This paper shows that a simple compression measure can be used to detect noisy training examples, where the noise is due to random classification errors. A technique is proposed in which noisy examples are detected and eliminated from the training set, and a hypothesis is then built from the remaining examples. This noise elimination method was applied to preprocess data for four machine-learning algorithms and was evaluated on selected medical domains.
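As a rough illustration of the detect-and-eliminate idea described above, the following Python sketch flags a training example as noisy when removing it makes the smallest separating hypothesis noticeably simpler. It is not the paper's algorithm. The stand-in compression measure (`cover_size`, a greedy count of the Boolean features needed to tell every positive example from every negative one), the `min_gain` threshold, and the toy data are all assumptions made for illustration.

```python
from itertools import product

def cover_size(examples):
    """Hypothetical compression measure: a greedy estimate of how many
    Boolean features are needed to separate positives from negatives."""
    pos = [x for x, y in examples if y]
    neg = [x for x, y in examples if not y]
    if not pos or not neg:
        return 0
    n_features = len(examples[0][0])
    # Every (positive, negative) pair must differ on some chosen feature.
    pairs = set(product(range(len(pos)), range(len(neg))))
    chosen = 0
    while pairs:
        best = max(
            range(n_features),
            key=lambda f: sum(pos[i][f] != neg[j][f] for i, j in pairs),
        )
        covered = {(i, j) for i, j in pairs if pos[i][best] != neg[j][best]}
        if not covered:
            # Remaining pairs are indistinguishable; charge one unit each.
            return chosen + len(pairs)
        pairs -= covered
        chosen += 1
    return chosen

def filter_noise(examples, min_gain=2):
    """Repeatedly drop the example whose removal shrinks the measure most,
    stopping once the best gain falls below min_gain (an assumed threshold)."""
    kept, removed = list(examples), []
    while len(kept) > 1:
        base = cover_size(kept)
        gains = [base - cover_size(kept[:k] + kept[k + 1:])
                 for k in range(len(kept))]
        k = max(range(len(kept)), key=gains.__getitem__)
        if gains[k] < min_gain:
            break
        removed.append(kept.pop(k))
    return kept, removed

# Toy data: the target concept is "label equals feature 0";
# the last example carries a flipped (noisy) label.
data = [
    ((1, 0, 0), 1), ((1, 1, 0), 1), ((1, 0, 1), 1), ((1, 1, 1), 1),
    ((0, 0, 0), 0), ((0, 1, 0), 0), ((0, 0, 1), 0),
    ((0, 1, 1), 1),
]
kept, removed = filter_noise(data)
print("removed:", removed)
print("features needed before/after:", cover_size(data), cover_size(kept))
```

The filtered set `kept` would then be passed to whichever learner is being used, so the filter acts purely as a preprocessing step. The exhaustive leave-one-out loop is kept for clarity; it would be too slow for large training sets.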
Noise detection and elimination in data preprocessing: Experiments in medical domains