Abstract
The naive Bayes classifier is known to obtain good results with a simple procedure. The method assumes that the attribute variables are independent given the variable to be classified. In real databases, where this assumption does not hold, the classifier still gives good results. To improve its accuracy, various works have attempted to restructure the set of attributes, joining them into groups so that the new groups are independent of one another although the elements within each group are dependent. Such methods are known as semi-naive Bayes classifiers. In this article, we present an application to classification of uncertainty measures on closed and convex sets of probability distributions, also called credal sets. We represent the information obtained from a database by a set of probability intervals (a credal set) via the imprecise Dirichlet model, and we use uncertainty measures on credal sets to restructure the set of attributes in the manner described above, which enables us to improve the results of the naive Bayes classifier satisfactorily.
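As an illustration of the probability intervals mentioned above, the following is a minimal sketch of how the imprecise Dirichlet model is commonly stated: for a value observed n times out of N cases, with hyperparameter s, the interval is [n/(N+s), (n+s)/(N+s)]. The function name `idm_intervals`, the example counts, and the choice s = 1 are assumptions made for this illustration and are not taken from the article.

```python
def idm_intervals(counts, s=1.0):
    """Return {value: (lower, upper)} probability intervals under the
    imprecise Dirichlet model, given observed counts per value.

    counts: dict mapping each value of a variable to its observed frequency.
    s: IDM hyperparameter (assumed here; s = 1 and s = 2 are common choices).
    """
    total = sum(counts.values())
    if total + s <= 0:
        raise ValueError("total count plus s must be positive")
    intervals = {}
    for value, n in counts.items():
        lower = n / (total + s)        # all of the extra mass s goes elsewhere
        upper = (n + s) / (total + s)  # all of the extra mass s goes to this value
        intervals[value] = (lower, upper)
    return intervals


# Hypothetical class counts from a small database.
example = idm_intervals({"yes": 3, "no": 1}, s=1.0)
```

Every interval has width s/(N+s), so the intervals narrow as more data are observed, which is the sense in which the credal set reflects the amount of information in the database.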
Acknowledgements
This work has been supported by the Spanish Ministry of Science and Technology under the Algra project (TIN2004-06204-C03-02).
Notes
We consider the concept of “information based on uncertainty” (Klir 2006), which relates to information deficiency (incomplete, vague, fuzzy, contradictory, deficient, etc.) that can arise from different types of uncertainty. We always use the term “information” in the sense of reduction of uncertainty, unlike its use in logic or in computability theory.
Given an uncertainty measure U, we can express the associated measure of information as −U.
The J48 method is available in the Weka software, which can be obtained at http://www.cs.waikato.ac.nz/ml/weka/