Abstract
This paper presents a multiple level fusion framework for improving the performance of automatic speaker recognition systems. Based on the framework, several multiple level fusion methods are defined: one strong multiple level fusion and three weak multiple level fusions. To examine the effectiveness of the proposed framework, a two-feature combination scheme is considered. After investigating the applicability of strong and weak multiple level fusions to this scheme, the framework adopts a weak multiple level fusion method that combines two fusion levels, i.e., matching-score fusion and decision-making fusion. At the matching-score level, a commonly used method called score vector fusion is adopted. At the decision-making level, kernel combination, also known as Multiple Kernel Learning, is chosen. Both techniques can be embedded into many automatic speaker recognition systems. Two sets of experiments were conducted on the NIST 2001 corpus, and the results show that the two-feature combination scheme with multiple level fusion performs better than traditional matching-score level fusion and unimodal methods. These results demonstrate that the multiple level fusion framework is an effective way to fuse features for speaker recognition applications.