Abstract
This paper introduces bootstrap error estimation for the automatic tuning of parameters in combined networks applied as front-end preprocessors for a speech recognition system based on hidden Markov models. The method is evaluated on a large-vocabulary (10,000-word) continuous speech recognition task. Bootstrap estimates of minimum mean squared error allow the selection of speaker normalization models that improve recognition performance. The procedure provides a flexible strategy for dealing with inter-speaker variability without requiring an additional validation set. Recognition results are compared for linear, generalized radial basis function, and multi-layer perceptron network architectures.
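The core idea of the abstract, selecting among candidate models by bootstrap estimates of mean squared error rather than a held-out validation set, can be illustrated with a minimal sketch. The candidate models below (polynomial fits of varying degree) are hypothetical stand-ins for the paper's network architectures, and the function names are illustrative, not taken from the paper:

```python
import numpy as np

def bootstrap_mse(model_fit, X, y, n_boot=50, seed=None):
    """Estimate a model's expected squared error by bootstrap resampling.

    model_fit(X_train, y_train) must return a predict(X) callable.
    Each round trains on a resample (with replacement) and scores on
    the out-of-bag points, so no separate validation set is needed.
    """
    rng = np.random.default_rng(seed)
    n = len(X)
    errors = []
    for _ in range(n_boot):
        idx = rng.integers(0, n, size=n)          # bootstrap sample indices
        oob = np.setdiff1d(np.arange(n), idx)     # out-of-bag test points
        if oob.size == 0:
            continue
        predict = model_fit(X[idx], y[idx])
        errors.append(np.mean((predict(X[oob]) - y[oob]) ** 2))
    return float(np.mean(errors))

def make_poly_fitter(degree):
    """Hypothetical model family: least-squares polynomial of given degree."""
    def fit(X_train, y_train):
        coeffs = np.polyfit(X_train, y_train, degree)
        return lambda X_test: np.polyval(coeffs, X_test)
    return fit

# Synthetic data standing in for speaker-dependent acoustic features.
rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, 200)
y = np.sin(2.0 * X) + 0.1 * rng.normal(size=200)

# Pick the candidate with minimum bootstrap-estimated MSE.
candidates = {d: make_poly_fitter(d) for d in (1, 3, 5)}
scores = {d: bootstrap_mse(f, X, y, seed=1) for d, f in candidates.items()}
best_degree = min(scores, key=scores.get)
```

The same selection loop applies unchanged if the candidates are instead linear, radial basis function, or multi-layer perceptron normalization networks: only `model_fit` changes.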