Speech Recognition using Neural Networks
Joe Tebelskis
May 1995 CMU-CS-95-142

School of Computer Science, Carnegie Mellon University, Pittsburgh, Pennsylvania 15213-3890

Submitted in partial fulfillment of the requirements for a degree of Doctor of Philosophy in Computer Science

Thesis Committee: Alex Waibel (chair); Raj Reddy; Jaime Carbonell; Richard Lippmann, MIT Lincoln Labs

Copyright © 1995 Joe Tebelskis

This research was supported during separate phases by ATR Interpreting Telephony Research Laboratories, NEC Corporation, Siemens AG, the National Science Foundation, the Advanced Research Projects Administration, and the Department of Defense under Contract No. MDA904-92-C-5161. The views and conclusions contained in this document are those of the author and should not be interpreted as representing the official policies, either expressed or implied, of ATR, NEC, Siemens, NSF, or the United States Government.

Keywords: Speech recognition, neural networks, hidden Markov models, hybrid systems, acoustic modeling, prediction, classification, probability estimation, discrimination, global optimization.



This thesis examines how artificial neural networks can benefit a large vocabulary, speaker independent, continuous speech recognition system. Currently, most speech recognition systems are based on hidden Markov models (HMMs), a statistical framework that supports both acoustic and temporal modeling. Despite their state-of-the-art performance, HMMs make a number of suboptimal modeling assumptions that limit their potential effectiveness. Neural networks avoid many of these assumptions, while they can also learn complex functions, generalize effectively, tolerate noise, and support parallelism. While neural networks can readily be applied to acoustic modeling, it is not yet clear how they can be used for temporal modeling. Therefore, we explore a class of systems called NN-HMM hybrids, in which neural networks perform acoustic modeling, and HMMs perform temporal modeling. We argue that an NN-HMM hybrid has several theoretical advantages over a pure HMM system, including better acoustic modeling accuracy, better context sensitivity, more natural discrimination, and a more economical use of parameters. These advantages are confirmed experimentally by an NN-HMM hybrid that we developed, based on context-independent phoneme models, that achieved 90.5% word accuracy on the Resource Management database, in contrast to only 86.0% accuracy achieved by a pure HMM under similar conditions. In the course of developing this system, we explored two different ways to use neural networks for acoustic modeling: prediction and classification. We found that predictive networks yielded poor results because of a lack of discrimination, but classification networks gave excellent results. We verified that, in accordance with theory, the output activations of a classification network form highly accurate estimates of the posterior probabilities P(class|input), and we showed how these can easily be converted to likelihoods P(input|class) for standard HMM recognition algorithms.
Finally, this thesis reports how we optimized the accuracy of our system with many natural techniques, such as expanding the input window size, normalizing the inputs, increasing the number of hidden units, converting the network’s output activations to log likelihoods, optimizing the learning rate schedule by automatic search, backpropagating error from word level outputs, and using gender dependent networks.
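
The conversion from posteriors P(class|input) to likelihoods P(input|class) mentioned above follows from Bayes' rule: P(input|class) = P(class|input) P(input) / P(class). Because P(input) is the same for every class at a given frame, dividing each posterior by its class prior gives a likelihood that is correct up to a common scale factor. The abstract does not spell out the procedure, so the following Python fragment is only a minimal sketch under that assumption; the function name, the `floor` parameter, and the example numbers are hypothetical and are not taken from the thesis.

```python
import math


def posteriors_to_scaled_log_likelihoods(posteriors, priors, floor=1e-10):
    """Return log(P(class|input) / P(class)) for each class.

    By Bayes' rule this equals log P(input|class) minus log P(input),
    i.e. a log likelihood up to a constant shared by all classes.
    `floor` (an illustrative assumption) guards against log(0).
    """
    if len(posteriors) != len(priors):
        raise ValueError("posteriors and priors must have the same length")
    return [
        math.log(max(post, floor)) - math.log(max(prior, floor))
        for post, prior in zip(posteriors, priors)
    ]


# Hypothetical values for one frame and three phoneme classes.
posteriors = [0.7, 0.2, 0.1]   # network output activations, P(class|input)
priors = [0.5, 0.3, 0.2]       # relative class frequencies, P(class)

scores = posteriors_to_scaled_log_likelihoods(posteriors, priors)
```

A class whose posterior exceeds its prior receives a positive score, and one whose posterior falls below its prior receives a negative score. In the log domain these scores can be summed along an HMM path in place of the usual log emission probabilities.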




I wish to thank Alex Waibel for the guidance, encouragement, and friendship that he managed to extend to me during our six years of collaboration over all those inconvenient oceans, and for his unflagging efforts to provide a world-class, international research environment, which made this thesis possible. Alex’s scientific integrity, humane idealism, good cheer, and great ambition have earned him my respect,...