Speech Recognition


Abstract
Hardware that recognizes speech signals from a finite vocabulary has several uses today, e.g. automatic dialing from a cell phone, recognizing the name of an extension holder in a call routing system, and various tasks in robotic systems. The article by S.E. Levinson, L.R. Rabiner and M.M. Sondhi, “Speaker Independent Isolated Digit Recognition Using Hidden Markov Models”, suggests a way of implementing such a system using linear predictor coefficients (LPC), vector quantization and hidden Markov models (HMM). In this work we implemented the speech recognition system suggested in the article in Matlab. In addition, we implemented a voice activity detection (VAD) system that isolates the speech signal from the recorded signal. We changed some of the parameters that the article suggests and tested the recognition rate following those changes. As a result, the system’s performance improved, reaching a recognition rate of about 98%.

Figure 1 – Block diagram of the whole system

The problem
We would like to have a system that can recognize a finite vocabulary of speech signals. Such systems are already in use today in cell phones, call routing systems and sophisticated robotic systems. The system must be easy to train to recognize those speech signals, and it should be implemented efficiently. Of course, we would also like it to achieve a high recognition rate. Given such a system, we would like to test how its performance changes after changing a few of the parameters it uses.

The basic approach
The article by S.E. Levinson, L.R. Rabiner and M.M. Sondhi, “Speaker Independent Isolated Digit Recognition Using Hidden Markov Models”, suggests a way of implementing a speech recognition system that is able to distinguish between digits. The system is separated into two subsystems: a training system and a recognition system.

The training system receives all the training digit signals and separates each signal into frames. Each frame is replaced by its linear predictor coefficients, and vector quantization is performed on those coefficient vectors. Each signal is now represented by a sequence of indices produced by the quantizer. With the hidden Markov model (HMM), we model the pronunciation of each digit as transitions between a finite number of states, according to the articulatory position. After the quantization, for each digit we find the state-transition model that best fits that digit's training sequences in the probabilistic sense.

In the recognition system, each signal that we want to recognize is also separated into frames, which are likewise replaced by their linear predictor coefficients. The quantization is performed by the quantizer that was found in the training system. To recognize the digit, we find which of the models found in the training system is the most probable for the resulting sequence. We also added a voice activity detector (VAD) to the system described above.
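As a rough illustration of the front end described above, the following Python sketch splits a signal into frames and computes the linear predictor coefficients of one frame with the autocorrelation method and the Levinson-Durbin recursion. It is not the Matlab code used in this project. The function names, the frame length, the hop size and the predictor order are all assumptions chosen for the example, and windowing and pre-emphasis are left out for brevity.

```python
def split_into_frames(signal, frame_len, hop):
    """Cut a signal into frames of frame_len samples, starting every hop samples."""
    return [signal[i:i + frame_len]
            for i in range(0, len(signal) - frame_len + 1, hop)]


def autocorrelation(frame, max_lag):
    """Autocorrelation values r[0..max_lag] of one frame."""
    n = len(frame)
    return [sum(frame[i] * frame[i + lag] for i in range(n - lag))
            for lag in range(max_lag + 1)]


def lpc(frame, order):
    """Levinson-Durbin recursion.

    Returns (a, err) with a[0] == 1, where the predictor is
    s[n] ~= -(a[1]*s[n-1] + ... + a[order]*s[n-order])
    and err is the remaining prediction error energy.
    """
    r = autocorrelation(frame, order)
    a = [1.0] + [0.0] * order
    err = r[0]
    if err == 0.0:          # silent frame: nothing to predict
        return a, 0.0
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err      # reflection coefficient of stage i
        prev = a[:]
        for j in range(1, i):
            a[j] = prev[j] + k * prev[i - j]
        a[i] = k
        err *= (1.0 - k * k)
    return a, err


# Example (assumed test signal): a decaying exponential s[n] = 0.9**n,
# for which the first predictor coefficient comes out close to -0.9.
frames = split_into_frames([0.9 ** n for n in range(200)], frame_len=200, hop=100)
coeffs, residual = lpc(frames[0], order=2)
```

In a complete system, every frame of every recorded digit would be passed through `lpc`, and the resulting coefficient vectors would be handed to the quantizer.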

Figure 2 – Block diagram of the training system
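The vector quantization step of the training system can be sketched in Python as follows. This is a minimal illustration, not the project's Matlab implementation: it assumes a plain squared Euclidean distance and a simple Lloyd (k-means) refinement of an initial codebook, whereas the article may use a different distance measure and codebook design procedure. All names and the toy data are hypothetical.

```python
def squared_distance(u, v):
    """Squared Euclidean distance between two coefficient vectors."""
    return sum((x - y) ** 2 for x, y in zip(u, v))


def quantize(vector, codebook):
    """Index of the codeword closest to the given vector."""
    return min(range(len(codebook)),
               key=lambda i: squared_distance(vector, codebook[i]))


def train_codebook(vectors, initial_codebook, iterations=10):
    """Refine an initial codebook with Lloyd (k-means) iterations."""
    codebook = [list(c) for c in initial_codebook]
    for _ in range(iterations):
        # Assign every training vector to its nearest codeword.
        cells = [[] for _ in codebook]
        for v in vectors:
            cells[quantize(v, codebook)].append(v)
        # Move each codeword to the mean of its cell; empty cells stay put.
        for i, cell in enumerate(cells):
            if cell:
                codebook[i] = [sum(col) / len(cell) for col in zip(*cell)]
    return codebook


# Toy example (assumed data): four 2-D vectors forming two clusters.
training_vectors = [[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]]
codebook = train_codebook(training_vectors, [[0.0, 0.0], [10.0, 10.0]])
index_sequence = [quantize(v, codebook) for v in training_vectors]
```

The `index_sequence` plays the role of the sequence of indices that replaces each signal before the HMM is trained.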

Figure 3 – Block diagram of the recognition system
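The recognition decision, choosing the digit model that is most probable for a quantized sequence, can be sketched in Python with the scaled forward algorithm for discrete HMMs. This is a sketch under stated assumptions rather than the project's code: the source does not say whether the forward probability or the Viterbi score was used, and the two toy models, their names and their parameter values are invented for the example.

```python
import math


def forward_log_likelihood(obs, pi, A, B):
    """log P(obs | model) for a discrete HMM, using the scaled forward algorithm.

    pi[i]   : initial probability of state i
    A[i][j] : probability of moving from state i to state j
    B[i][k] : probability of emitting codebook index k in state i
    """
    n = len(pi)
    alpha = [pi[i] * B[i][obs[0]] for i in range(n)]
    log_prob = 0.0
    for t in range(len(obs)):
        if t > 0:
            alpha = [sum(alpha[i] * A[i][j] for i in range(n)) * B[j][obs[t]]
                     for j in range(n)]
        scale = sum(alpha)
        if scale == 0.0:            # sequence impossible under this model
            return float("-inf")
        alpha = [a / scale for a in alpha]
        log_prob += math.log(scale)
    return log_prob


def recognize(obs, models):
    """Name of the model with the highest log-likelihood for obs."""
    return max(models, key=lambda name: forward_log_likelihood(obs, *models[name]))


# Two hypothetical left-to-right models over a 2-symbol codebook.
left_to_right = [[0.6, 0.4], [0.0, 1.0]]
models = {
    "zero": ([1.0, 0.0], left_to_right, [[0.9, 0.1], [0.1, 0.9]]),
    "one":  ([1.0, 0.0], left_to_right, [[0.1, 0.9], [0.9, 0.1]]),
}
best = recognize([0, 0, 1, 1], models)
```

Scaling the forward variables at every step keeps the computation from underflowing on long sequences, which is why the result is accumulated as a log-likelihood.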

Figure 4 – Block diagram of the whole system with the voice activity detection
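The source does not describe how the voice activity detector works internally. Purely as an illustration of the idea of isolating the speech from the recording, the Python sketch below uses a simple short-time energy threshold; the method, the function names, the frame length and the threshold ratio are all assumptions and may differ from the detector implemented in this project.

```python
def frame_energies(signal, frame_len):
    """Energy of each non-overlapping frame of frame_len samples."""
    return [sum(x * x for x in signal[i:i + frame_len])
            for i in range(0, len(signal) - frame_len + 1, frame_len)]


def detect_speech(signal, frame_len, threshold_ratio=0.1):
    """Return (start, end) sample indices of the detected speech segment.

    A frame counts as active when its energy exceeds threshold_ratio times
    the largest frame energy. The segment runs from the first active frame
    to the end of the last active frame. Returns None for a silent signal.
    """
    energies = frame_energies(signal, frame_len)
    if not energies or max(energies) == 0:
        return None
    threshold = threshold_ratio * max(energies)
    active = [i for i, e in enumerate(energies) if e > threshold]
    return active[0] * frame_len, (active[-1] + 1) * frame_len


# Assumed test recording: silence, a burst of "speech", then silence again.
recording = [0.0] * 40 + [1.0, -1.0] * 20 + [0.0] * 40
segment = detect_speech(recording, frame_len=10)
speech_only = recording[segment[0]:segment[1]]
```

Only `speech_only` would then be framed and analyzed, so that leading and trailing silence does not contribute frames to the observation sequence.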

Tools
We recorded the digits on a standard PC (Pentium 4, 2 GHz) with a Creative sound card, using a Matlab script that we wrote. The system was also implemented and tested in Matlab.

Conclusions
We found that the system suggested in the article is practical. Although training requires a large number of training signals, and therefore time, the recognition rate we obtained after tuning the parameters for the best performance is very high: about 98%.

Acknowledgments
We are grateful to our project supervisor Evgeny Gershikov for his effort, help and guidance throughout this work, and for his willingness to help us at any time. We are also grateful to the Ollendorff Minerva Center for its support.
