This post continues the previous one. It covers: how accuracy and correctness vary with the number of observation mixtures; how adding noise affects our recognition accuracy (the experiment in our last post used clean test input); the confusion matrix of our test results […] Read more "Force Alignment using HMM 2"
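For reference, the HTKBook's HResults tool reports two scores. With N reference labels, S substitutions, D deletions and I insertions, it defines hits H = N - D - S, Correct = H / N and Accuracy = (H - I) / N, so only Accuracy is penalized by insertions. The sketch below computes both scores and builds a confusion matrix from (reference, hypothesis) label pairs. It assumes the error counts and the label pairs already come from an alignment of recognizer output against the references (the job HResults does); the function names are hypothetical.

```python
from collections import Counter


def htk_scores(n_ref, subs, dels, ins):
    """Return (%Correct, %Accuracy) in the HResults sense."""
    hits = n_ref - dels - subs                     # H = N - D - S
    correct = 100.0 * hits / n_ref                 # insertions are ignored
    accuracy = 100.0 * (hits - ins) / n_ref        # insertions are penalized
    return correct, accuracy


def confusion_matrix(pairs, labels):
    """Rows are reference labels, columns are recognized labels.

    `pairs` is a list of (reference, hypothesis) tuples, e.g. one per
    utterance in isolated digit recognition.
    """
    counts = Counter(pairs)
    return [[counts[(ref, hyp)] for hyp in labels] for ref in labels]
```

For example, `htk_scores(100, 5, 3, 2)` gives a Correct of 92.0 and an Accuracy of 90.0. The two insertions lower only the second number.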
Disclaimer: Most of the content is from the HTKBook; this is just a summary based on my own Chinese digit recognition project. Dataset and Development Platform: OS: Linux 4.9.27-1; Tools: Wavesurfer, HTK, Python 2.7. Data Set Segregation: The data set we will be using is provided by the NCTUDS-100 DATABASE. The files are stored in the following format: md010101.pcm […] Read more "Force Alignment using Hidden Markov Model"
Introduction: Dynamic Time Warping is an algorithm used to match two speech sequences that have the same content but may differ in the duration of certain parts of the speech (phones, for example). Here, we won't use the phone as the basic unit. Instead, we use frames, each represented by the MFCC features obtained from feature extraction […] Read more "Dynamic Time Warping for Speech Recognition"
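To make the frame-level matching concrete, here is a minimal DTW sketch. Each sequence is a list of feature vectors, for example one MFCC vector per frame. It assumes a Euclidean local distance and the basic symmetric step pattern (advance in one sequence, the other, or both). The original post may use different local constraints or distance measures, and `dtw_distance` is a hypothetical name.

```python
import math


def euclidean(a, b):
    """Local distance between two feature vectors (e.g. MFCC frames)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def dtw_distance(seq_a, seq_b, dist=euclidean):
    """Minimal accumulated distance over all monotonic alignments."""
    n, m = len(seq_a), len(seq_b)
    inf = float("inf")
    # cost[i][j]: best accumulated cost aligning seq_a[:i] with seq_b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = dist(seq_a[i - 1], seq_b[j - 1])
            cost[i][j] = d + min(
                cost[i - 1][j],      # seq_a advances, seq_b frame j is reused
                cost[i][j - 1],      # seq_b advances, seq_a frame i is reused
                cost[i - 1][j - 1],  # both sequences advance
            )
    return cost[n][m]
```

A slower utterance of the same content, such as `[[1.0], [2.0], [2.0], [3.0]]` against `[[1.0], [2.0], [3.0]]`, gets a distance of 0. The repeated frame is absorbed by the warping path instead of being counted as a mismatch.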
Introduction: GMM vs K-Means. First, we have to understand what hard decisions and soft decisions are. Hard decision: a data point is assigned to a single cluster, and the result is final. Soft decision: a data point is modeled by a distribution over clusters, so its membership is defined probabilistically and there's no definite […] Read more "GMM-Based Speaker Recognition"
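The difference between the two decisions can be shown on a single 1-D point. K-means picks the nearest centroid. A GMM instead returns the posterior probability (responsibility) of each component, proportional to weight × Gaussian likelihood. This is a toy sketch with made-up parameters. Real speaker models work on multi-dimensional feature vectors with trained weights, means and variances, and the function names are hypothetical.

```python
import math


def gaussian_pdf(x, mean, var):
    """1-D normal density N(x; mean, var)."""
    return math.exp(-(x - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def hard_assign(x, means):
    """K-means style hard decision: index of the nearest centroid only."""
    return min(range(len(means)), key=lambda k: (x - means[k]) ** 2)


def soft_assign(x, weights, means, variances):
    """GMM style soft decision: responsibility of each component for x."""
    scores = [w * gaussian_pdf(x, mu, var)
              for w, mu, var in zip(weights, means, variances)]
    total = sum(scores)
    return [s / total for s in scores]
```

For a point exactly halfway between two equally weighted components (x = 0.0 with means -1 and 1), `hard_assign` still commits to one cluster, while `soft_assign` returns `[0.5, 0.5]` and keeps the ambiguity.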
As the title suggests, there are several ways of extracting important information from speech signals, and we'll dive into all of them. All speech signals are first pre-emphasized by a pre-emphasis filter. As we know, the whole process of LPC coefficient extraction can be divided into the following stages (source: https://www.mathworks.com/help/dsp/examples/lpc-analysis-and-synthesis-of-speech.html). First, we would like to find […] Read more "LPC & Cepstrum & MFCC"
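Below is a rough sketch of the LPC side of this pipeline. It applies pre-emphasis, computes the autocorrelation of a frame, and solves for the LPC coefficients with the Levinson-Durbin recursion. The excerpt does not show the post's own filter, so the first-order form y[n] = x[n] - α·x[n-1] and the default α = 0.97 are assumptions, as are the function names. Framing and windowing are omitted for brevity. The coefficients follow the convention A(z) = 1 + Σ a[k]·z^(-k).

```python
def pre_emphasis(signal, alpha=0.97):
    """y[n] = x[n] - alpha * x[n-1]; alpha = 0.97 is an assumed default."""
    return [signal[0]] + [signal[n] - alpha * signal[n - 1]
                          for n in range(1, len(signal))]


def autocorrelation(frame, max_lag):
    """r[k] = sum_n frame[n] * frame[n - k] for k = 0..max_lag."""
    return [sum(frame[n] * frame[n - k] for n in range(k, len(frame)))
            for k in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Solve the LPC normal equations from autocorrelation r.

    Returns (a, err): a[0] = 1 and the prediction error is
    e[n] = x[n] + sum_{k=1..order} a[k] * x[n-k]; err is its final energy.
    """
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err                      # reflection coefficient
        new_a = a[:]
        for j in range(1, i):
            new_a[j] = a[j] + k * a[i - j]  # update lower-order coefficients
        new_a[i] = k
        a = new_a
        err *= 1.0 - k * k                  # error energy shrinks each order
    return a, err
```

As a sanity check, the autocorrelation `[1.0, 0.5, 0.25]` of a first-order autoregressive process with coefficient 0.5 gives `a ≈ [1, -0.5, 0]` at order 2. The second-order term vanishes, as expected.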