Detailed explanation of GMM-HMM speech recognition principle

This article briefly describes the principles of GMM-HMM speech recognition, together with its training and testing process.

1. What is Hidden Markov Model?

Three problems to be solved by HMM:

1) Likelihood

2) Decoding

3) Training

2. What is GMM? How to use GMM to find the probability of a phoneme?

3. Solving speech recognition with GMM + HMM

3.1 Recognition

3.2 Training

3.2.1 Training the params of GMM

3.2.2 Training the params of HMM

====================================================================

1. What is Hidden Markov Model?


ANS: a Markov process whose states (hidden nodes) are unobservable; only its outputs (visible nodes) are observed.

Hidden nodes represent the states, and visible nodes represent the speech we hear or the time-series signals we observe.

We first specify the structure of the HMM. Training the model then means: given the time-series signal y1 ... yT (the training samples), use maximum-likelihood estimation (typically implemented via EM) to estimate the parameters:

1. The initial probabilities of the N states

2. The state transition probabilities a_ij

3. The output (emission) probabilities b_j
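To make the three parameter sets concrete, here is a toy discrete-observation HMM in Python. The numbers, state count, alphabet size, and left-to-right topology are all illustrative assumptions; real acoustic models emit through GMMs rather than a discrete table.

```python
import numpy as np

# Toy 3-state HMM over a discrete alphabet of 4 observation symbols.
pi = np.array([1.0, 0.0, 0.0])        # 1. initial probabilities of the N states
A = np.array([[0.6, 0.4, 0.0],        # 2. state transition probabilities a_ij
              [0.0, 0.7, 0.3],        #    (left-to-right topology, common in speech)
              [0.0, 0.0, 1.0]])
B = np.array([[0.7, 0.1, 0.1, 0.1],   # 3. output probabilities b_j(x), one row per state
              [0.1, 0.7, 0.1, 0.1],
              [0.1, 0.1, 0.7, 0.1]])

# pi and every row of A and B must be a probability distribution.
assert np.isclose(pi.sum(), 1.0)
assert np.allclose(A.sum(axis=1), 1.0)
assert np.allclose(B.sum(axis=1), 1.0)
```

The left-to-right structure (no transitions back to earlier states) is what lets an HMM model the forward progression of a phoneme or word through time.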

--------------

In speech processing, a word is composed of several phonemes;

Each HMM corresponds to a word or a phoneme.

A word is represented as a sequence of states, with each state corresponding to a phoneme.

There are three problems to be solved with HMM:

1) Likelihood: the probability that an HMM generates a given observation sequence x <the Forward algorithm>

The forward variable is computed recursively:

α_t(s_j) = [ Σ_{i=1}^{N} α_{t-1}(s_i) a_ij ] b_j(x_t)

where α_t(s_j) is the probability that the HMM is in state j at time t, having generated the observations {x_1, ..., x_t};

a_ij is the transition probability from state i to state j;

and b_j(x_t) is the probability of generating x_t in state j.
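The forward recursion can be sketched in a few lines of Python for a discrete-observation HMM (a toy sketch under assumed variable names; a real acoustic model would replace the discrete emission table with GMM likelihoods):

```python
import numpy as np

def forward(pi, A, B, obs):
    """Forward algorithm: total likelihood P(x_1 .. x_T | HMM).

    pi  : (N,)   initial state probabilities
    A   : (N, N) transition probabilities a_ij
    B   : (N, M) emission probabilities b_j(x) for M discrete symbols
    obs : (T,)   observation symbol indices
    """
    T = len(obs)
    alpha = np.zeros((T, len(pi)))
    alpha[0] = pi * B[:, obs[0]]                 # initialization at t = 1
    for t in range(1, T):
        # alpha_t(s_j) = [ sum_i alpha_{t-1}(s_i) * a_ij ] * b_j(x_t)
        alpha[t] = (alpha[t - 1] @ A) * B[:, obs[t]]
    return alpha[-1].sum()                       # sum over all final states
```

In practice the products underflow for long utterances, so implementations work with per-frame scaling factors or log probabilities.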

2) Decoding: given an observation sequence x, find the most likely underlying HMM state sequence <the Viterbi algorithm>

In practice the computation is pruned: rather than evaluating the probability of every possible state sequence, the Viterbi approximation is used: from time 1 to t, only the highest-probability transition into each state is recorded.

Let V_t(s_j) be the maximum probability, over all state sequences ending in state j at time t, of generating {x_1, ..., x_t}:

V_t(s_j) = max_i V_{t-1}(s_i) a_ij b_j(x_t)

The backpointer bt_t(s_j) = argmax_i V_{t-1}(s_i) a_ij records which state at time t-1 most probably led to state j at time t.

The Viterbi recursion then proceeds forward in time, filling in V_t(s_j) and the backpointer for every state at each step t = 1, ..., T.

Then, following the recorded backpointers, the most likely state sequence is recovered by backtracking from the best final state.
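The recursion, backpointers, and backtracking above can be sketched as follows (a toy discrete-observation version; in a real recognizer b_j(x_t) would come from a GMM, and beam pruning would discard low-probability states):

```python
import numpy as np

def viterbi(pi, A, B, obs):
    """Viterbi decoding: most likely state sequence for obs.

    Returns (state_path, probability_of_that_path).
    """
    N, T = len(pi), len(obs)
    V = np.zeros((T, N))                  # V_t(s_j): best path probability
    back = np.zeros((T, N), dtype=int)    # backpointers bt_t(s_j)
    V[0] = pi * B[:, obs[0]]
    for t in range(1, T):
        # scores[i, j] = V_{t-1}(s_i) * a_ij
        scores = V[t - 1][:, None] * A
        back[t] = scores.argmax(axis=0)   # best predecessor for each state j
        # V_t(s_j) = max_i V_{t-1}(s_i) * a_ij * b_j(x_t)
        V[t] = scores.max(axis=0) * B[:, obs[t]]
    # Backtrack from the best final state.
    path = [int(V[-1].argmax())]
    for t in range(T - 1, 0, -1):
        path.append(int(back[t][path[-1]]))
    return path[::-1], V[-1].max()
```

Note that Viterbi reuses the forward recursion's structure, replacing the sum over predecessor states with a max and remembering which predecessor achieved it.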

3) Training: given observation sequences x, estimate the HMM parameters λ = {a_ij, b_j} using the EM (forward-backward, i.e. Baum-Welch) algorithm.

We defer this part to Section 3, where it is discussed together with the training of the GMM.

--------------------------------------------------------------------
