### Baum-Welch (EM) Algorithm for HMM

#### by allenlu2007

The previous post covered the EM algorithm for Kalman filter training:

Step 0: Initial guess

A, C, Q, R, pi (initial mean), v (initial covariance)

Step 1: Kalman filter forward pass

Step 2: Kalman smoothing (Rauch) backward pass

Step 3: Update A, C, Q, R to maximize the Q function!

Step 4: Repeat from Step 1 until convergence.

The key is Step 3: first compute the joint pdf p(x1, …, xn, y1, …, yn), then take the derivative of the Q function, which is expressed in terms of this joint pdf.
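As a reminder of that setup (notation as in the steps above), the linear-Gaussian state-space model makes the joint pdf factor along the chain:

```latex
% Model: x_t = A x_{t-1} + w_t,   y_t = C x_t + v_t,
% with w_t ~ N(0, Q) and v_t ~ N(0, R).
p(x_1, \ldots, x_n, y_1, \ldots, y_n)
  = p(x_1) \prod_{t=2}^{n} p(x_t \mid x_{t-1}) \prod_{t=1}^{n} p(y_t \mid x_t)
```

Every factor is Gaussian, so the log joint pdf is quadratic in A, C, Q, R, and setting the derivative of the Q function to zero gives closed-form updates.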

The HMM has an analogous EM algorithm, namely the Baum-Welch algorithm, but in a simpler form:

(1) A and Q become the transition probability matrix A

(2) C and R become the emission probability matrix B

(3) The initial mean and covariance become the initial probability vector pi

In effect, the Gaussian pdfs (means and covariances) are replaced by probability matrices/vectors!

Step 0: Initial guess of A, B, pi

Step 1: Forward pass using HMM filtering

Step 2: Backward pass using HMM smoothing

Step 3: EM algorithm to maximize Q function

Step 4: Repeat Step 1 and check for convergence

Details:

### Step 0: Initial guess of A, B, pi

A: NxN (N states)

B: KxN (K possible output values)

Pi: Nx1 vector
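A minimal NumPy sketch of this initialization (N = 3 and K = 4 are illustrative assumptions; any row/column-stochastic starting point works):

```python
import numpy as np

rng = np.random.default_rng(0)

N = 3   # number of hidden states (illustrative)
K = 4   # number of distinct observation symbols (illustrative)

# Random initial guess; each row of A, each column of B, and pi must sum to 1.
A = rng.random((N, N))
A /= A.sum(axis=1, keepdims=True)   # A[i, j] = P(next state j | current state i)

B = rng.random((K, N))
B /= B.sum(axis=0, keepdims=True)   # B[k, j] = P(observe symbol k | state j)

pi = rng.random(N)
pi /= pi.sum()                      # pi[i] = P(initial state i)
```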

### Step 1: Forward pass using HMM filtering
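This step is the standard scaled forward (alpha) recursion; a sketch, assuming the A (NxN), B (KxN), and pi shapes from Step 0:

```python
import numpy as np

def forward(obs, A, B, pi):
    """Scaled forward pass: alpha[t] is P(state | y_1..y_t), normalized per step."""
    obs = np.asarray(obs)
    T, N = len(obs), A.shape[0]
    alpha = np.zeros((T, N))
    scale = np.zeros(T)
    alpha[0] = pi * B[obs[0]]           # B row = observed symbol, column = state
    scale[0] = alpha[0].sum()
    alpha[0] /= scale[0]
    for t in range(1, T):
        alpha[t] = (alpha[t - 1] @ A) * B[obs[t]]
        scale[t] = alpha[t].sum()
        alpha[t] /= scale[t]
    return alpha, scale   # log-likelihood of the data = np.log(scale).sum()
```

The per-step scaling avoids the underflow that plagues the unscaled recursion on long sequences, and the scale factors give the log-likelihood for free.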

### Step 2: Backward pass using HMM smoothing
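The matching scaled backward (beta) recursion, reusing the scale factors produced by the forward pass so that alpha * beta stays numerically stable:

```python
import numpy as np

def backward(obs, A, B, scale):
    """Scaled backward pass; alpha[t] * beta[t] is then proportional to P(state | all y)."""
    obs = np.asarray(obs)
    T, N = len(obs), A.shape[0]
    beta = np.zeros((T, N))
    beta[-1] = 1.0
    for t in range(T - 2, -1, -1):
        beta[t] = (A @ (B[obs[t + 1]] * beta[t + 1])) / scale[t + 1]
    return beta
```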

### Step 3: Update using EM algorithm to maximize Q function

The detailed derivation is given at the end of this post.
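A sketch of the update step: the E-step forms the posteriors gamma and xi from the forward/backward quantities, and the M-step re-estimates A, B, pi in closed form. These are the standard Baum-Welch re-estimation formulas; the array shapes follow the steps above.

```python
import numpy as np

def baum_welch_update(obs, A, B, alpha, beta):
    """One EM update: E-step posteriors, then closed-form M-step re-estimation."""
    obs = np.asarray(obs)
    T, N = len(obs), A.shape[0]

    # gamma[t, i] = P(z_t = i | all observations)
    gamma = alpha * beta
    gamma /= gamma.sum(axis=1, keepdims=True)

    # xi[t, i, j] = P(z_t = i, z_{t+1} = j | all observations)
    xi = np.zeros((T - 1, N, N))
    for t in range(T - 1):
        xi[t] = alpha[t][:, None] * A * (B[obs[t + 1]] * beta[t + 1])[None, :]
        xi[t] /= xi[t].sum()

    # M-step: each update is an expected count, normalized to a distribution.
    new_pi = gamma[0]
    new_A = xi.sum(axis=0) / gamma[:-1].sum(axis=0)[:, None]
    new_B = np.zeros_like(B)
    for k in range(B.shape[0]):
        new_B[k] = gamma[obs == k].sum(axis=0)
    new_B /= gamma.sum(axis=0)[None, :]
    return new_A, new_B, new_pi
```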

### Step 4: Repeat Step 1 and check for convergence

It is possible to over-fit a particular data set. The algorithm also does **not** guarantee a global maximum, only convergence to a local maximum.
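Putting the four steps together, a self-contained sketch of the full loop (N, K, the random initialization, and the tolerance are illustrative assumptions; EM guarantees the log-likelihood never decreases, and that is what the convergence check monitors):

```python
import numpy as np

def baum_welch(obs, N, K, n_iter=100, tol=1e-6, seed=0):
    """Full Baum-Welch loop: forward, backward, update, repeat until converged."""
    obs = np.asarray(obs)
    T = len(obs)
    rng = np.random.default_rng(seed)
    # Step 0: random stochastic initial guess
    A = rng.random((N, N)); A /= A.sum(axis=1, keepdims=True)
    B = rng.random((K, N)); B /= B.sum(axis=0, keepdims=True)
    pi = rng.random(N); pi /= pi.sum()
    prev_ll = -np.inf
    ll = prev_ll
    for _ in range(n_iter):
        # Step 1: scaled forward pass
        alpha = np.zeros((T, N)); scale = np.zeros(T)
        alpha[0] = pi * B[obs[0]]
        scale[0] = alpha[0].sum(); alpha[0] /= scale[0]
        for t in range(1, T):
            alpha[t] = (alpha[t - 1] @ A) * B[obs[t]]
            scale[t] = alpha[t].sum(); alpha[t] /= scale[t]
        ll = np.log(scale).sum()
        # Step 2: scaled backward pass
        beta = np.zeros((T, N)); beta[-1] = 1.0
        for t in range(T - 2, -1, -1):
            beta[t] = (A @ (B[obs[t + 1]] * beta[t + 1])) / scale[t + 1]
        # Step 3: E-step posteriors, then M-step re-estimation
        gamma = alpha * beta
        gamma /= gamma.sum(axis=1, keepdims=True)
        xi = np.zeros((T - 1, N, N))
        for t in range(T - 1):
            xi[t] = alpha[t][:, None] * A * (B[obs[t + 1]] * beta[t + 1])[None, :]
            xi[t] /= xi[t].sum()
        pi = gamma[0]
        A = xi.sum(axis=0) / gamma[:-1].sum(axis=0)[:, None]
        for k in range(K):
            B[k] = gamma[obs == k].sum(axis=0)
        B /= gamma.sum(axis=0)[None, :]
        # Step 4: stop once the log-likelihood improvement falls below tol
        if ll - prev_ll < tol:
            break
        prev_ll = ll
    return A, B, pi, ll
```

In practice one would run this from several random seeds and keep the best log-likelihood, precisely because of the local-maximum caveat above.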

## Baum-Welch Algorithm 推導

Z = (z_1, …, z_T) are the latent state variables, each taking a value in {1, …, N} (the N states above); X = (x_1, …, x_T) are the observations.

Baum-Welch is an EM algorithm: the E-step computes the posterior state probabilities via the forward-backward pass, and the M-step maximizes the Q-function over the parameters (A, B, pi). Then use a Lagrange multiplier, enforcing that each probability distribution sums to one, to find the maximum.
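A sketch of that derivation, writing $z_t$ for the hidden state and $x_t$ for the observation at time $t$, with transition probabilities $a_{ij}$, emission probabilities $b_i(k)$, and initial probabilities $\pi_i$:

```latex
Q(\theta, \theta^{\text{old}})
  = \sum_{Z} p(Z \mid X, \theta^{\text{old}}) \, \log p(X, Z \mid \theta),
\qquad
\log p(X, Z \mid \theta)
  = \log \pi_{z_1}
  + \sum_{t=2}^{T} \log a_{z_{t-1} z_t}
  + \sum_{t=1}^{T} \log b_{z_t}(x_t)

% With gamma_t(i) = p(z_t = i | X, theta_old) and
% xi_t(i,j)  = p(z_t = i, z_{t+1} = j | X, theta_old)
% (both available from the forward-backward pass), Q separates into three terms:
Q = \sum_{i} \gamma_1(i) \log \pi_i
  + \sum_{t=1}^{T-1} \sum_{i,j} \xi_t(i,j) \log a_{ij}
  + \sum_{t=1}^{T} \sum_{i} \gamma_t(i) \log b_i(x_t)

% Maximizing each term under its sum-to-one constraint
% (Lagrange multiplier, e.g. for sum_j a_{ij} = 1) gives:
\pi_i^{*} = \gamma_1(i),
\qquad
a_{ij}^{*} = \frac{\sum_{t=1}^{T-1} \xi_t(i,j)}{\sum_{t=1}^{T-1} \gamma_t(i)},
\qquad
b_i^{*}(k) = \frac{\sum_{t \,:\, x_t = k} \gamma_t(i)}{\sum_{t=1}^{T} \gamma_t(i)}
```

The three terms share no parameters, so each can be maximized independently; this is why the M-step has a closed form.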