Expectation Maximization Algorithm

by allenlu2007

ML estimation and MAP estimation look, at first glance, like two different types of estimation: ML estimation is classical estimation without prior information, while MAP estimation is Bayesian estimation with prior information.

Although ML estimation can be derived from MAP estimation (by assuming the unknown parameter has a delta or uniform distribution) and vice versa, this is mainly a theoretical derivation with little practical use.
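As a quick sketch of that connection (standard textbook notation, not taken from the post): the MAP estimate is

\hat{\theta}_{MAP} = \arg\max_{\theta} p(\theta \mid y) = \arg\max_{\theta} p(y \mid \theta)\, p(\theta)

and if the prior p(\theta) is flat (uniform) over the parameter range, the p(\theta) factor is a constant, so \hat{\theta}_{MAP} reduces to \hat{\theta}_{ML} = \arg\max_{\theta} p(y \mid \theta).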

Consider the following situation: we start with ML estimation of theta, for example its mean and variance. This gives a distribution for theta, which can then be refined iteratively to get a better result.

There are two different ways to iterate: one is a time sequence that adds one more sample per iteration; the other fixes the total number of samples but iterates over them many times.

Simply put, although this starts as MLE parameter estimation, during the iteration it effectively turns into MAP.

Several problems with EM

1. It may converge to a local maximum

2. It may overfit: running too many iterations can cause problems for an ML system.

----------------- The explanation above is mistaken -----------------

The explanation above is not really what the EM algorithm is about.

However, after watching Daphne Koller's probabilistic graphical models (PGM) lectures, I was surprised to find that belief propagation (loopy BP on a PGM) uses a similar idea. In particular, a turbo code decoder uses the bits from decoder 1 as prior information for decoder 2, turning ML into MAP and vice versa. The difference is that two encoders and two decoders are used for iterative decoding, instead of self-looping!

Whether one can self-loop from ML to MAP still needs to be checked.


However, in some cases the ML estimate is difficult to compute directly.


When is it difficult to compute?


1. Multinomial distributions?


2. Hidden or latent variables?

The key is iteration! It is possible to use an iterative approach.

Some examples:

1. Binomial

2. Communication detection



The key point is to introduce the complete data x, where:

(1) Y = T(X) is a deterministic function of X, e.g. Y = (X1 + X2)/2 or Y = sqrt(X1^2 + X2^2).

(2) X is the complete data.






Step 1: Pick an initial guess \hat{\theta}^{(0)} for \theta.

Step 2: Assume the current \theta is correct; given the observed y, compute the conditional pdf p(x \mid y, \theta).

Step 3: Choose the new \theta that maximizes the expected \log p(x \mid \theta), i.e. the Q function. We have to maximize the expected \log p(x \mid \theta) because we do not actually know x (it may be a hidden or latent variable).
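Written out in standard EM notation (the superscript (m) denotes the current iterate; this is the usual textbook statement, added here for reference), the E and M steps are

Q(\theta, \hat{\theta}^{(m)}) = E_{x \sim p(x \mid y, \hat{\theta}^{(m)})}\left[ \log p(x \mid \theta) \right]

\hat{\theta}^{(m+1)} = \arg\max_{\theta} Q(\theta, \hat{\theta}^{(m)})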









Iterative approaches:

1. EM

2. Kalman filter (?)




Ex 1: Normal distribution with known variance \sigma^2 and unknown mean \mu.

Observed data: y = (y_1, y_2, \dots, y_m); missing data: (y_{m+1}, \dots, y_n)

Complete data: x = (y_1, y_2, \dots, y_n)

The complete-data log likelihood is

\log L_c(\mu) = \log g_c(x; \mu) = -\frac{n}{2}\log(2\pi\sigma^2) - \frac{1}{2\sigma^2}\sum_{i=1}^{n}(y_i - \mu)^2

and the Q function is

Q(\mu, \mu^{(t)}) = E_{\mu^{(t)}}\left[ \log L_c(\mu) \mid y \right]

Expectation part:

The parameter enters only through the sufficient statistic s_1 = \sum_{i=1}^{n} y_i, and

E_{\mu^{(t)}}\left[ \sum_{i=m+1}^{n} y_i \mid y \right] = (n-m)\,\mu^{(t)}

so s_1^{(t)} = \sum_{i=1}^{m} y_i + (n-m)\,\mu^{(t)}

Maximization part:

\mu^{(t+1)} = s_1^{(t)} / n = \left[ \sum_{i=1}^{m} y_i + (n-m)\,\mu^{(t)} \right] / n

After enough iterations, \mu^{(t+1)} = \mu^{(t)} = \mu, which gives

\mu = \frac{1}{m}\sum_{i=1}^{m} y_i !!!

The hidden (missing) data are useless here, since the samples are i.i.d.
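A minimal numerical sketch of this iteration in Python (the function name and sample values are illustrative assumptions; only numpy is used):

    import numpy as np

    def em_missing_mean(y_obs, n, mu0=0.0, n_iter=50):
        """EM for the mean of a normal sample when n - m of the n values are missing."""
        m = len(y_obs)
        s_obs = np.sum(y_obs)          # sum over the observed part, fixed across iterations
        mu = mu0
        for _ in range(n_iter):
            s1 = s_obs + (n - m) * mu  # E-step: missing sum replaced by its expectation (n - m) * mu
            mu = s1 / n                # M-step: complete-data MLE of the mean
        return mu

    y_obs = np.array([1.2, 0.8, 1.1, 0.9, 1.0])   # m = 5 observed samples, n = 8 assumed total
    print(em_missing_mean(y_obs, n=8))             # converges to the observed-sample mean, 1.0

The fixed point is the mean of the observed samples, matching the closed-form result above.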




Missing data

Gaussian clustering

Hidden Markov model (HMM)


Gaussian clustering and hidden Markov chains (same idea?); a Gaussian clustering sketch follows at the end of this list.

eye diagram estimation?

DC offset estimation with data?

clock recovery with data (CDR) bang-bang?

CDR with missing pulse?
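Since Gaussian clustering is the textbook EM application in the list above, here is a minimal 1-D, two-component sketch (variable names and the synthetic data are illustrative assumptions; only numpy is used):

    import numpy as np

    def gmm_em_1d(x, n_iter=100):
        """EM for a two-component 1-D Gaussian mixture: weight, means, variances."""
        pi = 0.5                                       # weight of component 1
        mu = np.array([x.min(), x.max()])              # crude initial means
        var = np.array([x.var(), x.var()])             # crude initial variances
        for _ in range(n_iter):
            # E-step: responsibility of component 1 for each sample
            p0 = (1 - pi) * np.exp(-(x - mu[0])**2 / (2 * var[0])) / np.sqrt(2 * np.pi * var[0])
            p1 = pi * np.exp(-(x - mu[1])**2 / (2 * var[1])) / np.sqrt(2 * np.pi * var[1])
            r = p1 / (p0 + p1)
            # M-step: re-estimate parameters from the soft assignments
            pi = r.mean()
            mu = np.array([np.sum((1 - r) * x) / np.sum(1 - r),
                           np.sum(r * x) / np.sum(r)])
            var = np.array([np.sum((1 - r) * (x - mu[0])**2) / np.sum(1 - r),
                            np.sum(r * (x - mu[1])**2) / np.sum(r)])
        return pi, mu, var

    x = np.concatenate([np.random.normal(-2, 1, 200), np.random.normal(3, 1, 300)])
    print(gmm_em_1d(x))    # roughly pi ~ 0.6, with means near -2 and 3

With more components or multivariate data the E/M structure stays the same; only the sufficient statistics change.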