Expectation Maximization Algorithm

by allenlu2007

ML estimation and MAP estimation look, at first glance, like two different types of estimation: ML estimation is classical estimation without prior information, while MAP is Bayesian estimation with prior information.

Although ML estimation can be derived from MAP estimation (e.g., by assuming the unknown parameter has a uniform prior) and vice versa, this is mostly a theoretical derivation with little practical use.
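In symbols (standard notation, not from the original post): with a flat (uniform) prior p(\theta), the two estimates coincide,

\hat{\theta}_{MAP} = \arg\max_\theta p(y \mid \theta)\, p(\theta) = \arg\max_\theta p(y \mid \theta) = \hat{\theta}_{ML}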

Consider a situation that starts as ML estimation of theta (for example, a mean and a variance). From this we obtain a distribution for theta, and then iterate to get a better result.

There are two different approaches: one is to follow the time sequence and add one more sample per iteration; the other is to keep the total number of samples fixed but iterate many times.

Simply put, although it starts as MLE parameter estimation, during the iterations it effectively turns into MAP.

Several problems with EM

1. It may converge to a local maximum

2. It may overfit: running too many iterations can cause problems for an ML system.

—————– The explanation above is incorrect ——————-

The explanation above is not the original intent of the EM algorithm.

However, after watching Daphne Koller's probabilistic graphical model (PGM) lectures, I was surprised to find that belief propagation (loopy BP in PGM) has a similar idea. In particular, a turbo code decoder uses the bits from decoder 1 as prior information for decoder 2, turning ML into MAP and vice versa. The difference is that it uses two encoders and decoders to decode iteratively, instead of self-looping!

Whether one can self-loop from ML to MAP may need further checking.




However, in some cases it is difficult to compute the ML estimate \hat{\theta} = \arg\max_\theta \log p(y; \theta) directly.

 

When is it difficult to compute?

 

1. Multinomial distributions?

 

2. Hidden or latent variables?


The key is iteration!  It is possible to use an iterative approach.

Some examples:

1. Binomial (see the sketch after this list)

2. Communication detection
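For the binomial case, here is a minimal sketch (not from the original post) of what the iterative estimate can look like, assuming the problem is a mixture of two coins whose identity in each trial is hidden; the flip count, head counts, and initial biases are made up for illustration, and the E/M steps it uses are spelled out later in the post.

```python
import numpy as np

# Two biased coins; each trial flips one of them (which one is hidden) n times.
n = 10                                # flips per trial (assumed)
heads = np.array([5, 9, 8, 4, 7])     # made-up observed head counts per trial

theta_A, theta_B = 0.6, 0.5           # initial guesses for the two coin biases
for _ in range(50):
    # E-step: posterior probability that each trial used coin A (equal prior on coins)
    like_A = theta_A**heads * (1 - theta_A)**(n - heads)
    like_B = theta_B**heads * (1 - theta_B)**(n - heads)
    w_A = like_A / (like_A + like_B)
    w_B = 1.0 - w_A
    # M-step: weighted maximum-likelihood re-estimate of each coin's bias
    theta_A = np.sum(w_A * heads) / np.sum(w_A * n)
    theta_B = np.sum(w_B * heads) / np.sum(w_B * n)

print(theta_A, theta_B)               # converges to one high-bias and one low-bias coin
```

The binomial coefficient is the same for both coins, so it cancels in the posterior weights and is omitted.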

 

 

The key is to introduce the complete data x:

(1) Y = T(X) is a deterministic function of X, e.g., Y = (X1 + X2)/2 or Y = sqrt(X1^2 + X2^2).

(2) X is the complete (and correct) data.
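In the usual EM notation (g for the observed-data density, g_c for the complete-data density, as used in the example below), the observed-data likelihood is a marginalization of the complete-data likelihood over every x consistent with the observed y, which is one reason it can be hard to maximize directly:

g(y; \theta) = \int_{\{x\,:\,T(x) = y\}} g_c(x; \theta)\, dx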


 

Step 1: Pick an initial guess \hat{\theta}^{(m=0)} for \theta.

Step 2: Assuming the current \theta is correct, and given the observed y, calculate the conditional pdf p(x | y, \theta).

Step 3: Choose the new \theta that maximizes the expected \log p(x | \theta), i.e., the Q function Q(\theta, \hat{\theta}^{(m)}) = E_{x | y, \hat{\theta}^{(m)}}[\log p(x | \theta)]. We have to maximize the expected \log p(x | \theta) rather than \log p(x | \theta) itself because we do not actually know x (it may be a hidden or latent variable).

Steps 2 and 3 are repeated until \hat{\theta}^{(m)} converges.
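To make the three steps concrete, here is a minimal sketch of the same loop for the Gaussian-clustering application mentioned later in this post: a two-component 1-D Gaussian mixture, where the hidden part of x is each sample's component label. The synthetic data, initial guesses, and iteration count are arbitrary choices for illustration, not a production implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic observed data y drawn from a two-component 1-D Gaussian mixture
y = np.concatenate([rng.normal(-2.0, 1.0, 200), rng.normal(3.0, 1.5, 300)])

# Step 1: initial guess theta^(0) = (weights, means, variances)
w   = np.array([0.5, 0.5])
mu  = np.array([-1.0, 1.0])
var = np.array([1.0, 1.0])

def gauss(y, mu, var):
    return np.exp(-(y - mu) ** 2 / (2 * var)) / np.sqrt(2 * np.pi * var)

for m in range(100):
    # Step 2 (E-step): p(label | y, theta^(m)) -- the "responsibility" of each component
    r = w * np.stack([gauss(y, mu[k], var[k]) for k in range(2)], axis=1)
    r /= r.sum(axis=1, keepdims=True)
    # Step 3 (M-step): maximize the Q function -> weighted ML updates per component
    nk  = r.sum(axis=0)
    w   = nk / len(y)
    mu  = (r * y[:, None]).sum(axis=0) / nk
    var = (r * (y[:, None] - mu) ** 2).sum(axis=0) / nk

print(w, mu, var)   # should roughly recover weights (0.4, 0.6), means (-2, 3), variances (1, 2.25)
```

Each pass of the loop never decreases the observed-data likelihood, though, as noted above, it may still stop at a local maximum.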


Iterative way:

1. EM

2. ?? Kalman filter??

 

 

 Example 

EX1: Normal distribution with known variance \sigma^2 (the only unknown parameter is the mean \mu).

Observed data: y = (y_1, y_2, \ldots, y_m); missing data: (y_{m+1}, \ldots, y_n).

Complete data: x = (y_1, y_2, \ldots, y_n).

Complete-data log-likelihood:

\log L_c(\mu) = \log g_c(x; \mu) = -\frac{n}{2}\log(2\pi\sigma^2) - \frac{1}{2\sigma^2}\sum_{i=1}^{n}(y_i - \mu)^2

which depends on the data only through the sufficient statistic s_1 = \sum_{i=1}^{n} y_i. The Q function is

Q(\mu, \mu^{(t)}) = E_{\mu^{(t)}}[\log L_c(\mu) \mid y]

Expectation part:

E_{\mu^{(t)}}\!\left[\sum_{i=m+1}^{n} y_i \,\middle|\, y\right] = (n - m)\,\mu^{(t)}

s_1^{(t)} = \sum_{i=1}^{m} y_i + (n - m)\,\mu^{(t)}

Maximization part:

\mu^{(t+1)} = \frac{s_1^{(t)}}{n} = \frac{\sum_{i=1}^{m} y_i + (n - m)\,\mu^{(t)}}{n}

After enough iterations, \mu^{(t+1)} = \mu^{(t)} = \mu, which gives

\mu = \frac{1}{m}\sum_{i=1}^{m} y_i !!!

The hidden (missing) data is useless here because the samples are i.i.d.: the EM fixed point is simply the sample mean of the observed data.
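As a quick numeric check of this fixed point, the sketch below (made-up data; n, m, and the starting guess are arbitrary) runs the recursion and compares it with the observed-data sample mean.

```python
import numpy as np

rng = np.random.default_rng(1)
n, m = 20, 12                        # total samples vs. observed samples
y_obs = rng.normal(5.0, 2.0, m)      # only the first m samples are observed

mu = 0.0                             # arbitrary initial guess mu^(0)
for t in range(100):
    s1 = y_obs.sum() + (n - m) * mu  # E-step: replace each missing sample by mu^(t)
    mu = s1 / n                      # M-step: mu^(t+1) = s1^(t) / n

print(mu, y_obs.mean())              # both values agree: the observed sample mean
```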




 

 

Applications:

Missing data

Gaussian clustering

Hidden Markov model (HMM)

 

Gaussian clustering and hidden Markov chains (the same idea?)

Eye diagram estimation?

DC offset estimation with data?

Clock recovery with data (CDR), bang-bang?

CDR with missing pulse?
