Expectation Maximization Algorithm

ML estimation and MAP estimation at first glance appear to be two different types of estimation: ML estimation is classical estimation without prior information, while MAP is Bayesian estimation with prior information.

Several problems with EM

1. It may converge to a local maximum

2. It may overfit: running too many iterations can cause problems for an ML system.

----------- The explanation above is incorrect -----------

However, in some cases the estimate above is difficult to compute directly.

When is it difficult to compute?

1. Multinomial distributions?

2. Hidden variables or latent variables?

The key is iteration!  It is possible to use an iterative approach.

Some examples:

1. Binomial

2. Communication detection

(1) $Y = T(X)$ is a deterministic function of $X$.  E.g. $Y = (X_1+X_2)/2$ or $Y = \sqrt{X_1^2+X_2^2}$

(2) $X$ is the complete (and correct) data

Step 1:  Pick an initial guess $\hat{\theta}^{(0)}$ for $\theta$.

Step 2: Assuming $\theta$ is correct, given the observed $y$, calculate the conditional pdf $p(x|y,\theta)$.

Step 3: Choose the new $\theta$ that maximizes the expected $\log p(x|\theta)$, i.e. the Q function.  We have to maximize the expected $\log p(x|\theta)$ because we do not actually know $x$ (it may be a hidden variable or latent variable).
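The three steps above can be sketched as a small loop. Below is a hedged Python sketch using the binomial example listed earlier: two biased coins, a hidden per-trial choice of coin, and only the head counts observed. All names, data, and the number of iterations are illustrative assumptions, not from the original notes.

```python
# Hypothetical two-coin setup: each trial secretly picks coin A or B,
# then flips it n times; we only observe the number of heads.
# theta = (p_A, p_B), the head probabilities of the two coins.

def em_two_coins(heads, flips, theta0, n_iter=50):
    pA, pB = theta0                        # Step 1: initial guess theta^(0)
    for _ in range(n_iter):
        # E-step (Step 2): posterior probability that each trial used
        # coin A, computed under the current guess of theta.
        sA_h = sA_t = sB_h = sB_t = 0.0
        for h, n in zip(heads, flips):
            lA = pA**h * (1 - pA)**(n - h)
            lB = pB**h * (1 - pB)**(n - h)
            wA = lA / (lA + lB)
            sA_h += wA * h;       sA_t += wA * (n - h)
            sB_h += (1 - wA) * h; sB_t += (1 - wA) * (n - h)
        # M-step (Step 3): re-estimate theta by maximizing the
        # expected complete-data log likelihood.
        pA = sA_h / (sA_h + sA_t)
        pB = sB_h / (sB_h + sB_t)
    return pA, pB

heads = [5, 9, 8, 4, 7]          # observed head counts (made-up data)
flips = [10] * 5                 # flips per trial
pA, pB = em_two_coins(heads, flips, (0.6, 0.5))
```

The E-step never decides which coin was used; it only weights each trial by the posterior responsibility, which is exactly why the Q function uses an expectation over the hidden variable.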

Iterative way:

1. EM

2. ?? Kalman filter??

Example

EX1:  Normal distribution with known variance $\sigma^2$.

$y = (y_1, y_2, \ldots, y_m)$,  missing data $= (y_{m+1}, \ldots, y_n)$

$x = (y_1, y_2, \ldots, y_n)$

$\log L_c(\mu) = \log g_c(x; \mu)$

$Q(\mu, \mu^{(t)}) = E_{\mu^{(t)}}\left[\log L_c(\mu) \mid y\right]$

$= E_{\mu^{(t)}}\!\left[-\frac{1}{2\sigma^2}\sum_{i=1}^{n}(y_i-\mu)^2 \,\Big|\, y\right] + \text{const}$

Expectation part:

$E_{\mu^{(t)}}\!\left[\sum_{i=m+1}^{n} y_i \,\Big|\, y\right] = (n-m)\,\mu^{(t)}$

$s_1^{(t)} = \sum_{i=1}^{m} y_i + (n-m)\,\mu^{(t)}$

Maximization part:

$\mu^{(t+1)} = s_1^{(t)} / n$

$\mu^{(t+1)} = \left[\sum_{i=1}^{m} y_i + (n-m)\,\mu^{(t)}\right] / n$

After many iterations the iteration converges: $\mu^{(t+1)} = \mu^{(t)} = \mu$.

$\mu = \frac{1}{m}\sum_{i=1}^{m} y_i$ !!!   The missing data add no information here, since the samples are i.i.d.
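The fixed point derived above can be checked numerically. A minimal sketch; the observed values and the choice of $n$ are made up for illustration:

```python
# EM for a normal mean with known sigma^2 when y_{m+1..n} are missing.
observed = [1.2, 0.8, 1.5, 1.1]   # y_1..y_m (made-up data)
m = len(observed)
n = 10                            # pretend n - m = 6 values are missing
s_obs = sum(observed)

mu = 0.0                          # initial guess mu^(0)
for _ in range(100):
    s1 = s_obs + (n - m) * mu     # E-step: fill in the missing sum with (n-m)*mu^(t)
    mu = s1 / n                   # M-step: mu^(t+1) = s1^(t) / n

print(mu)                         # converges to sum(observed)/m, the observed-sample mean
```

Each iteration is a contraction with factor $(n-m)/n$, so the loop converges geometrically to the observed-sample mean, matching the closed-form fixed point.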

Applications:

Missing data

Gaussian clustering

Hidden Markov model (HMM)

Gaussian clustering, hidden Markov chain (same idea?)

eye diagram estimation?

DC offset estimation with data?

clock recovery with data (CDR) bang-bang?

CDR with missing pulse?
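As an illustration of the "Gaussian clustering" item in the list above, here is a hedged Python sketch of EM for a two-component 1-D Gaussian mixture with known unit variance. The synthetic data and all names are assumptions for illustration only.

```python
import math
import random

# Synthetic 1-D data: two clusters centered near 0 and 5 (made-up).
random.seed(0)
data = [random.gauss(0.0, 1.0) for _ in range(200)] + \
       [random.gauss(5.0, 1.0) for _ in range(200)]

def normal_pdf(x, mu, sigma=1.0):
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

mu1, mu2, w = -1.0, 1.0, 0.5      # initial guesses: means and mixing weight
for _ in range(50):
    # E-step: responsibility of component 1 for each point,
    # under the current parameter guess (the cluster label is hidden).
    r = [w * normal_pdf(x, mu1) /
         (w * normal_pdf(x, mu1) + (1 - w) * normal_pdf(x, mu2))
         for x in data]
    # M-step: responsibility-weighted means and mixing weight.
    n1 = sum(r)
    mu1 = sum(ri * x for ri, x in zip(r, data)) / n1
    mu2 = sum((1 - ri) * x for ri, x in zip(r, data)) / (len(data) - n1)
    w = n1 / len(data)
```

The structure is identical to the missing-data example: the hidden variable is the cluster label, and the E-step replaces it by its posterior expectation rather than a hard assignment.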