Gaussian Mixture Model (GMM) and EM

by allenlu2007

 

What is a GMM?

The pdf of a K-component Gaussian mixture is shown below. GMMs are commonly used in many kinds of signal processing, for example image and audio processing. The jitter analysis commonly used in engineering can also be modeled this way.

p(x \mid \theta) = \sum_{j=1}^{K} w_j \, \mathcal{N}(x \mid \mu_j, \Sigma_j), \qquad \sum_{j=1}^{K} w_j = 1

where \mathcal{N}(x \mid \mu_j, \Sigma_j) is a Gaussian pdf with mean \mu_j and covariance \Sigma_j, and the w_j are the mixture weights.
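A minimal sketch of evaluating this K-component mixture pdf in Python (the parameter values in the example are made up for illustration):

import numpy as np

def gmm_pdf(x, weights, means, variances):
    """Evaluate a 1-D Gaussian mixture pdf at the points x."""
    x = np.asarray(x, dtype=float)[..., None]        # broadcast over components
    variances = np.asarray(variances, dtype=float)
    norm = 1.0 / np.sqrt(2 * np.pi * variances)
    comp = norm * np.exp(-0.5 * (x - np.asarray(means)) ** 2 / variances)
    return comp @ np.asarray(weights)                # sum_j w_j * N(x | mu_j, sigma_j^2)

# Example: K = 3 (illustrative parameters only)
p = gmm_pdf([0.0, 1.5], weights=[0.5, 0.3, 0.2], means=[0.0, 2.0, 5.0], variances=[1.0, 0.5, 2.0])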

A simple example (K = 3, 1D and 2D GMM)

[Figure: pdf of a 3-component GMM in 1D and 2D]

How to estimate the GMM parameters?

Given a set of observed data X = {x1, x2, …, xn}, estimate the weights, means, and variances.

The maximum-likelihood problem is to maximize

\log L(\theta) = \sum_{i=1}^{n} \log \sum_{j=1}^{K} w_j \, \mathcal{N}(x_i \mid \mu_j, \Sigma_j)

over \theta = \{w_j, \mu_j, \Sigma_j\}. The sum inside the logarithm has no closed-form maximizer, which is why an iterative method is needed.


 

To find the unknowns of a GMM (the weights, means, and variances), one generally uses the Expectation-Maximization (EM) iteration.

Advantage of EM: it introduces hidden variables (z_i) to simplify the maximization of the likelihood.

Drawbacks of EM: (i) it can converge to a local maximum of the likelihood if the initial values are poor; solution: run K-Means first or use another initialization.

(ii) K is hard to choose; solution: sweep K upward from 1 and compare models (see the information-theoretic criteria below).


How does EM work for a GMM?

E - Expectation step: estimate the distribution of the hidden variable, given the data and the current value of the parameters.

M - Maximization step: maximize the expected joint (complete-data) log-likelihood of the data and the hidden variable over the parameters.


Assume the hidden variable is Z; 

E-step: compute the expected complete-data log-likelihood under the current posterior of Z,

Q(\theta \mid \theta_t) = E_{Z \mid X, \theta_t}\left[\log P(X, Z \mid \theta)\right]

M-step: update the parameters,

\theta_{t+1} = \arg\max_{\theta} Q(\theta \mid \theta_t)

Each iteration is guaranteed not to decrease the observed-data likelihood \log P(X \mid \theta).

 

Note that if Z could be observed, the weight, mean, and variance could be estimated Gaussian by Gaussian (each component would be a single Gaussian, whose ML estimation is straightforward).

Unfortunately Z is a hidden variable. We encode Z with indicator variables:

z_{ij} = 1 if sample x_i was generated by Gaussian j, and z_{ij} = 0 otherwise, so that \sum_{j=1}^{K} z_{ij} = 1 for every i.

 

Therefore, the complete (joint) likelihood function L_c of all X and Z is as follows:

L_c(\theta) = P(X, Z \mid \theta) = \prod_{i=1}^{n} \prod_{j=1}^{K} \left[ w_j \, \mathcal{N}(x_i \mid \mu_j, \Sigma_j) \right]^{z_{ij}}

Taking the logarithm,

\log L_c(\theta) = \sum_{i=1}^{n} \sum_{j=1}^{K} z_{ij} \left[ \log w_j + \log \mathcal{N}(x_i \mid \mu_j, \Sigma_j) \right]

Because \log L_c is linear in the z_{ij}, the E-step expectation simply replaces each z_{ij} with its posterior mean, the responsibility

r(i, j) = E[z_{ij} \mid x_i, \theta_t] = P(j \mid x_i, \theta_t)

Maximizing the expected \log L_c in the M-step then yields closed-form updates:

w_j = \frac{1}{n} \sum_{i=1}^{n} r(i, j)

\mu_j = \frac{\sum_{i=1}^{n} r(i, j) \, x_i}{\sum_{i=1}^{n} r(i, j)}

\Sigma_j = \frac{\sum_{i=1}^{n} r(i, j) \, (x_i - \mu_j)(x_i - \mu_j)^T}{\sum_{i=1}^{n} r(i, j)}

 

So the key is how to compute the responsibility function r(i, j) = P(j | x_i, θ_t). It is actually very simple:

Soft assignment: r(i, j) = P(j | x_i, θ_t) = w_j p(x_i | j, θ_t) / p(x_i | θ_t) = w_j p(x_i | j, θ_t) / Σ_k w_k p(x_i | k, θ_t), for j = 1, …, K.

Hard assignment: r(i, j) = 1 for the single j that maximizes w_j p(x_i | j, θ_t), and r(i, j) = 0 for all other j.
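A minimal NumPy sketch of one EM pass for a 1-D GMM using the soft-assignment formula above (the function and variable names, and the toy data, are my own illustration, not from the original post):

import numpy as np

def em_step(x, w, mu, var):
    """One EM iteration for a 1-D GMM. x: (n,), w/mu/var: (K,)."""
    # E-step: r[i, j] = w_j N(x_i | mu_j, var_j) / sum_k w_k N(x_i | mu_k, var_k)
    comp = w * np.exp(-0.5 * (x[:, None] - mu) ** 2 / var) / np.sqrt(2 * np.pi * var)
    r = comp / comp.sum(axis=1, keepdims=True)
    # M-step: closed-form updates from the responsibilities
    n_j = r.sum(axis=0)                          # effective count per component
    w_new = n_j / len(x)
    mu_new = (r * x[:, None]).sum(axis=0) / n_j
    var_new = (r * (x[:, None] - mu_new) ** 2).sum(axis=0) / n_j
    return w_new, mu_new, var_new

# Iterate to (local) convergence from some initial guess
x = np.concatenate([np.random.normal(0, 1, 200), np.random.normal(4, 0.5, 100)])
w, mu, var = np.array([0.5, 0.5]), np.array([-1.0, 1.0]), np.array([1.0, 1.0])
for _ in range(50):
    w, mu, var = em_step(x, w, mu, var)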


The EM clustering algorithm is summarized below. Note that it is not restricted to Gaussians; it can be applied to other pdfs such as Poisson, Binomial, etc. As long as the ML estimation of a single pdf (Gaussian, Poisson, Binomial) is available, it can be generalized to clustering estimation, as the Poisson sketch after the summary illustrates.

[Figure: summary of the EM clustering algorithm]
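As a sketch of that generalization, here is the same EM step for a Poisson mixture: only the component pmf and its single-distribution ML update (the responsibility-weighted sample mean) change relative to the Gaussian version above (again, names and toy data are mine):

import numpy as np
from scipy.stats import poisson

def em_step_poisson(x, w, lam):
    """One EM iteration for a K-component Poisson mixture. x: (n,) counts."""
    comp = w * poisson.pmf(x[:, None], lam)       # (n, K): w_j * Poisson(x_i; lam_j)
    r = comp / comp.sum(axis=1, keepdims=True)    # E-step: responsibilities
    n_j = r.sum(axis=0)
    w_new = n_j / len(x)
    lam_new = (r * x[:, None]).sum(axis=0) / n_j  # M-step: weighted mean = Poisson MLE
    return w_new, lam_new

counts = np.concatenate([np.random.poisson(2.0, 150), np.random.poisson(9.0, 150)])
w, lam = np.array([0.5, 0.5]), np.array([1.0, 10.0])
for _ in range(50):
    w, lam = em_step_poisson(counts, w, lam)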

 

A few important issues

Initialization: EM is very sensitive to initial conditions: garbage in -> garbage out.

Solution: usually run K-Means first to get a good initialization (K-Means makes hard decisions, EM soft decisions, so to speak).
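One common recipe, as a sketch (assuming scikit-learn is available; note that scikit-learn's own GaussianMixture also defaults to K-means initialization via its init_params option):

import numpy as np
from sklearn.cluster import KMeans

def kmeans_init(x, K):
    """Initialize 1-D GMM parameters from a K-Means hard clustering."""
    labels = KMeans(n_clusters=K, n_init=10).fit_predict(x.reshape(-1, 1))
    w = np.array([np.mean(labels == j) for j in range(K)])
    mu = np.array([x[labels == j].mean() for j in range(K)])
    var = np.array([x[labels == j].var() for j in range(K)])
    return w, mu, var  # feed these into the EM iteration above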


Number of Gaussians: use an information-theoretic criterion (e.g. AIC or BIC) to choose the optimal K.

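A sketch of such a sweep using scikit-learn's BIC (the range of K and the toy data are arbitrary illustrations):

import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
X = np.concatenate([rng.normal(0, 1, 200), rng.normal(4, 0.5, 100)]).reshape(-1, 1)

bics = {k: GaussianMixture(n_components=k, random_state=0).fit(X).bic(X) for k in range(1, 8)}
best_K = min(bics, key=bics.get)  # BIC: lower is better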

Simplification of the covariance matrix (mainly for video or other high-dimensional applications): restricting each \Sigma_j to be diagonal or spherical greatly reduces the number of parameters to estimate.
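With scikit-learn, for instance, this corresponds to the covariance_type option (a sketch; 'full' is the unrestricted form):

from sklearn.mixture import GaussianMixture

# Parameters per component covariance in d dimensions:
# 'full' ~ d*(d+1)/2, 'diag' ~ d, 'spherical' ~ 1
gmm_full = GaussianMixture(n_components=3, covariance_type='full')
gmm_diag = GaussianMixture(n_components=3, covariance_type='diag')
gmm_sph = GaussianMixture(n_components=3, covariance_type='spherical')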


 

 

A simplified version with two Gaussians

With the EM (and K-Means) method, we first assign each sample s_i to one of the Gaussians. After the assignment, the ML estimates of the parameters (mean, variance, and weight) can be computed. Based on the newly computed parameters, each sample s_i is then reassigned to a Gaussian.

There are two ways to assign s_i. One is 100% to Gaussian A or to Gaussian B (one or the other). This is hard assignment, the K-Means approach (true or false?).

The other is partial assignment: r(i,1) × 100% to Gaussian A and r(i,2) × 100% to Gaussian B, with r(i,1) + r(i,2) = 1. This is soft assignment, the EM approach. r(i, j) is called the responsibility function. The discussion below assumes soft assignment.


The mixed PDF is: 

p(s) = {\pi }_{1} \, \mathcal{N}(s \mid \mu_1, \sigma_1^2) + {\pi }_{2} \, \mathcal{N}(s \mid \mu_2, \sigma_2^2)

where 

{\pi }_{1}+{\pi }_{2} = 1 
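For completeness, specializing the general formulas above to this two-Gaussian case (my restatement in the notation of this section, not from the original post):

r(i, 1) = \frac{{\pi }_{1} \, \mathcal{N}(s_i \mid \mu_1, \sigma_1^2)}{{\pi }_{1} \, \mathcal{N}(s_i \mid \mu_1, \sigma_1^2) + {\pi }_{2} \, \mathcal{N}(s_i \mid \mu_2, \sigma_2^2)}, \qquad r(i, 2) = 1 - r(i, 1)

and each M-step updates {\pi }_{j} = \frac{1}{n} \sum_{i} r(i, j), \mu_j = \frac{\sum_i r(i, j) \, s_i}{\sum_i r(i, j)}, and \sigma_j^2 = \frac{\sum_i r(i, j) (s_i - \mu_j)^2}{\sum_i r(i, j)}.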



Quantum mechanics interpretation

Hard/soft -> a "firm" assignment with an error vector for the responsibility?

K-means for initial value?



 





 

 
