### Gaussian Mixture Model (GMM) and EM

#### by allenlu2007

What is a GMM?

The pdf of a K-component Gaussian mixture is shown below. GMMs are widely used in signal processing, for example in image and audio processing. Jitter analysis, common in engineering practice, can also be modeled this way.
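In standard notation, the mixture pdf is a weighted sum of K Gaussian densities:

```latex
p(x \mid \theta) = \sum_{j=1}^{K} w_j \, \mathcal{N}(x \mid \mu_j, \Sigma_j),
\qquad \sum_{j=1}^{K} w_j = 1, \quad w_j \ge 0
```

where θ = {w_j, μ_j, Σ_j} collects the weights, means, and covariances of all components.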

# A Simple Example (K=3, 1D and 2D GMM)
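A minimal sketch of drawing samples from a K=3 mixture in 1D. The component parameters here are illustrative values of my own choosing, not from the post; sampling works by first drawing a hidden component label, then sampling from that component's Gaussian.

```python
import numpy as np

# Illustrative K=3 mixture parameters (hypothetical values)
weights = np.array([0.5, 0.3, 0.2])
means = np.array([-4.0, 0.0, 3.0])
stds = np.array([1.0, 0.5, 1.5])

rng = np.random.default_rng(0)
n = 1000

# Draw a hidden component label for each sample, then sample that Gaussian
z = rng.choice(3, size=n, p=weights)   # hidden labels (unobserved in practice)
x = rng.normal(means[z], stds[z])      # observed 1D data
```

With the labels z hidden, recovering `weights`, `means`, and `stds` from `x` alone is exactly the estimation problem discussed next.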

# How to Estimate the GMM Parameters?

To find the unknowns of a GMM, i.e. the weights, means, and variances, the standard approach is Expectation-Maximization (EM) iteration.

Advantage of EM: it introduces hidden variables (the component indicators z_i) to simplify the maximization of the likelihood.

Drawbacks of EM: (i) it can converge to a local optimum if the initial values are not good. Solution: run K-means first, or use another initialization method.

(ii) K is hard to choose. Solution: search upward from K = 1 …?

# How Does EM Work in a GMM?

E – Expectation step: estimate the distribution of the hidden variable, given the data and the current values of the parameters.

M – Maximization step: maximize the expected complete (joint) likelihood of the data and the hidden variable.
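The two steps above can be sketched for a 1D GMM as follows. This is a minimal sketch under my own assumptions (1D data, a fixed number of iterations, no convergence check); the function name `em_gmm` and its interface are mine, not from the post.

```python
import numpy as np

def em_gmm(x, mu, sigma, w, n_iter=50):
    """Fit a 1D GMM by EM. mu, sigma, w are initial guesses (arrays of length K)."""
    x = np.asarray(x, dtype=float)[:, None]          # shape (n, 1)
    for _ in range(n_iter):
        # E-step: responsibilities r[i, j] = P(component j | x_i, current params)
        pdf = np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))
        num = w * pdf                                 # w_j * N(x_i | mu_j, sigma_j)
        r = num / num.sum(axis=1, keepdims=True)
        # M-step: weighted ML estimates, component by component
        nj = r.sum(axis=0)
        w = nj / len(x)
        mu = (r * x).sum(axis=0) / nj
        sigma = np.sqrt((r * (x - mu) ** 2).sum(axis=0) / nj)
    return mu, sigma, w

# Synthetic two-component data with known parameters
rng = np.random.default_rng(1)
x = np.concatenate([rng.normal(-3, 1, 500), rng.normal(3, 1, 500)])
mu, sigma, w = em_gmm(x, mu=np.array([-1.0, 1.0]),
                      sigma=np.array([1.0, 1.0]), w=np.array([0.5, 0.5]))
```

On this well-separated synthetic data the estimated means converge close to the true values of −3 and 3.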

Assume the hidden variable is Z.

Note that if Z were observable, the weight, mean, and variance could be estimated Gaussian by Gaussian (each one is a single Gaussian).

Unfortunately, Z is a hidden variable. We represent it with indicator variables: z_ij = 1 if sample x_i comes from Gaussian j, and 0 otherwise.

Therefore, the complete (joint) likelihood function Lc of all X and Z is as follows:
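With the indicators z_ij defined above, a standard form of the complete likelihood is:

```latex
L_c(\theta) = \prod_{i=1}^{N} \prod_{j=1}^{K}
\left[ w_j \, \mathcal{N}(x_i \mid \mu_j, \sigma_j^2) \right]^{z_{ij}},
\qquad
\ln L_c(\theta) = \sum_{i=1}^{N} \sum_{j=1}^{K}
z_{ij} \left[ \ln w_j + \ln \mathcal{N}(x_i \mid \mu_j, \sigma_j^2) \right]
```

The exponent z_ij picks out exactly one component per sample, so the log-likelihood splits into per-component sums, which is what makes the M-step a set of single-Gaussian ML problems.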

So the key question is how to compute the responsibility function r(i, j) = P(j | x_i, theta_t). It is actually very simple:

Soft assignment: r(i, j) = P(j | x_i, theta_t) = w_j * P(x_i | j, theta_t) / P(x_i | theta_t), for j = 1, …, K, where P(x_i | theta_t) = sum_k w_k * P(x_i | k, theta_t).

Hard assignment: r(i, j) = 1 for the j that maximizes w_j * P(x_i | j, theta_t), and r(i, j) = 0 for all other j.
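The two assignment rules can be compared side by side. A minimal sketch: the current parameters theta_t below are hypothetical values I chose for illustration, for K = 2 components.

```python
import numpy as np

def gauss_pdf(x, mu, sigma):
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

# Hypothetical current parameters theta_t for K = 2 components
w = np.array([0.6, 0.4])
mu = np.array([0.0, 4.0])
sigma = np.array([1.0, 1.0])

x = np.array([0.5, 2.0, 3.9])[:, None]

num = w * gauss_pdf(x, mu, sigma)              # w_j * P(x_i | j, theta_t)
soft = num / num.sum(axis=1, keepdims=True)    # soft: each row sums to 1

hard = np.zeros_like(soft)                     # hard: winner takes all
hard[np.arange(len(x)), num.argmax(axis=1)] = 1.0
```

The sample at 0.5 is assigned almost entirely to the first Gaussian, the one at 3.9 to the second, while the point at 2.0 (between the two means) gets a genuinely split soft assignment that the hard rule collapses to a single component.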

Below we summarize EM as a clustering algorithm. Note that it is not restricted to Gaussians; it applies to other PDFs such as Poisson, Binomial, etc. As long as you can do ML estimation for a single PDF (Gaussian, Poisson, Binomial), you can generalize it to clustering estimation.
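As a sanity check that nothing here is Gaussian-specific, the same E/M scheme works with Poisson components: only the per-component pmf and the single-distribution ML step (the weighted sample mean, for Poisson) change. This example and its parameters are my own sketch, not from the post.

```python
import numpy as np
from math import lgamma

# Synthetic counts from two Poisson components with rates 2 and 15
rng = np.random.default_rng(3)
x = np.concatenate([rng.poisson(2, 400), rng.poisson(15, 400)]).astype(float)

lam = np.array([1.0, 10.0])     # initial rate guesses
w = np.array([0.5, 0.5])
xc = x[:, None]
log_fact = np.array([lgamma(v + 1) for v in x])[:, None]   # log(x!)

for _ in range(50):
    # E-step: responsibilities under the Poisson pmf of each component
    logp = xc * np.log(lam) - lam - log_fact
    num = w * np.exp(logp)
    r = num / num.sum(axis=1, keepdims=True)
    # M-step: single-Poisson ML estimate (weighted sample mean) per component
    nj = r.sum(axis=0)
    w = nj / len(x)
    lam = (r * xc).sum(axis=0) / nj
```

The loop is line-for-line the Gaussian version with the pdf swapped out, which is the point of the remark above.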

A few important issues

Initialization: EM is very sensitive to initial conditions: garbage in, garbage out.

Solution: usually run K-means first to get a good initialization (K-means makes hard decisions, EM makes soft decisions, so to speak).
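A minimal K-means pass to seed EM might look like this (a pure-NumPy sketch for 1D data; the helper name `kmeans_init` and its interface are mine):

```python
import numpy as np

def kmeans_init(x, k, n_iter=20, seed=0):
    """Hard-assignment K-means on 1D data; returns initial means/stds/weights for EM."""
    rng = np.random.default_rng(seed)
    centers = rng.choice(x, size=k, replace=False)   # start from k data points
    for _ in range(n_iter):
        labels = np.abs(x[:, None] - centers).argmin(axis=1)   # hard decision
        centers = np.array([x[labels == j].mean() for j in range(k)])
    stds = np.array([x[labels == j].std() for j in range(k)])
    weights = np.bincount(labels, minlength=k) / len(x)
    return centers, stds, weights

rng = np.random.default_rng(2)
x = np.concatenate([rng.normal(-3, 1, 300), rng.normal(3, 1, 300)])
mu0, sigma0, w0 = kmeans_init(x, k=2)
```

The returned `mu0`, `sigma0`, `w0` land near the true clusters and make a much safer starting point for EM than arbitrary guesses.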

Number of Gaussians: use an information-theoretic criterion (such as AIC or BIC) to choose the optimal K?

Simplification of the covariance matrix (for example, restricting it to diagonal form), mainly for video or other higher-dimensional applications.

# A Simplified Version with Two Gaussians

With the EM (and K-means) method, we first assign each sample s_i to one of the Gaussians. Once assigned, the ML estimates of the parameters (mean, variance, and weight) can be computed. Based on the newly computed parameters, each sample s_i is then re-assigned to a Gaussian, and the process repeats.

There are two ways to assign s_i. One is to assign it 100% to Gaussian A or to Gaussian B (one or the other). This is hard assignment, the K-means approach.

The other is partial assignment: r(i, 1) × 100% to Gaussian A and r(i, 2) × 100% to Gaussian B, with r(i, 1) + r(i, 2) = 1. This is soft assignment, the EM approach. r(i, j) is called the responsibility function. The discussion below uses soft assignment.
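One iteration of this two-Gaussian soft-assignment loop can be sketched as follows (the "current" parameters for A and B are hypothetical values I chose; only one update step is shown):

```python
import numpy as np

def gauss_pdf(s, mu, sigma):
    return np.exp(-0.5 * ((s - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

# Hypothetical current parameters for Gaussians A and B
muA, sigA, wA = -2.0, 1.0, 0.5
muB, sigB, wB = 2.0, 1.0, 0.5

rng = np.random.default_rng(4)
s = np.concatenate([rng.normal(-2, 1, 200), rng.normal(2, 1, 200)])

# Soft assignment: each sample belongs r1*100% to A and r2*100% to B
numA = wA * gauss_pdf(s, muA, sigA)
numB = wB * gauss_pdf(s, muB, sigB)
r1 = numA / (numA + numB)
r2 = 1.0 - r1                      # r(i,1) + r(i,2) = 1 by construction

# Re-estimate A's parameters from the partial assignments (one EM iteration)
wA_new = r1.mean()
muA_new = (r1 * s).sum() / r1.sum()
```

Each sample contributes to both Gaussians in proportion to its responsibilities, so the updated mean is a responsibility-weighted average rather than a mean over a hard-cut subset.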

The mixed PDF is

p(s) = w_A * N(s; mu_A, sigma_A^2) + w_B * N(s; mu_B, sigma_B^2)

where N(s; mu, sigma^2) is the Gaussian density and w_A + w_B = 1.

Some open questions:

Quantum mechanics interpretation?

Hard/soft assignment: is there a "firm" assignment in between, with an error vector attached to the responsibility?

K-means for the initial value?