Maximum Likelihood Estimation 最大似然估計

by allenlu2007

Rendering "maximum likelihood estimation" as 最大似然估計 follows the Chinese Wikipedia; I would rather translate it as 最大可能估計. Hereafter I will simply write MLE. MLE is arguably the most important and central concept in parameter estimation; Fisher promoted it heavily and carried out the key analysis during 1912-1922.


What is Estimation?

Consider a pdf with one or more parameters \theta to be estimated, e.g. the mean and variance of a Gaussian distribution, or the arrival rate of a Poisson distribution. How do we estimate \theta from the observed random data g = (x_1, x_2, \dots, x_m)?

We first define the likelihood function L:

L(\theta) = pr(g \mid \theta)

viewed as a function of \theta for the observed data g. If the samples are i.i.d., this factors as L(\theta) = \prod_{i=1}^{m} pr(x_i \mid \theta).

  • \theta can be a fixed value, a random variable (e.g. a digital communication system sending 0 or 1), or follow a continuous probability distribution.
  • An estimate of the parameter \theta is denoted \hat{\theta}. \hat{\theta} is generally a function of g; since g is a random vector, \hat{\theta} can itself be viewed as a random variable.
  • An estimate \hat{\theta}, compared with the true value \theta, involves two kinds of error:
    • Random error (precision: estimation variance, noise, jitter, ..)
    • Systematic error (accuracy: estimation bias, calibration error, wrong model, ..)

\hat{\theta} is a random variable conditioned on \theta.
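To make the likelihood function concrete, here is a minimal sketch, assuming i.i.d. Gaussian samples with unknown mean \theta and known \sigma (the function and variable names are mine, not from the original):

```python
import math
import random

def log_likelihood(theta, g, sigma=1.0):
    """Log-likelihood of i.i.d. Gaussian samples g with unknown mean theta."""
    n = len(g)
    return (-n / 2) * math.log(2 * math.pi * sigma**2) \
        - sum((x - theta) ** 2 for x in g) / (2 * sigma**2)

random.seed(0)
true_theta = 2.0
g = [random.gauss(true_theta, 1.0) for _ in range(50)]

# The likelihood is a function of theta for the fixed observed data g;
# a theta near the truth should score higher than one far away.
print(log_likelihood(2.0, g) > log_likelihood(5.0, g))  # True
```

Because g is random, rerunning with a different seed gives a different likelihood surface, which is exactly why \hat{\theta} is itself a random variable.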



The conditional mean of \hat{\theta} minus the true value is called the bias conditioned on \theta:

b(\theta) = E[\hat{\theta} \mid \theta] - \theta

\hat{\theta} is an unbiased estimate if b(\theta) = 0 for all \theta. In most cases we want an unbiased estimator. This is intuitive, but are there exceptions?

We can also define the average bias over a prior pr(\theta):

\bar{b} = \int b(\theta)\, pr(\theta)\, d\theta

and the ensemble mean-square error, averaged over both the data and the prior:

MSE = \iint \left( \hat{\theta}(g) - \theta \right)^2 pr(g \mid \theta)\, pr(\theta)\, dg\, d\theta

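These definitions can be checked numerically. Below is a minimal Monte Carlo sketch (my own illustration, not from the original) assuming Gaussian data and the ML variance estimator, which divides by m and is known to be biased low:

```python
import random
import statistics

random.seed(1)
true_var = 4.0  # variance of N(0, 2^2)
m = 10          # samples per experiment

def var_ml(xs):
    """ML variance estimate (divides by m): biased downward."""
    mu = sum(xs) / len(xs)
    return sum((x - mu) ** 2 for x in xs) / len(xs)

trials = [[random.gauss(0.0, 2.0) for _ in range(m)] for _ in range(20000)]
estimates = [var_ml(t) for t in trials]

# Empirical bias should be close to -true_var/m = -0.4,
# and the MSE combines that bias with the estimator's variance.
bias = statistics.mean(estimates) - true_var
mse = statistics.mean((e - true_var) ** 2 for e in estimates)
print(round(bias, 2), round(mse, 2))
```

The negative bias here is the classic example hinted at above: the ML variance estimator is biased, yet it is still the maximum-likelihood answer.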
Definition of ML Estimation

Find the maximum of the likelihood function:

\hat{\theta}_{ML} = \arg\max_{\theta} pr(g \mid \theta)

which is equivalent to finding the maximum of the log-likelihood function:

\hat{\theta}_{ML} = \arg\max_{\theta} \ln pr(g \mid \theta)
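A minimal sketch of this maximization, assuming a Gaussian mean estimation problem and a brute-force grid search (all names are mine; in practice one would use calculus or a numerical optimizer):

```python
import random

random.seed(2)
g = [random.gauss(3.0, 1.0) for _ in range(100)]

def log_lik(theta):
    # Gaussian log-likelihood in the mean theta, up to an additive constant
    return -sum((x - theta) ** 2 for x in g) / 2.0

# Maximize over a grid of candidate theta values.
grid = [i / 1000 for i in range(0, 6000)]  # 0.000 .. 5.999
theta_ml = max(grid, key=log_lik)

# The closed-form ML estimate is the sample mean; the grid search agrees
# to within the grid spacing.
sample_mean = sum(g) / len(g)
print(abs(theta_ml - sample_mean) < 1e-3)  # True
```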
How do we go one step further and actually find the ML estimator?

Definition of the Score

The score is the first derivative of the log-likelihood function; it is also the sensitivity function of the likelihood function:

s(\theta) = \frac{\partial}{\partial \theta} \ln pr(g \mid \theta)

The ML estimator \hat{\theta} is the solution of s(\theta) = 0 at \theta = \hat{\theta}.


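As a concrete worked example (my own, assuming i.i.d. Gaussian samples with known \sigma and unknown mean \theta), the score and its zero are:

```latex
% Score of i.i.d. Gaussian samples with unknown mean \theta (known \sigma):
s(\theta) = \frac{\partial}{\partial\theta} \ln pr(g \mid \theta)
          = \frac{\partial}{\partial\theta}
            \sum_{i=1}^{m} \left[ -\frac{(x_i - \theta)^2}{2\sigma^2} \right]
          = \sum_{i=1}^{m} \frac{x_i - \theta}{\sigma^2}

% Setting s(\hat{\theta}) = 0 recovers the sample mean:
\hat{\theta}_{ML} = \frac{1}{m} \sum_{i=1}^{m} x_i
```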
Why ML?

An ML estimate is:

  • Efficient if an efficient estimate exists
  • Asymptotically efficient (as you get more or better data)
  • Asymptotically unbiased
  • Asymptotically consistent
  • Usually easy to compute
  • A way of rigorously enforcing agreement with the data
  • A way of doing estimation with no prior information
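A quick simulation can illustrate the asymptotic claims in the list above: as the number of samples grows, the spread of the ML estimate shrinks. A minimal sketch, assuming a Gaussian mean estimation problem (all names hypothetical):

```python
import random

random.seed(3)

def theta_ml(m):
    """ML estimate of a Gaussian mean from m samples (the sample mean)."""
    g = [random.gauss(1.0, 1.0) for _ in range(m)]
    return sum(g) / m

def spread(m, trials=2000):
    """Empirical std. dev. of the ML estimate across repeated experiments."""
    ests = [theta_ml(m) for _ in range(trials)]
    mu = sum(ests) / trials
    return (sum((e - mu) ** 2 for e in ests) / trials) ** 0.5

# More data -> tighter estimates (consistency; here the spread falls
# roughly like 1/sqrt(m)).
s10, s1000 = spread(10), spread(1000)
print(s10 > s1000)  # True
```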



Data are noisy, and rigorous agreement with noisy data will give noisy estimates, even though that is the best you can do without bias. You always have some prior information, and you should use it, even though it might introduce bias.
One way to use a prior pr(\theta) is with the weighted likelihood:

\hat{\theta}_{WL} = \arg\max_{\theta} \left[ pr(g \mid \theta)\, pr(\theta) \right] = \arg\max_{\theta} pr(\theta \mid g)

This estimate is also called the maximum a posteriori, or MAP, estimate.
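A minimal sketch of the ML/MAP contrast, assuming Gaussian data with a conjugate Gaussian prior on the mean (the prior parameters mu0, sigma0 are hypothetical choices for illustration):

```python
import random

random.seed(4)
sigma = 1.0             # known data std. dev.
mu0, sigma0 = 0.0, 0.5  # hypothetical Gaussian prior on theta
true_theta = 2.0
g = [random.gauss(true_theta, sigma) for _ in range(5)]
m = len(g)

theta_ml = sum(g) / m

# With a conjugate Gaussian prior the posterior is Gaussian, so the MAP
# estimate is its mean: a precision-weighted blend of the prior mean
# and the sample mean.
w_data = m / sigma**2
w_prior = 1 / sigma0**2
theta_map = (w_data * theta_ml + w_prior * mu0) / (w_data + w_prior)

# The prior pulls the estimate toward mu0: it introduces bias but
# reduces variance, exactly the trade-off described above.
print(abs(theta_map - mu0) < abs(theta_ml - mu0))  # True
```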




    * Barrett: a basic introduction to MLE