Face Detection Algorithms

by allenlu2007

Although deep learning is regarded as a silver bullet for computer vision, it demands very large computation resources for training and testing, and it also requires collecting big data for learning.

This makes it hard to apply in real-time applications on small systems (with relatively low computation power) or in low-power wearable/portable devices.

For image or video detection/recognition problems with distinctive features, such as face detection, specific traffic-sign recognition (e.g. speed limits), and fingerprint recognition, dedicated algorithms can achieve accurate results and are easier to implement, which gives them real value.

Limitation: when local features are indistinct or occluded (e.g. only frontal faces work; if the subject wears sunglasses, feature-based face detection may fall short of deep learning).

This article is based mainly on Viola's paper and on Mark Jo's article:

http://mark-jo-prog.blogspot.tw/2012/08/face-detection-opencv-242.html

Face-related applications such as face recognition and driver-drowsiness detection often include face detection as a pre-processing step. Once a face is detected, further processing (feature extraction and face recognition) is applied only to that region, which greatly reduces the computation resources needed.


image
image

 

Face Detection Algorithm

The framework comes from "Robust Real-Time Face Detection" by Paul Viola and Michael J. Jones, published in the International Journal of Computer Vision in 2004.

 


The following explanation is adapted from Mark Jo's article.

Viola's paper combines three techniques into a single face detection framework; they are not three separate face detection algorithms!

1. Integral Image; 2. AdaBoost; 3. Cascade Classifier.

Integral Image
The integral image is defined as shown in Figure 1: the value at point (x, y) is the sum of the pixel values over the gray shaded region above and to the left of (x, y).

image

Figure 1. Integral Image

image

Figure 2. Rectangle Feature

With the integral image defined, small features like those in Figure 2 are applied to an image; these are called rectangle features. Their sizes vary, but the white rectangles are always the same size as the gray ones. The rectangles slide over the image like a filter, and each feature value is computed as the sum of the integral over the white rectangles minus the sum over the gray rectangles. As to why such features can pick out the structures we want in an image, see the proof by Simard et al., discussed on page 4 of the paper.
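The two ideas above can be sketched in a few lines of Python (a minimal illustration, not the OpenCV implementation; the function names are mine):

```python
def integral_image(img):
    """Compute the integral image: ii[y][x] = sum of img[0..y][0..x]."""
    h, w = len(img), len(img[0])
    ii = [[0] * w for _ in range(h)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += img[y][x]
            ii[y][x] = row_sum + (ii[y - 1][x] if y > 0 else 0)
    return ii

def rect_sum(ii, top, left, bottom, right):
    """Sum of pixels over [top..bottom] x [left..right],
    using only four lookups into the integral image."""
    total = ii[bottom][right]
    if top > 0:
        total -= ii[top - 1][right]
    if left > 0:
        total -= ii[bottom][left - 1]
    if top > 0 and left > 0:
        total += ii[top - 1][left - 1]
    return total

def two_rect_feature(ii, top, left, h, w):
    """A two-rectangle feature: white (left) half minus gray (right) half."""
    white = rect_sum(ii, top, left, top + h - 1, left + w // 2 - 1)
    gray = rect_sum(ii, top, left + w // 2, top + h - 1, left + w - 1)
    return white - gray
```

However large the rectangle, `rect_sum` always costs four array references, which is what makes the feature evaluation fast.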

 

AdaBoost
AdaBoost stands for Adaptive Boosting. It is an iterative algorithm whose core idea is to train a series of different classifiers (weak classifiers) on the same training set, then combine these weak classifiers into a stronger final classifier (a strong classifier).

Algorithm 1 shows how AdaBoost is combined with rectangle features for face detection. We start with a set of training images: m faces and l non-faces. Each image initially receives a weight of 1/(2m) or 1/(2l), depending on whether it is a face or a non-face. We then select T features from the large pool of rectangle features, repeating the following steps T times:
1. Normalize all the weights so they sum to 1.
2. Using the hypothesis function shown in Algorithm 2, select the feature with the smallest weighted error.
3. Record the parameters that achieve this minimum error.
4. Update the weights according to the formula in Algorithm 1.
The final classifier is composed of the T weak features so obtained. Given an input image, the T features vote, each with its own voting weight, and the image is accepted as a face only when the weighted vote exceeds half of the total voting weight.


image

Algorithm 2. Ada Boost in face detection
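The weight bookkeeping in the loop above can be sketched as follows (a simplified illustration of one AdaBoost round, not the paper's full Algorithm 1; the predictions of the chosen weak classifier are taken as given):

```python
import math

def adaboost_round(weights, predictions, labels):
    """One AdaBoost round: normalize the weights, compute the weighted
    error of a weak classifier, and down-weight correctly classified
    samples. predictions/labels are 0/1 lists.
    Returns (new_weights, alpha), where alpha is the voting weight."""
    # 1. Normalize weights so they sum to 1.
    total = sum(weights)
    w = [wi / total for wi in weights]
    # 2. Weighted error of this weak classifier.
    err = sum(wi for wi, p, y in zip(w, predictions, labels) if p != y)
    # 4. Update: correctly classified samples are multiplied by beta < 1,
    #    so misclassified samples carry relatively more weight next round.
    beta = err / (1 - err)
    w = [wi * (beta if p == y else 1.0)
         for wi, p, y in zip(w, predictions, labels)]
    alpha = math.log(1 / beta)  # voting weight of this weak classifier
    return w, alpha
```

The returned alpha values are exactly the per-feature voting weights used in the final strong classifier.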

 

Cascade Classifier 
對於Cascade classifier的概念,就如Figure 3所示。我們一開始將feature分成好幾 
個classifier。最前面的classier辨識率最低,但是可以先篩選掉很大一部份不是人 
臉的圖片;接下來的Classier處理比較難處理一點的case篩選掉的圖片也不如第一 
個classier多了;依此下去,直到最後一個classier為止。最後留下來的就會是我們 
想要的人臉的照片。

 

image

Figure 3. Cascade classifier

 

How many classifiers should there be? That depends on the false positive rate and detection rate we set. The false positive rate is the probability of mistaking a non-face image for a face, while the detection rate is the fraction of true faces correctly detected. There is usually a trade-off between the two: aiming for a higher detection rate tends to raise the false positive rate, and driving the false positive rate down tends to lower the detection rate.
Algorithm 3 shows how the cascade is trained. First we choose a per-stage false positive rate and detection rate, along with overall targets for the whole cascade; training stops once the cascade's overall false positive rate and detection rate reach those targets. Each stage then keeps adding features until it meets its own false positive rate and detection rate.

 

image

Algorithm 3. Cascade classifier in face detection
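At detection time, the cascade amounts to an early-exit loop (a sketch under the assumption that each stage exposes a weighted-vote score and a threshold):

```python
def cascade_classify(window, stages):
    """Run a candidate window through the cascade.
    `stages` is a list of (stage_fn, threshold) pairs, ordered from the
    cheapest/least accurate stage to the most expensive one; stage_fn
    maps a window to its weighted-vote score."""
    for stage_fn, threshold in stages:
        if stage_fn(window) < threshold:
            return False  # rejected early: most non-face windows stop here
    return True           # survived every stage: report a face
```

Because most windows in a typical image are non-faces and fail an early stage, the average cost per window stays far below the cost of evaluating every feature.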

 

OpenCV: Haar Feature-based Cascade Classifier for Object Detection

The detection method was originally proposed by Paul Viola [Viola01] and later improved by Rainer Lienhart [Lienhart02]. First, a boosted cascade of classifiers is trained on the Haar features of sample images (a few hundred of them). The training samples are split into positive samples, which contain the target object (e.g. faces or cars), and negative samples, which are arbitrary other images; all sample images are normalized to the same size (e.g. 20×20).

Once trained, the classifier can be applied to a region of interest (of the same size as the training samples) in an input image. It outputs 1 if the region is likely to contain the target object (a car or a face) and 0 otherwise. To search an entire image, a search window is moved across the image so that every position is checked. To find objects of different sizes, the classifier itself is designed to be resizable, which is more efficient than resizing the input image; so to detect an object of unknown size, the scan is typically repeated several times with search windows of different scales.
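The multi-scale scan described above can be sketched as nested loops (an illustration only; `classify` stands in for the trained cascade, and the step/scale parameters are assumptions):

```python
def scan(image_w, image_h, classify, base=24, scale_step=1.25, shift=2):
    """Slide windows of increasing size over the image and collect the
    (x, y, size) of every window the classifier accepts."""
    detections = []
    size = base
    while size <= min(image_w, image_h):
        # Shift the window proportionally to its scale.
        step = max(1, int(shift * size / base))
        for y in range(0, image_h - size + 1, step):
            for x in range(0, image_w - size + 1, step):
                if classify(x, y, size):
                    detections.append((x, y, size))
        size = int(size * scale_step)
    return detections
```

Scaling the window (and with it the classifier's features) avoids repeatedly resampling the image itself, which is the efficiency point made above.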

"Cascade" means that the final classifier is composed of several simpler classifiers in series. During detection, a candidate window passes through the stage classifiers one after another, so most candidate regions are rejected within the first few stages; only the regions that pass every stage are reported as targets. Four boosting techniques are currently supported: Discrete AdaBoost, Real AdaBoost, Gentle AdaBoost and LogitBoost. "Boosted" means each stage of the cascade can use one of these boosting algorithms (weighted voting), trained on top of base classifiers; a base classifier is a decision tree with at least two leaf nodes. Haar features are the input to the base classifiers and are described below. The current algorithm uses the following Haar features:

Each specific feature used by a classifier is defined by its shape, its position within the region of interest, and a scale factor (this scale factor is distinct from the one used during detection, although the two are eventually multiplied together). For example, for the feature in the second row (2c), the response is the sum of the pixels over the feature's full rectangle (both white rectangles and the black rectangle) minus three times the sum of the pixels inside the black rectangle. The pixel sum within any rectangle can be computed quickly from the integral image (see the description of cvIntegral below).

Haar-like features

image

The OpenCV documentation explains the Viola paper even more clearly. Both the OpenCV tutorial and Wikipedia describe the algorithm in four stages:

  1. Haar Feature Selection
  2. Creating an Integral Image
  3. Adaboost Training
  4. Cascading Classifiers

Object Detection using Haar feature-based cascade classifiers is an effective object detection method proposed by Paul Viola and Michael Jones in their paper, “Rapid Object Detection using a Boosted Cascade of Simple Features” in 2001. It is a machine learning based approach where a cascade function is trained from a lot of positive and negative images. It is then used to detect objects in other images.

Here we will work with face detection. Initially, the algorithm needs a lot of positive images (images of faces) and negative images (images without faces) to train the classifier. Then we need to extract features from them. For this, the Haar features shown in the image below are used. They are just like our convolutional kernel: each feature is a single value obtained by subtracting the sum of pixels under the white rectangle from the sum of pixels under the black rectangle.

haar_features.jpg

image

Now all possible sizes and locations of each kernel are used to calculate plenty of features. (Just imagine how much computation this needs: even a 24×24 window yields over 160,000 features.) For each feature we need the sums of pixels under the white and black rectangles. To solve this, they introduced the integral image, which reduces the sum over any rectangle, however many pixels it covers, to an operation involving just four array references. Nice, isn't it? It makes things super-fast.

But most of the features we calculated are irrelevant. For example, consider the image below. The top row shows two good features. The first feature selected seems to focus on the property that the eye region is often darker than the region of the nose and cheeks. The second relies on the eyes being darker than the bridge of the nose. But the same windows applied to the cheeks or any other region are irrelevant. So how do we select the best features out of 160,000+? This is done by AdaBoost.

haar.png

image

For this, we apply every feature to all the training images. For each feature, we find the best threshold that separates faces from non-faces. There will obviously be errors and misclassifications, so we select the features with the minimum error rate, i.e. the features that best classify the face and non-face images. (The process is not quite this simple. Each image is given an equal weight at the beginning. After each round of classification, the weights of misclassified images are increased; then the same process is repeated, giving new error rates and new weights. This continues until the required accuracy or error rate is achieved, or the required number of features is found.)

Final classifier is a weighted sum of these weak classifiers. It is called weak because it alone can’t classify the image, but together with others forms a strong classifier. The paper says even 200 features provide detection with 95% accuracy. Their final setup had around 6000 features. (Imagine a reduction from 160000+ features to 6000 features. That is a big gain).

So now you take an image, take each 24×24 window, and apply all 6,000 features to it to check whether it is a face. Isn't that rather inefficient and time consuming? Yes, it is, and the authors have a good solution.

In an image, most of the image region is non-face region. So it is a better idea to have a simple method to check if a window is not a face region. If it is not, discard it in a single shot. Don’t process it again. Instead focus on region where there can be a face. This way, we can find more time to check a possible face region.

For this they introduced the concept of a Cascade of Classifiers. Instead of applying all 6,000 features to a window, the features are grouped into stages of classifiers and applied one by one. (Normally the first few stages contain very few features.) If a window fails the first stage, it is discarded and the remaining features are never evaluated on it. If it passes, the second stage of features is applied, and so on. A window that passes all stages is a face region.

The authors' detector had 6,000+ features in 38 stages, with 1, 10, 25, 25 and 50 features in the first five stages. (The two features in the image above are in fact the best two features found by AdaBoost.) According to the authors, on average only 10 of the 6,000+ features are evaluated per sub-window.

So this is a simple intuitive explanation of how Viola-Jones face detection works. Read paper for more details or check out the references in Additional Resources section.

Haar-cascade Detection in OpenCV

OpenCV comes with a trainer as well as a detector. If you want to train your own classifier for any object, such as cars or planes, you can use OpenCV to create one. Full details are given in Cascade Classifier Training.

Here we will deal with detection. OpenCV already contains many pre-trained classifiers for faces, eyes, smiles, etc. The XML files are stored in the opencv/data/haarcascades/ folder. Let's create a face and eye detector with OpenCV.

First we need to load the required XML classifiers. Then load our input image (or video) in grayscale mode.

See also the Wikipedia article on the Viola-Jones object detection framework.
