Classification Based on Square Loss, L1 Loss, Hinge Loss, and Entropy Loss

by allenlu2007

 

Classification can be done in different ways.

They can all be explained in a unified way through the loss function.

In fact, both regression (continuous output) and classification (discrete output) can be explained through a loss function, so the loss function is the more general learning concept.

So what is the essential difference between regression and classification?

To me, classification introduces one extra concept: the decision boundary. At testing/prediction time, simply checking which side of the decision boundary a sample falls on determines its class. This is also a very critical concept in digital communications (symbol detection).

Here we assume all data samples are independently generated, so a decision boundary applies. If different samples are correlated, a decision boundary may still work (?), just expanded to a higher-dimensional space, e.g., convolutional codes or trellis codes in communications. This case is ignored here.

How is the decision boundary determined?

In traditional communications the decision boundary is known a priori, because we know all the transmitted symbols in advance.

Of course, there is usually still a training phase: some known, structured data (training symbols) are sent first to calibrate the receiver's decision boundary, or for equalization (which is more like regression). Perhaps machine learning will be applied to communications in the future? (DDR seems to have started using data training to set the boundary.) The required error rate is typically ~10^-3 to 10^-12, i.e., extremely accurate.

In machine learning, no surprise, the decision boundary is trained/learned from a pile of training data. The optimal boundary often has an arbitrary shape; we merely approximate it with simple functions. There is generally no structured data to calibrate the decision boundary, so the result is somewhat subjective, depending on the chosen loss function, feature space, etc. The required error rate is typically a few percent to tens of percent, so some error can be tolerated.
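As a concrete illustration (a minimal sketch with toy data and my own variable names, not taken from any particular library's API): learn a linear boundary w·x + b = 0 from training samples by gradient descent on the logistic loss, then classify a new point by checking which side of the boundary it falls on.

```python
import numpy as np

rng = np.random.default_rng(0)
# Toy data: two Gaussian blobs, labels in {-1, +1}
X = np.vstack([rng.normal(-1.5, 1.0, size=(100, 2)),
               rng.normal(+1.5, 1.0, size=(100, 2))])
y = np.hstack([-np.ones(100), np.ones(100)])

# Gradient descent on the logistic loss log(1 + exp(-y * (w.x + b)))
w, b, lr = np.zeros(2), 0.0, 0.1
for _ in range(500):
    m = y * (X @ w + b)             # margin of every training point
    g = -y / (1.0 + np.exp(m))      # d(loss)/d(score) for each point
    w -= lr * (X.T @ g) / len(y)
    b -= lr * g.mean()

# Prediction: just check which side of the learned boundary the point is on
x_new = np.array([0.5, -0.2])
print("predicted class:", int(np.sign(x_new @ w + b)))
```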

 

 

First, assume the case without outliers.

Square loss classification: every sample contributes to the loss, even points far from the boundary that are already correctly classified –> a far-away sample may shift the decision boundary significantly –> bad. (See the margin-form sketch after the four cases below.)


 

 

 

 

L1 loss classification: same as above –> but the penalty on a far-away sample is smaller, so it shifts the decision boundary less, yes? It seems to be better than square loss!

 

Hinge loss (SVM): only the support vectors (points near the decision boundary) have any impact on the decision boundary!!

 

Entropy loss (logistic / softmax): points close to the boundary have the largest impact on the decision boundary!!
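To make the four cases comparable, here is a minimal sketch (my own function names, assuming labels y ∈ {−1, +1}, so every loss can be written as a function of the margin m = y·f(x); for such labels the square loss (y − f(x))² equals (1 − m)²). Note how square and L1 loss keep charging points that are far from the boundary and correctly classified, while hinge and logistic loss essentially ignore them:

```python
import numpy as np

# Each loss as a function of the margin m = y * f(x), with labels y in {-1, +1}.
def square_loss(m):
    return (1.0 - m) ** 2            # penalizes m > 1 too: far, correctly classified points still pay

def l1_loss(m):
    return np.abs(1.0 - m)           # same issue, but the penalty only grows linearly

def hinge_loss(m):
    return np.maximum(0.0, 1.0 - m)  # exactly zero once m >= 1: only near-boundary points matter

def logistic_loss(m):
    return np.log1p(np.exp(-m))      # smooth, ~0 for large m, close to hinge in shape

margins = np.array([-3.0, 0.0, 1.0, 3.0, 10.0])
for name, fn in [("square", square_loss), ("L1", l1_loss),
                 ("hinge", hinge_loss), ("logistic", logistic_loss)]:
    print(f"{name:8s}", np.round(fn(margins), 2))
```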

 

Now consider the case with outliers.

1. Square loss –> the penalty is quadratic, very bad

2. L1 loss –> the penalty is linear, still bad

3. Hinge loss –> similar to L1 loss, still bad

4. Entropy loss –> log(exp), similar to L1 loss (asymptotically linear), still bad

 

==> Square loss is extremely bad; the rest behave similarly (roughly linear, like L1).
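A quick numerical check of this conclusion, using the same margin-form losses as in the sketch above, evaluated at a badly misclassified outlier (m = −10):

```python
import numpy as np

m = -10.0   # margin of a badly misclassified outlier
print("square  :", (1 - m) ** 2)          # 121 -> quadratic growth, dominates the fit
print("L1      :", abs(1 - m))            # 11  -> linear growth
print("hinge   :", max(0.0, 1 - m))       # 11  -> linear growth
print("logistic:", np.log1p(np.exp(-m)))  # ~10 -> asymptotically linear in -m
```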

 

 
