Deep Learning Neural Networks: How to Avoid Local Minima and Overfitting
Deep learning uses multi-layer nonlinear neural networks.
By nature such a network is highly nonlinear and has many local minima, compared with e.g. SVM classification (a discriminative method with a convex training objective).
How can this problem be solved?? In addition, the parameter space in deep learning is huge — how can overfitting be avoided, other than just by regularization?
The answer comes from Hinton and his students. They use a pre-training phase (an RBM, stacked RBMs, or a DBN) to fit the weights to the input data.
This addresses both the local-minimum problem and the overfitting problem!!!!
A generative model for pre-training plus back-prop for fine-tuning gives the best performance.
Back-prop alone is only useful for finding a local minimum.
Hinton's Lecture 14 explains this excellently!
Deep learning neural nets are extremely sensitive to the initial weight values!
My earlier idea was to use a shallow auto-encoder (based on an RBM) to initialize the weights. -> Wrong
An autoencoder is input -> encoder -> hidden layer -> decoder -> output.
It tries to make output ≈ input, and is mainly used for dimension reduction (a nonlinear version of PCA).
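To make the encode/decode idea concrete, here is a minimal numpy sketch of a shallow autoencoder: a tanh encoder, a linear decoder, and plain gradient descent on the reconstruction error. The toy 2-D data (which lies near a 1-D line), the layer sizes, and all hyperparameters are illustrative assumptions, not anything from Hinton's lectures.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: 2-D points near a 1-D line, so 1 hidden unit can compress them.
X = rng.normal(size=(200, 1)) @ np.array([[1.0, 0.5]]) \
    + 0.01 * rng.normal(size=(200, 2))

n_in, n_hid = 2, 1
W1 = 0.1 * rng.normal(size=(n_in, n_hid))   # encoder weights
b1 = np.zeros(n_hid)
W2 = 0.1 * rng.normal(size=(n_hid, n_in))   # decoder weights
b2 = np.zeros(n_in)

lr = 0.1
for epoch in range(3000):
    H = np.tanh(X @ W1 + b1)          # encode: input -> hidden
    Y = H @ W2 + b2                   # decode: hidden -> reconstruction
    err = Y - X                       # reconstruction error
    # Backprop gradients of the mean squared reconstruction error
    dW2 = H.T @ err / len(X)
    db2 = err.mean(axis=0)
    dH = err @ W2.T * (1 - H**2)      # tanh derivative
    dW1 = X.T @ dH / len(X)
    db1 = dH.mean(axis=0)
    W1 -= lr * dW1; b1 -= lr * db1
    W2 -= lr * dW2; b2 -= lr * db2

mse = float(((np.tanh(X @ W1 + b1) @ W2 + b2 - X) ** 2).mean())
print(mse)  # final mse is far below the data variance (~0.63)
```

The 1-unit bottleneck forces the network to find the direction the data varies along — the "nonlinear PCA" view of the autoencoder mentioned above.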
According to Hinton's explanation, pre-training is unsupervised, using contrastive divergence to initialize the weights:
input -> hidden1 -> hidden2 <-> RBM
hidden1 and hidden2 are feature extractors. This uses a generative model to initialize the weights.
How does it converge? By minimizing energy!! In the end, the outputs that the generative model produces going back down closely resemble the input digit features.
But it is NOT input = output!!
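A minimal sketch of one RBM layer trained with CD-1 (contrastive divergence with one reconstruction step), which is the training rule referred to above. This is a single Bernoulli-Bernoulli RBM on made-up binary "feature" vectors — the toy prototypes, the 5% bit-flip noise, and all hyperparameters are my assumptions for illustration; a real DBN would stack several such layers greedily.

```python
import numpy as np

rng = np.random.default_rng(1)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Toy binary data: two prototype feature patterns with 5% bit-flip noise.
protos = np.array([[1, 1, 1, 0, 0, 0],
                   [0, 0, 0, 1, 1, 1]], dtype=float)
X = protos[rng.integers(0, 2, size=400)]
X = np.abs(X - (rng.random(X.shape) < 0.05))  # flip 5% of bits

n_vis, n_hid = 6, 4
W = 0.01 * rng.normal(size=(n_vis, n_hid))
a = np.zeros(n_vis)   # visible biases
b = np.zeros(n_hid)   # hidden biases

lr = 0.1
for epoch in range(500):
    # Positive phase: clamp the data, sample hidden units.
    ph = sigmoid(X @ W + b)
    h = (rng.random(ph.shape) < ph).astype(float)
    # Negative phase (CD-1): one down-up reconstruction step.
    pv = sigmoid(h @ W.T + a)
    v = (rng.random(pv.shape) < pv).astype(float)
    ph2 = sigmoid(v @ W + b)
    # CD update: <v h>_data - <v h>_recon (lowers energy of the data).
    W += lr * (X.T @ ph - v.T @ ph2) / len(X)
    a += lr * (X - v).mean(axis=0)
    b += lr * (ph - ph2).mean(axis=0)

# The generative reconstruction resembles the input, but is not identical.
recon = sigmoid(sigmoid(X @ W + b) @ W.T + a)
err = float(np.abs(recon - X).mean())
print(err)  # mean absolute error well below the 0.5 of an untrained RBM
```

Note that the update pushes the energy of the data down and the energy of the model's own reconstructions up — "minimize energy" in the notes above — and the learned weights are exactly what would be used to initialize one layer of the deep net.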
Finally, a discriminative model (softmax or SVM) fine-tunes the weights to do classification!
=> So although deep neural nets and autoencoders both use stacked RBMs, they are not the same thing.
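The fine-tuning step above amounts to putting a softmax layer on top of the pre-trained features and training it discriminatively with gradient descent. A minimal sketch, assuming the two Gaussian blobs below stand in for the "hidden2" features produced by a pre-trained stack (in full fine-tuning the gradients would also flow back into the lower layers):

```python
import numpy as np

rng = np.random.default_rng(2)

# Stand-in for pre-trained top-layer features: two separable blobs.
H = np.vstack([rng.normal(0, 0.5, size=(100, 2)) + [2, 0],
               rng.normal(0, 0.5, size=(100, 2)) + [0, 2]])
y = np.array([0] * 100 + [1] * 100)

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)   # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

W = np.zeros((2, 2))  # (features x classes)
b = np.zeros(2)
T = np.eye(2)[y]      # one-hot targets

lr = 0.5
for step in range(200):
    P = softmax(H @ W + b)
    # Gradient of cross-entropy loss w.r.t. the softmax layer.
    W -= lr * H.T @ (P - T) / len(H)
    b -= lr * (P - T).mean(axis=0)

acc = float((softmax(H @ W + b).argmax(axis=1) == y).mean())
print(acc)
```

The same gradient `(P - T)` is what back-prop would propagate down into the pre-trained layers during full fine-tuning.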
However, in Lecture 15f => a shallow auto-encoder is not good for pre-training.
But one kind of autoencoder, namely the stacked de-noising auto-encoder, can be used for deep-learning pre-training.
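The key difference in a de-noising autoencoder is tiny: corrupt the input, but reconstruct the clean original, so the trivial identity mapping no longer works. A minimal numpy sketch of one such layer (a stacked version would train one layer at a time on the previous layer's features); the masking rate, toy data, and hyperparameters are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(3)

# Toy data near a 1-D manifold embedded in 3-D.
X = rng.normal(size=(300, 1)) @ np.array([[1.0, -0.5, 0.8]]) \
    + 0.01 * rng.normal(size=(300, 3))

n_in, n_hid = 3, 2
W1 = 0.1 * rng.normal(size=(n_in, n_hid)); b1 = np.zeros(n_hid)
W2 = 0.1 * rng.normal(size=(n_hid, n_in)); b2 = np.zeros(n_in)

lr = 0.1
for epoch in range(3000):
    # Corrupt the input with masking noise, but target the CLEAN input.
    mask = rng.random(X.shape) > 0.2      # zero out ~20% of entries
    Xc = X * mask
    H = np.tanh(Xc @ W1 + b1)
    Y = H @ W2 + b2
    err = Y - X                           # error vs. the uncorrupted data
    dW2 = H.T @ err / len(X); db2 = err.mean(axis=0)
    dH = err @ W2.T * (1 - H**2)
    dW1 = Xc.T @ dH / len(X); db1 = dH.mean(axis=0)
    W1 -= lr * dW1; b1 -= lr * db1
    W2 -= lr * dW2; b2 -= lr * db2

mse = float(((np.tanh(X @ W1 + b1) @ W2 + b2 - X) ** 2).mean())
print(mse)
```

Because the network must fill in the masked entries from the remaining ones, it is forced to learn the structure of the data manifold rather than just copy its input — which is why this variant, unlike the plain shallow autoencoder, works as a pre-training method.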