Dimension Reduction Algorithms

by allenlu2007

Reference:  MLSP2012: Manifold Learning Modeling and Algorithms



上文提到 dimension reduction 一個很有用的地方是用於 data visualization.

以上 reference 提到 dimension reduction 的用途:

* Visualization: insight into the dataset

* Compression: storage

* De-noising: remove redundant dimension (noise sits on extra dimension) 就像 signal processing 的 filter can remove excess noise (de-noise).

Manifold Hypothesis:  

The manifold hypothesis is that real-world high dimensional data (such as images) lie on low-dimensional manifolds embedded in the high-dimensional space.

以上的 hypothesis 是 manifold learning 的基礎。Dimension reduction 可視為 manifold hypothesis 的應用。如果能找到 low dimension manifold, 就可能 reduce data dimension. 

Dimension reduction 如果有幾何詮釋,基本上都可視為 manifold hypothesis 的一種。例如 PCA or ICA.  

Dimension reduction 分類:

* Linear and Non-linear

* or Linear models and algorithm vs. nonlinear manifold (learning)

* Parameter based vs. Geometric based vs. graph based vs. topology based vs. probabilistic based


Representation (high dimension) —> Underlying structure parameter (very low dimension)

也許無法找出 underlying structure parameter, 但至少可以做 new data inference (classification or regression).

Linear Dimension Reduction

PCA (rotation and find the maximize variance axis)





MDS (Multi-Dimensional Scaling)





PCA and MDS (?) 都可似為 linear dimension reduction.


到底 MDS 是 linear or non-linear dimension reduction?

Key: 如果 MDS 只能用 scaling and rotation operation.  其實就是 PCA!

如果 MDS 可以用其他的 operation (e.g. kernel PCA), 就變成 nonlinear dimension reduction.

Lawrence talk in MLSS 2012 非常好。


一般作 MDS 是從 covariance matrix 出發 (y y’), 找 eigenvalue/eigenvector. 基本上是 linear dimension reduction (scaling and rotation).

但等效可以從 mutual distance matrix or similarity matrix 出發 (和 covariance matrix 有一個固定關係), 可以得到一樣的結果。

但如果在 mutual distance matrix/similarity matrix 上做一些 nonlinear transformation (就是 kernel trick), 就變成 nonlinear dimension reduction.  但仍可以用 eigenvalue/eigenvector 方法求解。就是 optimization 的方法仍然是 linear (algebra), 但多了 nonlinear distance mapping!!



 重點是 kernel PCA 並非用來做 dimension reduction.  一般反而會 increase dimension, 特別是 nonlinear mapping 是 infinite dimension 時。但好處是 kernel PCA + (linear) classifier 可以做出 nonlinear classifier.

意思類似 kernel SVM.   






對本身就是 nonlinear dataset 如下圖。基本上無法用 linear dimension reduction to 1D line.



此時需要 manifold learning.