Similarity, Distance/Metric, and Manifold Learning II

by allenlu2007


[1] R. C. Wilson, ECCV 2012 Tutorial, "Similarity, Distance, and Manifold Learning"

[2] S. Hauberg et al., NIPS 2012, "A Geometric Take on Metric Learning"

[3] M. Hauser, PSU 2018 PhD Dissertation, "Principles of Riemannian Geometry in Neural Networks"

[4] Konstantin, "What is the Covariance Matrix?"

[5] "In-Depth: Manifold Learning"  => excellent examples

[6] YouTube, "Metric learning and manifolds: preserving the intrinsic geometry"



Earlier I studied group theory and tensor calculus and connected them to quantum field theory: one can go from the U(1) symmetry of QED to Maxwell's equations.


Extending tensor calculus from Euclidean (differential) geometry to Riemannian (differential) geometry connects it to general relativity.


On the right-hand side, T is the energy-momentum tensor, representing the mass-energy distribution.

On the left-hand side, G is the Einstein tensor (R is the Ricci tensor, g is the metric tensor), representing space-time curvature.


That tensor calculus and differential geometry serve the two great physical theories, Quantum Field Theory and General Relativity, is already remarkable.

Even more surprising is that they also apply to machine learning and deep learning [1-3]!!!!!


More surprises in the future?  Perelman (building on Hamilton's program) used the Ricci tensor and Ricci flow to prove the Poincaré conjecture!



Tip0: Manifold learning is NOT learning a manifold.  Rather, it accounts for the fact that the data lie on a low-dimensional manifold, and exploits that manifold structure to achieve the best dimensionality reduction.  Combining manifold learning with metric learning (isometry) seems to work even better.

Tip1: The mathematical definition of a distance (or metric space) is as follows:


Tip2: In a flat space (surface curvature = 0; a cylinder's surface is flat), distance is defined as follows:


M is called the metric tensor; it is PSD, and all its eigenvectors are mutually orthogonal.

Tip3: The metric tensor and the covariance matrix (PSD) are inverses of each other! [4]
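As a quick numeric illustration of Tip3, here is a minimal sketch (with hypothetical synthetic data): the inverse covariance matrix acts as the metric tensor of the Mahalanobis distance.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical correlated 2-D data
A = np.array([[2.0, 0.5], [0.0, 1.0]])
X = rng.normal(size=(1000, 2)) @ A.T

Sigma = np.cov(X, rowvar=False)      # covariance matrix (PSD)
M = np.linalg.inv(Sigma)             # metric tensor = inverse covariance

# Squared Mahalanobis distance between two points under this metric
d = X[0] - X[1]
dist2 = d @ M @ d
print(dist2 > 0)                     # True: M is positive definite
```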


We start from a special case [1]: linear Euclidean space (no neural network), all data points used (no batching, no sampling), deterministic (no stochastic gradient).

Conclusion first:  PCA, MDS, and kernel embedding give the same answer for a set of data points in Euclidean space.

Position (Feature vector)  /   Similarity  / Distance 


Golden Trio






The position - similarity correspondence above is exactly PCA (Principal Component Analysis), the standard tool for dimensionality reduction!!!

That is, find the eigen-decomposition of X X'.

It is a linear operation, with NO metric preservation!!
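A small sketch of this equivalence (hypothetical random data): the Gram matrix X X' and the scatter matrix X'X share their nonzero eigenvalues, which is why PCA can be done on either.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 3))
X = X - X.mean(axis=0)               # center the data

G = X @ X.T                          # Gram matrix   (5 x 5)
C = X.T @ X                          # scatter matrix (3 x 3)

wg = np.linalg.eigvalsh(G)           # ascending eigenvalues
wc = np.linalg.eigvalsh(C)
# The nonzero eigenvalues of X X' and X'X coincide
print(np.allclose(np.sort(wg)[-3:], np.sort(wc)))   # True
```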


Kernel embedding goes the other way: from a kernel matrix (e.g. a similarity matrix), recover the feature vectors (or the manifold).

For a similarity matrix this is very easy!
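A minimal sketch of this recovery (hypothetical data): eigen-decompose the similarity matrix K = V diag(w) V' and take Y = V sqrt(w) as the embedding; it reproduces the same inner products (the original points are recovered up to rotation).

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(6, 2))          # hidden feature vectors
K = X @ X.T                          # only the similarity (Gram) matrix is given

# Embed: K = V diag(w) V'  =>  Y = V sqrt(w)
w, V = np.linalg.eigh(K)
Y = V * np.sqrt(np.maximum(w, 0))    # recovered coordinates
print(np.allclose(Y @ Y.T, K))       # True: same inner products as X
```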







Kernel Method

The definition of a kernel looks just like a similarity matrix!  (A similarity matrix is a special kind of kernel.)  But a kernel need not be a similarity matrix!!

The key point: the kernel works without needing to know the actual points!!! (Recall the SVM kernel method.)

In principle, any positive semi-definite matrix will do.

For example, the famous RBF (Radial Basis Function) kernel

=> similar to the Laplacian eigenmap; what is the physical intuition?

=> heat diffusion => diffusion => a random walk around the local neighborhood?

Computing an embedding from an RBF kernel, however, is not easy!
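One common approach is kernel PCA on the RBF kernel: a minimal NumPy sketch (hypothetical random data; gamma is an assumed bandwidth parameter) that double-centers the kernel matrix and eigen-decomposes it to get an embedding.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))          # 50 hypothetical points in R^3

# RBF kernel matrix: K_ij = exp(-gamma * |x_i - x_j|^2)
gamma = 0.5
sq = np.sum(X**2, axis=1)
D2 = sq[:, None] + sq[None, :] - 2 * X @ X.T
K = np.exp(-gamma * D2)

# Double-center K so the implicit features have zero mean
n = K.shape[0]
J = np.eye(n) - np.ones((n, n)) / n
Kc = J @ K @ J

# Embedding = top eigenvectors scaled by sqrt(eigenvalue)
w, V = np.linalg.eigh(Kc)             # ascending order
idx = np.argsort(w)[::-1][:2]         # two largest eigenvalues
Y = V[:, idx] * np.sqrt(np.maximum(w[idx], 0))
print(Y.shape)                        # (50, 2)
```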



Vanilla PCA and Kernel PCA

Vanilla PCA



Vanilla PCA finds, among all xi (in high dimension D), the axis of maximum variance, i.e. the eigenvector with the largest eigenvalue; then the second-largest eigenvector, and so on.  These eigenvectors span the low-dimensional space (d).  Ideally the unused (D-d) dimensions correspond to eigenvalue = 0.

The kernel method goes somewhat the opposite way.  All xi are treated directly as low-dimensional features (as they should be)!  Linear combinations of the xi serve as the kernel.



PCA:  the eigenvalues and eigenvectors of K = X X'

Kernel PCA: K, K^2, K^3, …, exp(K^2), any O(K) all have the same eigenvectors, just different eigenvalues!  Use K instead of the individual X!!
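A quick check of this claim (hypothetical random data): K and K^2 share their eigenvectors, while the eigenvalues get squared.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(6, 3))
K = X @ X.T                      # Gram (similarity) matrix, PSD

w, V = np.linalg.eigh(K)         # eigenpairs of K
w2, V2 = np.linalg.eigh(K @ K)   # eigenpairs of K^2

# Eigenvalues of K^2 are the squares of those of K
print(np.allclose(np.sort(w**2), np.sort(w2)))   # True
# Each eigenvector of K is also an eigenvector of K^2
print(np.allclose(K @ K @ V, V * w**2))          # True
```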

Q:  min => max? (otherwise alpha = 0 would trivially attain the min?)





PCA ==  Kernel embedding == MDS in Euclidean space


Kernel and Manifold

Question:  Can kernel PCA be viewed as a Riemannian manifold? (No)

It seems:

Kernel is kernel (extrinsic? or only weakly data-dependent, with just a few parameters to be determined);

Manifold is manifold (it explores the data structure directly).

Does the kernel have anything to do with the manifold at all?

Kernel: extrinsic, weakly data-dependent (global geometric characteristics? => decision boundary)

Manifold: data-intrinsic, local geometric characteristics


The kernel method is better suited to a nonlinear decision boundary (whether in Euclidean space or on a manifold) on the same manifold.



The manifold method works as follows.  After unrolling the manifold, the decision boundary is still a linear boundary.

But geodesics must be used instead of straight lines!



Kernel and manifold give four different combinations:

1. Linear mapping (non-kernel) in Euclidean space (e.g. PCA)

2. Manifold method (geodesics instead of straight lines)

3. Kernel in Euclidean space (nonlinear decision boundary)

4. Kernel on a manifold (nonlinear decision boundary on the manifold)


From the special case (global, linear Euclidean space) to the manifold (local, Riemannian manifold)

PCA in Euclidean space corresponds to PGA (Principal Geodesic Analysis) on a manifold.


Intuitively, a manifold just replaces straight lines with geodesics.  But how do we obtain the different eigenvectors on a manifold?


Start from the basic distance/metric.  In a Euclidean space (orthonormal basis), the (L2) distance follows directly from the Pythagorean theorem.

Dist^2(xi, xj) = |xi - xj|^2 = (xi - xj)' Mx (xi - xj) = (xi - xj)' (xi - xj),   where Mx = (1, 0; 0, 1) is the metric tensor.

First generalize to a non-orthonormal (skewed) basis in a still globally flat (curvature = 0) vector space,

using a (non-orthonormal) linear mapping: y = Ax.

The question now is how to define distance in the skewed space.  We assume this linear mapping preserves the metric (isometry).

Dist^2(yi, yj) = (Axi - Axj)' My (Axi - Axj) = (xi - xj)' A' My A (xi - xj) = (xi - xj)' (xi - xj)

A' My A = I   =>   My = A'^-1 A^-1 = (A A')^-1   (A A' is PSD).

where My = (A A')^-1 (symmetric PSD) is called the metric tensor.

Because My is symmetric PSD, all eigenvalues are >= 0 and all eigenvectors are orthogonal => the real spectral theorem.
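A numeric sanity check of the derivation above (hypothetical random map and points): with My = (A A')^-1, distances measured in the skewed y-space equal Euclidean distances in x-space.

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.normal(size=(2, 2))          # skewed (non-orthonormal) linear map
My = np.linalg.inv(A @ A.T)          # metric tensor of the skewed space

xi, xj = rng.normal(size=2), rng.normal(size=2)
d_x = (xi - xj) @ (xi - xj)          # squared Euclidean distance in x-space
dy = A @ xi - A @ xj
d_y = dy @ My @ dy                   # squared metric distance in y-space
print(np.allclose(d_x, d_y))         # True: the mapping is an isometry
```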



Embedding / Covariance / Metric [2]

Let's start with a set of data points xi and first compute Kij = <xi, xj>, the similarity or covariance matrix.   The metric tensor M is the inverse of the covariance matrix!


Assuming M is a constant, the distance between any two points on the manifold can be defined as follows:


The above assumes a single global embedding/manifold; put differently, one metric tensor fits all the data.

Clearly this is not a realistic assumption in many situations.  The strategy is to use multiple metric tensors =>

Position-dependent metric tensors (a manifold).

Suppose we only know the metric tensors (M1 and M2) at two points (x1 and x2).  Every other point is represented by M1 or M2, whichever is closer.


Suppose M1 = (2, 0; 0, 2) = 2I at x1 = (2, 2)';  M2 = (1, 0; 0, 1) = I at x2 = (4, 4)'.

The region closer to x1 = (2, 2) (the white region in the figure below) uses M1 to compute distances.

The region closer to x2 = (4, 4) (the gray region in the figure below) uses M2 to compute distances.

Why is the white region a circle?  Because it is the locus of equal distance (ratio 2:1).  The white disc can be viewed as a forbidden zone: a straight line that would cross it instead detours through the gray region (panels (a)(b) below).
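A minimal sketch of the two-metric example above, assuming (as in [2]) that "closer" means a smaller metric distance measured with each anchor's own metric:

```python
import numpy as np

# The two local metric tensors from the example above
x1, M1 = np.array([2.0, 2.0]), 2.0 * np.eye(2)
x2, M2 = np.array([4.0, 4.0]), 1.0 * np.eye(2)

def nearest_metric(x):
    """Assign x the metric of the 'nearest' anchor, where nearness
    is measured with each anchor's own metric tensor."""
    d1 = (x - x1) @ M1 @ (x - x1)
    d2 = (x - x2) @ M2 @ (x - x2)
    return M1 if d1 < d2 else M2

print(np.allclose(nearest_metric(np.array([2.1, 2.0])), M1))  # True
print(np.allclose(nearest_metric(np.array([4.0, 3.9])), M2))  # True
```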




Generalizing to R points with metric tensors {M1, …, MR}: the metric tensor at any x is "the nearest" metric tensor.

Why is the resulting distance not symmetric??


Next, look at the Swiss-roll manifold.  Near the data points the metric is close to I, but everywhere else the metric is >> I, i.e., a forbidden zone.

So going from A to B requires traveling over the data points.



How does it work?

1. Consider lots of data points.  Cluster the points spatially and compute each cluster's covariance matrix => metric tensor.

2. Generalize PCA => PGA.




* Use the Gaussian density exp(-(x - y)' Sigma^-1 (x - y)) to illustrate:

1. The covariance matrix (Sigma) is the inverse of the metric (M).

2. The exp map and log map move between the tangent space and the manifold.





(3) Labelled metric tensors (within-class distances are smaller)




Non-linear Manifold [1]

[1] divides non-linear (non-Euclidean) manifolds into two classes:

1. Extrinsically curved, intrinsically flat (homeomorphic to Euclidean space)

2. Intrinsically curved




Class 1: non-linear but homeomorphic to Euclidean space

Simply replace straight lines with geodesics.






The only question is how to compute the geodesic distance between any two points.

(A) First find the manifold (a known manifold or via manifold learning) together with its metric tensors, then use differential geometry to compute the geodesic distance between any two points.

(B) Bypass the manifold entirely: build a graph and find the shortest path (an approximate geodesic) between any two points, then sum the edge lengths to get the geodesic distance.

(A) is the exact solution (assuming the exact manifold can be found), but it is very computation-intensive.

(B) is only an approximation, but it requires neither the manifold nor the metric tensors; being graph-based, it is much simpler.

Representative algorithms: ISOMAP (B), LLE (similar to B), Laplacian eigenmap, and t-SNE.


ISOMAP (graph-based, preserves neighbors and local structure)
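A minimal ISOMAP-style sketch in plain NumPy (a hypothetical spiral dataset; k and n are assumed parameters): build a k-nearest-neighbor graph, approximate geodesics with graph shortest paths (Floyd-Warshall here for simplicity), then apply classical MDS.

```python
import numpy as np

n, k = 80, 6
t = np.linspace(0.5, 3 * np.pi, n)              # 1-D parameter
X = np.c_[t * np.cos(t), t * np.sin(t)]         # spiral in R^2

# 1. k-nearest-neighbor graph weighted by Euclidean distance
D = np.linalg.norm(X[:, None] - X[None, :], axis=-1)
G = np.full((n, n), np.inf)
for i in range(n):
    nn = np.argsort(D[i])[1:k + 1]              # skip self at index 0
    G[i, nn] = D[i, nn]
G = np.minimum(G, G.T)                          # symmetrize the graph
np.fill_diagonal(G, 0.0)

# 2. Floyd-Warshall shortest paths approximate geodesic distances
for m in range(n):
    G = np.minimum(G, G[:, [m]] + G[[m], :])

# 3. Classical MDS on the geodesic distances gives the embedding
J = np.eye(n) - np.ones((n, n)) / n
B = -0.5 * J @ (G ** 2) @ J
w, V = np.linalg.eigh(B)
Y = V[:, -1:] * np.sqrt(w[-1])                  # 1-D "unrolled" coordinates
print(Y.shape)                                  # (80, 1)
```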





Laplacian Eigenmap (graph-based, uses the graph Laplacian instead of geodesics) (good reference)
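A tiny Laplacian-eigenmap sketch on a toy chain graph (hypothetical data, unweighted adjacency assumed): eigen-decompose the graph Laplacian and take the smallest non-trivial eigenvectors as coordinates.

```python
import numpy as np

n = 20
W = np.zeros((n, n))
for i in range(n - 1):                 # adjacency weights: chain i -- i+1
    W[i, i + 1] = W[i + 1, i] = 1.0

Dg = np.diag(W.sum(axis=1))            # degree matrix
L = Dg - W                             # graph Laplacian (PSD)

w, V = np.linalg.eigh(L)
# The smallest eigenvalue is 0, with a constant eigenvector; skip it.
# The next-smallest eigenvectors give the embedding coordinates.
Y = V[:, 1:3]                          # 2-D embedding of the 20 nodes
print(Y.shape)                         # (20, 2)
```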




Locally-Linear Embedding (graph-based, uses reconstruction weights)








Class 2: intrinsically curved (not homeomorphic to Euclidean space)

ISOMAP, Laplacian eigenmap, and LLE seem applicable to intrinsically curved manifolds as well?

Can we do manifold learning/embedding directly on the curved manifold? Yes, see below.




Manifold (Topology) and Metric Learning [6]

Topology only (e.g. similarity, MDS, LLE, Laplacian eigenmap), or topology with metric (preservation)