Linear Algebra: Geometric Interpretations

by allenlu2007

This article draws on Coranac's "Matrices from a geometric perspective".


I’ve been asking people for geometric interpretations for linear algebra forever, and no math people ever seem to have an answer, or even understand why I’d want an answer (“the formula’s right there!”).

So to make sure I understand this correctly, the determinant gives what is essentially a volume of the shape given by the column vectors. And the trace, as the determinant’s derivative, gives us by how much that volume would change over time as we vary…what?



Although Linear Algebra (LA)…


Vector Inner Product/Dot Product

The geometric meaning of the vector inner product is the projection of one vector onto another.


vec(A) • vec(B) = |A| |B| cos t = A’ B   (A and B are both column vectors; t is the angle between them)

vec(A) • vec(B) / |B| = |A| cos t   (the scalar projection of A onto B)
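As a quick numerical check, here is a small Python sketch of both formulas (the vectors A and B are made-up example values, not from the post):

```python
import math

# Illustrative example vectors
A = [3.0, 4.0]
B = [5.0, 0.0]

def norm(v):
    return math.sqrt(sum(x * x for x in v))

dot = sum(a * b for a, b in zip(A, B))   # A • B
cos_t = dot / (norm(A) * norm(B))        # A • B = |A||B| cos t  =>  cos t
projection = dot / norm(B)               # |A| cos t: scalar projection of A onto B

print(dot, cos_t, projection)            # 15.0 0.6 3.0
```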


Matrix Times a (Column) Vector

In A x, the matrix A can be viewed as a collection of row vectors a1’, a2’, a3’ applied to the column vector x:

A x = [a1’; a2’; a3’] x = [a1 • x, a2 • x, a3 • x]’




Suppose A is an orthonormal matrix (A’A = AA’ = I) -> ai • aj = δij.

{a1, a2, a3} form orthonormal bases and can be viewed as rotated coordinates of {e1, e2, e3}.

Therefore A x is the inverse rotation/reflection operation applied to the vector x.


Another (better) view comes from Coranac:

Treat the matrix as a collection of column vectors:

The matrix M = [ u v ], where u and v are column vectors.



u = [ux, uy]’ = [1, 0]’    and    v = [vx, vy]’ = [1, 2]’   are the two column vectors of the matrix M (expressed in the e1, e2 basis).

When the matrix M multiplies a vector x = [x, y]’, the entries x and y are the coordinates of x in the space spanned by the column vectors [u v]. Then x’ = M x is the linear combination x u + y v, and x’ is the result of that combination. Put another way, M converts coordinates in the [u v] space back into coordinates in the [e1 e2] space.


For example, if x = [1, 0]’, then M x = [ux, uy]’ = u = [1, 0]’  => one u

if x = [0, 1]’, then M x = [vx, vy]’ = v = [1, 2]’  => one v

if x = [1, 2]’, then M x = 1 u + 2 v = [3, 4]’  => one u plus two v
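The three cases above can be verified with a short Python sketch, treating M x as the column combination x·u + y·v (pure Python, for illustration only):

```python
# Columns of M, as in the text: u = [1, 0]', v = [1, 2]'
u = [1.0, 0.0]
v = [1.0, 2.0]

def mat_vec(x):
    """M x = x[0] * u + x[1] * v (M's columns are u and v)."""
    return [x[0] * u[i] + x[1] * v[i] for i in range(2)]

print(mat_vec([1, 0]))  # one u   -> [1.0, 0.0]
print(mat_vec([0, 1]))  # one v   -> [1.0, 2.0]
print(mat_vec([1, 2]))  # u + 2v  -> [3.0, 4.0]
```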


Rotation Matrix



Scaling Matrix


Scaling matrix example: Sx = 2, Sy = -1/2, so det(M) = Sx · Sy = -1.
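A small Python check of both determinants (the rotation angle is an arbitrary choice; Sx and Sy are from the example above):

```python
import math

def det2(m):
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]

t = math.pi / 6                      # arbitrary rotation angle
R = [[math.cos(t), -math.sin(t)],    # rotation matrix: det = +1
     [math.sin(t),  math.cos(t)]]

S = [[2.0, 0.0],                     # scaling with Sx = 2, Sy = -1/2
     [0.0, -0.5]]                    # det = Sx * Sy = -1 (one axis flipped)

print(det2(R), det2(S))
```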


Translation and Homogeneous Matrix


where t = [tx, ty]’ 

x’ = M x + t  = x u + y v + 1 t  = [x ux + y vx + tx,  x uy + y vy + ty]’

At first glance, translation is the addition of a constant vector t, which can be interpreted as a DC bias/offset. It cannot be expressed with the original matrix M, i.e. it is a non-homogeneous operation. This is inconvenient for some machine learning or statistics applications; see the earlier post on SVM learning.

If we step out of the current 2-D space and add a new dimension (3-D), the translation operation can be re-expressed as a matrix multiplication, i.e. a homogeneous operation. The new 3-D coordinates are called homogeneous coordinates.


This is in fact a standard technique in projective geometry!!

More interestingly, it is not limited to linear equations: any polynomial can use this trick to turn offsets into homogeneous equations.

Quadratics are even more interesting: a parabola, hyperbola, or ellipse with an offset can all be converted into a homogeneous quadratic in 3-D.
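A minimal sketch of the homogeneous-coordinate trick in Python, with u and v from the earlier example (the translation vector t is an arbitrary illustrative value):

```python
u, v = [1.0, 0.0], [1.0, 2.0]   # columns of M, as before
t = [5.0, 7.0]                  # translation vector (made-up example)

# 3x3 homogeneous matrix: [ux vx tx; uy vy ty; 0 0 1]
H = [[u[0], v[0], t[0]],
     [u[1], v[1], t[1]],
     [0.0,  0.0,  1.0]]

x = [1.0, 2.0]
xh = x + [1.0]                  # augment x with a 1: homogeneous coordinates
result = [sum(H[i][j] * xh[j] for j in range(3)) for i in range(3)]

print(result)                   # x' = 1*u + 2*v + t = [8, 11], plus the carried 1
```

The translation now rides along as the third column of a single matrix multiply, which is exactly what makes the operation homogeneous.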


Orthonormal Matrix

AA’ = A’A = I  =>  A^-1 = A’  =>  det(A)·det(A’) = 1.  Since det(A) = det(A’), det(A)^2 = 1, so det(A) = +1 or -1.


Suppose A is an orthonormal matrix (A’A = AA’ = I) -> ai • aj = δij.

a1, a2, a3 form orthonormal bases and can be viewed as rotated or reflected coordinates of e1, e2, e3.


When is det(A) = +1, and when is det(A) = -1?

Simply put, if A is obtained by rotating {e1, e2, e3}, then det(A) = +1, because the signed volume is unchanged.

If A is obtained from {e1, e2, e3} by an odd number of reflections, e.g. {-e1, e2, e3}, then det(A) = -1, because the signed volume flips.
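A quick Python verification of A’A = I and det(A) = ±1, using a rotation and a single reflection (the angle is chosen arbitrarily):

```python
import math

def det2(m):
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]

def transpose_times(m):
    """Compute m' m for a 2x2 matrix."""
    return [[sum(m[k][i] * m[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

t = 0.7
R = [[math.cos(t), -math.sin(t)],   # rotation: det = +1
     [math.sin(t),  math.cos(t)]]
F = [[-1.0, 0.0],                   # reflection e1 -> -e1: det = -1
     [0.0,  1.0]]

print(transpose_times(R))  # ~identity
print(det2(R), det2(F))
```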


Matrix Inversion

M converts a vector's coordinates from the [u v] space to the [e1 e2] space. Going the other way, converting [e1 e2]-space coordinates back into [u v]-space coordinates, is the geometric meaning of M^(-1). Algebraically, M • M^(-1) = M^(-1) • M = I.

The precondition for the inverse to exist is that [u v] are linearly independent = M is full rank = det(M) ≠ 0.

For an orthonormal matrix, AA’ = A’A = I  =>  A’ = A^(-1), so the inverse matrix is basically the reverse rotation or reverse reflection operation.
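A sketch of this coordinate-conversion view of M^(-1) in Python, with u = [1, 0]’ and v = [1, 2]’ from earlier, using the standard 2x2 adjugate formula:

```python
M = [[1.0, 1.0],   # columns are u = [1, 0]' and v = [1, 2]'
     [0.0, 2.0]]

d = M[0][0] * M[1][1] - M[0][1] * M[1][0]    # det(M) = 2, nonzero => invertible
Minv = [[ M[1][1] / d, -M[0][1] / d],        # 2x2 inverse via the adjugate
        [-M[1][0] / d,  M[0][0] / d]]

# x' = [3, 4]' in [e1 e2] coordinates equals u + 2v, so M^(-1) should
# recover its [u v] coordinates [1, 2]'.
xp = [3.0, 4.0]
x = [Minv[0][0] * xp[0] + Minv[0][1] * xp[1],
     Minv[1][0] * xp[0] + Minv[1][1] * xp[1]]
print(x)   # [1.0, 2.0]
```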




Matrix Transpose

The matrix transpose A’ looks simple, but algebraically it is related to the dual space; see the earlier post on least squares and least norm. That covers the most general case (n ≠ m).

The following is WRONG!!


Suppose A is the square matrix [a, b; c, d]; then A’ = [a, c; b, d], which can be viewed as a reflection across the line x = y (the 45° line),

because the entries [a, b] [c, d] and [a, c] [b, d] are symmetric about x = y.

(A’)’ = A   and  det(A) = det(A’)   




Vector Outer Product

Let both u and v in R(3).    u = [u1, u2, u3]’  v = [v1, v2, v3]’ 

u v’ = [

u1·v1   u1·v2   u1·v3

u2·v1   u2·v2   u2·v3

u3·v1   u3·v2   u3·v3 ]

Clearly the outer product is not a full-rank matrix:

the three row vectors of u v’ are {u1·v’, u2·v’, u3·v’}, all multiples of v’. So rank(u v’) = 1.


Add figure here!!


(u v’)’ = v u’

(u v’) y has dimensions (3×1)(1×3)(3×1) = 3×1:

(u v’) y = [u1·v1·y1 + u1·v2·y2 + u1·v3·y3,  u2·v1·y1 + u2·v2·y2 + u2·v3·y3,  u3·v1·y1 + u3·v2·y2 + u3·v3·y3]’

= [u1·(v1·y1 + v2·y2 + v3·y3),  u2·(v1·y1 + v2·y2 + v3·y3),  u3·(v1·y1 + v2·y2 + v3·y3)]’

= (v1·y1 + v2·y2 + v3·y3) [u1, u2, u3]’ = (v’ y) u   (v’ y is the (1×3)(3×1) scalar)

=> (u v’) y = (v’ y) u


So the rank-1 matrix u v’ produced by the outer product maps (transforms) any vector y into the direction of u.
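Both facts, rank(u v’) = 1 (every row a multiple of v’) and (u v’) y = (v’ y) u, can be checked with a small Python sketch (the vector values are arbitrary examples):

```python
u = [1.0, 2.0, 3.0]
v = [4.0, 5.0, 6.0]
y = [7.0, 8.0, 9.0]

outer = [[ui * vj for vj in v] for ui in u]   # u v': row i is u[i] * v'
left = [sum(outer[i][j] * y[j] for j in range(3)) for i in range(3)]   # (u v') y

vy = sum(vj * yj for vj, yj in zip(v, y))     # scalar v' y
right = [vy * ui for ui in u]                 # (v' y) u: points along u

print(left == right, left)    # True [122.0, 244.0, 366.0]
```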

Note that the outer product is not the same as the covariance matrix. Suppose X = (X1, X2, X3)’ is a random vector, where X1, X2, X3 are zero-mean.

Covariance matrix Σ = E(X X’) = [

E(X1^2)   E(X1 X2)  E(X1 X3)

E(X1 X2)  E(X2^2)   E(X2 X3)

E(X1 X3)  E(X2 X3)  E(X3^2) ]

The covariance matrix is a PSD (positive semidefinite) matrix. Rank = n (the number of elements) = 3 in the case above.


So do not confuse the vector outer product with the covariance matrix. The main difference is the E(·) in forming the covariance matrix:

E(X1)E(X2) ≠ E(X1 X2). Also, the covariance matrix is in general full rank (unless two of the random variables are perfectly correlated),

whereas the vector outer product has rank 1.


So what is the outer product good for? One place it shows up is the SVD, which decomposes a matrix into a sum of rank-1 outer products: A = Σ_i σ_i u_i v_i’.



Determinant

Start with a 2×2 matrix whose row vectors are [a, b] and [c, d].


The geometric meaning is the (signed) area enclosed by the row vectors (or column vectors); in higher dimensions, the volume.

The determinant is only defined for square (n×n) matrices; it is meaningless for a general n×m (n ≠ m) matrix.

det(A) = det(A’)


determinant gives what is essentially a volume of the shape given by the column vectors.

det(orthonormal matrix) = +1 or -1  (because row vectors are orthonormal)

det(A) = det(O S O’) = det(S) = product of eigenvalues   (assuming A = O S O’, an orthogonal diagonalization of a symmetric A)

det(AB) = det(A) det(B)

  1. det(I_n) = 1, where I_n is the n × n identity matrix.
  2. det(A’) = det(A).
  3. det(A^(-1)) = 1/det(A) = det(A)^(-1).
  4. det(AB) = det(A) det(B) for square matrices A and B of equal size.
  5. det(cA) = c^n det(A) for an n × n matrix.
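Properties 4 and 5 can be spot-checked numerically with a Python sketch (A, B, and c are arbitrary example values):

```python
def det2(m):
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]

def matmul2(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

A = [[1.0, 2.0], [3.0, 4.0]]
B = [[2.0, 0.0], [1.0, 3.0]]
c = 3.0

# det(AB) = det(A) det(B)
print(det2(matmul2(A, B)), det2(A) * det2(B))    # -12.0 -12.0

# det(cA) = c^n det(A), with n = 2 here
cA = [[c * x for x in row] for row in A]
print(det2(cA), c**2 * det2(A))                  # -18.0 -18.0
```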


Eigenvalue and Eigenvector

* Axis of invariance:   A x = λ x 

* PCA: see the PCA section of the earlier post. The maximum eigenvalue is the maximum projection variance, and its direction is the corresponding eigenvector.




Trace (sum of eigenvalues; det = product of eigenvalues)

* Spectral properties of the matrix A. 

det(exp(A)) = exp(tr(A))

tr(exp(A)) = Σ_i exp(λ_i)


* Projection property of the trace!  (see Hsuan-Tien Lin's linear regression talk)

If H’ = H and H^2 = H  ==> H is a projection matrix (projecting twice = projecting once),

or H is the residual matrix!!

trace(H) = the dimension of the space being projected onto (for the residual matrix, n−d).


See the Wikipedia entry on idempotent matrices (H^2 = H); this holds even without symmetry.

Except for I, H must be singular, and every eigenvalue is either 1 or 0.

trace(H) = rank(H) is always a nonnegative integer, and trace(I-H) = n - rank(H).


Specifically, H = A A_dagger = A (A’A)^(-1) A’   (A_dagger is the pseudo-inverse of A; this formula assumes A has full column rank)

A_dagger = (A’A)^(-1) A’


H is the projection matrix and I - H is the residual matrix (H’ = H and (I-H)’ = I-H).

H·H = H, and (I-H)(I-H) = I - 2H + H·H = I - H,

so both H and I - H satisfy the projection-matrix definition.

trace(H) = rank(A) = k    assuming dim(A) = n x k  (n > k)

trace(I-H) = trace(I) – trace(H) = n-k


A =

1
2
3


>> pinv(A)

ans =

0.0714 0.1429 0.2143

>> H = A * pinv(A)

H =

0.0714 0.1429 0.2143
0.1429 0.2857 0.4286
0.2143 0.4286 0.6429

>> H * H

ans =

0.0714 0.1429 0.2143
0.1429 0.2857 0.4286
0.2143 0.4286 0.6429

>> H * H * H

ans =

0.0714 0.1429 0.2143
0.1429 0.2857 0.4286
0.2143 0.4286 0.6429

>> trace(H)

ans = 1

>> trace(eye(3) - H)

ans = 2



dx/dt = A x  ->  x(t) = exp(At) x(0)

det(exp(At)) = exp(tr(At)) = exp(tr(A)·t). At t = 0 the volume scale is 1; for small t the volume grows (or shrinks) like exp(tr(A)·t), so tr(A) is the instantaneous rate of volume change of the flow.
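Here is a numerical check of det(exp(At)) = exp(tr(A)·t) in Python, computing exp(At) by its Taylor series for a small 2x2 example (A and t are arbitrary illustrative values):

```python
import math

A = [[0.5, 1.0],
     [0.0, -0.2]]   # example system matrix; tr(A) = 0.3
t = 0.3

def matmul2(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

# exp(At) = sum_k (At)^k / k!  (truncated Taylor series; converges fast here)
At = [[t * x for x in row] for row in A]
E = [[1.0, 0.0], [0.0, 1.0]]
term = [[1.0, 0.0], [0.0, 1.0]]
for k in range(1, 20):
    term = [[x / k for x in row] for row in matmul2(term, At)]
    E = [[E[i][j] + term[i][j] for j in range(2)] for i in range(2)]

det_E = E[0][0] * E[1][1] - E[0][1] * E[1][0]
print(det_E, math.exp((A[0][0] + A[1][1]) * t))   # the two values agree
```

Since tr(A) = 0.3 > 0 here, the flow of dx/dt = A x expands volume over time.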


Eigenvalue relationships

If A is a square n-by-n matrix with real or complex entries and if λ1, …, λn are the eigenvalues of A (listed according to their algebraic multiplicities), then

tr(A) = Σ_i λ_i.

This follows from the fact that A is always similar to its Jordan form, an upper triangular matrix having λ1, …, λn on the main diagonal. In contrast, the determinant of A is the product of its eigenvalues; i.e.,

det(A) = Π_i λ_i.

* The trace is related to the derivative of the determinant (see Jacobi’s formula).

If your matrix is governing the dynamics of a system of ODEs (i.e. x’ = Ax), then you get a solution of the form exp(tA) x_0, so any initial point x_0 flows along the path exp(tA) x_0 (as t varies). Now, the asymptotic behavior of this is profoundly influenced by the spectral properties of the matrix A, e.g. if A is diagonalizable and all eigenvalues are negative, then asymptotically your dynamical system is contracting everything to the origin, and e.g. if the eigenvalues are all positive then you are expanding everything away from the origin.

In general, you can prove det(exp(tA)) = exp(trace(tA)) (use Jordan form, for example), so if your matrix A has trace 0, the flow is volume preserving. Negative trace corresponds to volume decreasing flows, positive trace gives volume increasing flows.




Null space