令 d e t ∣ λ I − A ∣ = 0 det|\lambda I-A |= 0 det∣λI−A∣=0 (此式被称为矩阵A的特征多项式)我们可以得到下式:
( λ − λ 1 ) n 1 ( λ − λ 2 ) n 2 ⋅ ⋅ ⋅ ( λ − λ l ) n l = 0 (\lambda - \lambda _{1})^{n_{1}}(\lambda - \lambda _{2})^{n_{2}}\cdot \cdot \cdot (\lambda - \lambda _{l})^{n_{l}} = 0 (λ−λ1)n1(λ−λ2)n2⋅⋅⋅(λ−λl)nl=0
∑ i = 1 l n i = N \sum_{i = 1}^{l}n_{i} = N ∑i=1lni=N
这样我们便可以得到矩阵A的 l l l 种不同的特征值,每种特征值有 n i n_{i} ni 个重复,共N个特征值。对于特征向量的求解,我们放在下一小节。从这里我们可以看出矩阵A可以有多个特征值和多个特征向量,每一个特征值对应一个或多个特征向量(此时该对应的特征值有重复),不同特征值对应的特征向量之间线性不相关,重根对应的多个特征向量之间不一定不相关。特征值分解将矩阵分解成为了如下形式
A = Q Λ Q − 1 A = Q\Lambda Q^{-1} A=QΛQ−1
对于没有重根的情况 Λ = d i a g ( λ 1 , λ 2 , ⋅ ⋅ ⋅ , λ n ) \Lambda = diag(\lambda_{1},\lambda_{2},\cdot \cdot \cdot,\lambda_{n}) Λ=diag(λ1,λ2,⋅⋅⋅,λn) ,对于有重根的情况 Λ \Lambda Λ 为Jordan标准型。这里的Q是以特征向量为列向量的NxN矩阵
1.2 特征向量的求解
求解特征向量有多种方法这里介绍最简单也是最常用的方法
计算A的特征多项式 d e t ∣ λ I − A ∣ = 0 det|\lambda I-A |= 0 det∣λI−A∣=0,从而求得特征值 λ i \lambda_{i} λi
对于单根特征值来说,求齐次方程 ( λ i I − A ) v i = 0 (\lambda_{i} I-A)v_{i} = 0 (λiI−A)vi=0,即可求得该特征值对应的特征向量
对于有重根的特征值来说,可以使用一下公式,依次迭代求解
v 1 : A v 1 = λ v 1 v_{1} : Av_{1} = \lambda v_{1} v1:Av1=λv1
v 2 : A v 2 = v 1 + λ v 2 v_{2} : Av_{2} =v_{1} + \lambda v_{2} v2:Av2=v1+λv2
v 3 : A v 3 = v 2 + λ v 3 v_{3} : Av_{3} =v_{2} + \lambda v_{3} v3:Av3=v2+λv3
⋅ ⋅ ⋅ \cdot \cdot \cdot ⋅⋅⋅
v N : A v N = v N − 1 + λ v N v_{N} : Av_{N} =v_{N-1} + \lambda v_{N} vN:AvN=vN−1+λvN
设 λ 1 , λ 2 , ⋅ ⋅ ⋅ , λ N \lambda_{1},\lambda_{2},\cdot\cdot\cdot,\lambda_{N} λ1,λ2,⋅⋅⋅,λN是A的 l l l个互不相同的特征值,重数依次为 r 1 , r 2 , ⋅ ⋅ ⋅ , r N r_{1},r_{2},\cdot\cdot\cdot,r_{N} r1,r2,⋅⋅⋅,rN,且有 r 1 + r 2 + ⋅ ⋅ ⋅ + r l = N r_{1} + r_{2}+\cdot\cdot\cdot +r_{l} = N r1+r2+⋅⋅⋅+rl=N,则A可以相似对角化的充分必要条件为:A的 r i r_{i} ri重特征值 λ i \lambda_{i} λi恰有 r i r_{i} ri个线性无关的特征向量 ( i = 1 , 2 , ⋅ ⋅ ⋅ , l ) (i = 1,2,\cdot\cdot\cdot,l) (i=1,2,⋅⋅⋅,l)
注:此处的 r i r_{i} ri又被称为代数重数,而实际的线性无关的特征向量的个数称为几何重数 R i R_{i} Ri。我们有 R i < = r i R_{i} <= r_{i} Ri<=ri总是成立的,推论二是指只有当 R i = r i R_{i} = r_{i} Ri=ri时方阵A才可以相似对角化
A v = λ v , v ≠ 0 Av = \lambda v , v \neq 0 Av=λv,v=0
可以据此推导
v ˉ T A v = v ˉ T A T v = ( A v ˉ ) T v = ( A v ‾ ) T v \bar{v}^{T}Av = \bar{v}^{T}A^{T}v = (A\bar{v})^{T}v = (\overline{Av})^{T}v vˉTAv=vˉTATv=(Avˉ)Tv=(Av)Tv
λ v ˉ T v = λ ˉ v ˉ T v \lambda \bar{v}^{T}v = \bar{\lambda} \bar{v}^{T}v λvˉTv=λˉvˉTv
所以
λ = λ ˉ \lambda = \bar{\lambda} λ=λˉ
λ \lambda λ为实数,因此 d e t ∣ A − λ E ∣ x = 0 det|A-\lambda E|x= 0 det∣A−λE∣x=0必有实的基础解系,从而对应的特征向量可以取实向量
Q − 1 A Q = Q T A Q = Λ = d i a g ( λ 1 , λ 2 , ⋅ ⋅ ⋅ , λ N ) Q^{-1}AQ = Q^{T}AQ = \Lambda = diag(\lambda_{1},\lambda_{2},\cdot\cdot\cdot,\lambda_{N}) Q−1AQ=QTAQ=Λ=diag(λ1,λ2,⋅⋅⋅,λN)
若 A ϵ R n × m A \epsilon R^{n\times m} AϵRn×m,且 n > = m n >= m n>=m,则存在列正交矩阵 Q ϵ R n × m Q \epsilon R^{n\times m} QϵRn×m和上三角矩阵 R ϵ R m × m R \epsilon R^{m\times m} RϵRm×m,使得 A = Q R A=QR A=QR。 当 m = n m=n m=n 时 , Q Q Q为正交矩阵。如果A是非奇异的 n × n n \times n n×n 矩阵,则R的所有对角线元素均为正,并且在这种情况下, Q Q Q和 R R R是唯一的。若A是复矩阵,则 Q Q Q和 R R R取复值
关于QR分解的证明,这里根据Schmidt 正交化的方法给出当A为列满秩情况下的证明:
将 A A A 表示为 A = [ x 1 , x 2 , ⋅ ⋅ ⋅ , x m ] A = [x_{1},x_{2},\cdot\cdot\cdot,x_{m}] A=[x1,x2,⋅⋅⋅,xm]
由于 A A A满秩,所以 x i x_{i} xi 之间线性独立,通过Schmidt 正交化我们可以得到一组正交向量和一个上三角矩阵如下
[ u 1 u 2 ⋅ ⋅ ⋅ u m ] = [ x 1 x 2 ⋅ ⋅ ⋅ x m ] [u_{1} u_{2}\cdot\cdot\cdot u_{m}] = [x_{1} x_{2}\cdot\cdot\cdot x_{m}] [u1u2⋅⋅⋅um]=[x1x2⋅⋅⋅xm] [ t 11 ⋅ ⋅ ⋅ t 1 m 0 ⋱ ⋮ 0 0 t m m ] \begin{bmatrix} t_{11} & \cdot\cdot\cdot & t_{1m} \\ 0 & \ddots & \vdots \\ 0 & 0 & t_{mm}\end{bmatrix} ⎣⎢⎡t1100⋅⋅⋅⋱0t1m⋮tmm⎦⎥⎤
U = A T U = AT U=AT
这里的T矩阵是Schmidt 正交化的变换矩阵,由于
t i i = ∥ x i − ∑ j = 1 i − 1 < u j , x i > u j ∥ − 1 t_{ii} = \begin{Vmatrix} x_{i} - \sum_{j = 1}^{i -1}<u_{j},x_{i}>u_{j} \end{Vmatrix}^{-1} tii=∥∥∥xi−∑j=1i−1<uj,xi>uj∥∥∥−1
矩阵T是非奇异的,同时 T − 1 T^{-1} T−1也同样为上三角矩阵,令 Q = U Q = U Q=U, R = T − 1 R = T^{-1} R=T−1,我们便可以得到 A = Q R A = QR A=QR
对于一个矩阵 U ϵ F n × m U\epsilon F^{n\times m} UϵFn×m ,如果 U H U = I U^{H}U = I UHU=I (H为共轭转置)我们便称 U U U 为一个等距(isometry),它的所有列向量是正交的。等距作为一个线性变换时是保内积运算,同时也是保模长运算,即
< U x , U y > = < x , y > <Ux,Uy> = <x,y> <Ux,Uy>=<x,y>
∥ U x ∥ = < U x , U y > 1 / 2 = < x , y > 1 / 2 = ∥ x ∥ \begin{Vmatrix} Ux \end{Vmatrix} = <Ux,Uy>^{1/2} = <x,y>^{1/2} = \begin{Vmatrix} x \end{Vmatrix} ∥∥Ux∥∥=<Ux,Uy>1/2=<x,y>1/2=∥∥x∥∥
4.1.2 “协等距”(co-isometry)
对于一个矩阵 U ϵ F n × m U\epsilon F^{n\times m} UϵFn×m ,如果 U U H = I UU^{H} = I UUH=I 我们便称 U U U 为一个协等距(isometry),它的所有的行向量是正交的。
4.1.3 酉矩阵(unitary matrix)
一个矩阵如果既满足等距,又满足协等距,我们便就称它为酉矩阵,它的最大特点在于 U − 1 = U H U^{-1} = U^{H} U−1=UH。酉矩阵其实是正交矩阵在复数域上的推广,它满足
U U H = U H U = I UU^{H} = U^{H}U = I UUH=UHU=I
4.2 Schur分解的定义与推导
方阵 A ϵ F n × n A\epsilon F^{n\times n} AϵFn×n 具有特征值 λ 1 , λ 2 , ⋅ ⋅ ⋅ , λ n \lambda_{1},\lambda_{2},\cdot\cdot\cdot,\lambda_{n} λ1,λ2,⋅⋅⋅,λn ,则存在一个酉矩阵 U ϵ C n × n U\epsilon C^{n\times n} UϵCn×n 使得
U H A U = T U^{H}AU = T UHAU=T
T T T为一个上三角矩阵,它的对角线元素 t i i = λ i t_{ii} = \lambda_{i} tii=λi。现在来证明Schur分解的存在
记 x i x_{i} xi 为对应于特征值 λ i \lambda_{i} λi 的特征向量,令 X 1 = [ x 1 , x 2 , ⋅ ⋅ ⋅ , x n ] X_{1} = [x_{1},x_{2},\cdot\cdot\cdot,x_{n}] X1=[x1,x2,⋅⋅⋅,xn]
对 X 1 X_{1} X1 进行QR分解,可以得到 X 1 = Q 1 R 1 X_{1} = Q_{1}R_{1} X1=Q1R1, Q 1 Q_{1} Q1这里是酉矩阵, R 1 R_{1} R1是上三角矩阵。要注意的是 Q 1 Q_{1} Q1 的第一列仍然是 A A A 对应于特征值 λ i \lambda_{i} λi 的特征向量,因此有
Q 1 H A Q 1 = [ λ 1 ∗ 0 A 1 ] Q_{1}^{H}AQ_{1} = \begin{bmatrix} \lambda_{1} & * \\ 0 & A_{1} \end{bmatrix} Q1HAQ1=[λ10∗A1]
这里 A 1 ϵ C ( n − 1 ) × ( n − 1 ) A_{1} \epsilon C^{(n-1)\times(n-1)} A1ϵC(n−1)×(n−1),它的特征值为 λ 2 , ⋅ ⋅ ⋅ , λ n \lambda_{2},\cdot\cdot\cdot,\lambda_{n} λ2,⋅⋅⋅,λn
使用同样的步骤,我们又可以得到一个酉矩阵 Q 2 ϵ C ( n − 1 ) × ( n − 1 ) Q_{2} \epsilon C^{(n-1) \times (n-1)} Q2ϵC(n−1)×(n−1),得到
Q 2 H A 1 Q 2 = [ λ 2 ∗ 0 A 2 ] Q_{2}^{H}A_{1}Q_{2} = \begin{bmatrix}\lambda_{2} & * \\ 0 & A_{2} \end{bmatrix} Q2HA1Q2=[λ20∗A2]
对于一个矩阵 A ϵ F m × n A\epsilon F^{m\times n} AϵFm×n,可将其写成如下形式
A = U Λ V T A= U\Lambda V^{T} A=UΛVT
其中 U ϵ F m × m U\epsilon F^{m\times m} UϵFm×m 的酉矩阵,被称为左奇异向量; Λ ϵ F m × n \Lambda \epsilon F^{m\times n} ΛϵFm×n的半正定对角矩阵; V H ϵ F n × n V^{H}\epsilon F^{n\times n} VHϵFn×n 是 V V V的共轭转置,被称为右奇异向量。这样的分解就叫做奇异值分解, Λ \Lambda Λ对角线上的元素 λ i \lambda_{i} λi即为原矩阵 A A A的奇异值, 奇异值一般按照从大到小排列,即
λ 1 > = λ 2 > = ⋅ ⋅ ⋅ > = λ m i n ( n , m ) \lambda_{1} >= \lambda_{2} >= \cdot\cdot\cdot >= \lambda_{min(n,m)} λ1>=λ2>=⋅⋅⋅>=λmin(n,m)
奇异值分解的推导可以从特征值分解开始
首先,我们对n阶对称方阵 A T A A^{T}A ATA 作特征值分解,得到
A T A = V Λ V T A^{T}A= V\Lambda V^{T} ATA=VΛVT
通过特征值分解我们得到一组正交基 V = ( v 1 , v 2 , ⋅ ⋅ ⋅ , v n ) V = (v_{1},v_{2},\cdot\cdot\cdot,v_{n}) V=(v1,v2,⋅⋅⋅,vn),满足如下性质
( A T A ) v i = λ i v i (A^{T}A)v_{i} = \lambda_{i}v_{i} (ATA)vi=λivi
由于 A T A A^{T}A ATA为对称矩阵, v i v_{i} vi之间两两相互正交,所以有
< A v i , A v j > = v i T ( A T A ) v j = v i T λ j v j = λ j v i T v j = 0 <Av_{i},Av_{j}> = v^{T}_{i}(A^{T}A)v_{j} = v^{T}_{i}\lambda_{j}v_{j} = \lambda_{j}v^{T}_{i}v_{j} = 0 <Avi,Avj>=viT(ATA)vj=viTλjvj=λjviTvj=0
因为 r a n k ( A T A ) = r a n k ( A ) = r rank(A^{T}A) = rank(A) = r rank(ATA)=rank(A)=r,我们可以得到另一组正交基 A v 1 , A v 1 , ⋅ ⋅ ⋅ , A v r {Av_{1},Av_{1},\cdot\cdot\cdot,Av_{r}} Av1,Av1,⋅⋅⋅,Avr将其标准化有
u i = A v i ∣ A v i ∣ = 1 λ A v i u_{i} = \frac{Av_{i}}{|Av_{i}|} = \frac{1}{\sqrt{\lambda}}Av_{i} ui=∣Avi∣Avi=λ 1Avi
A v i = λ i u i = δ i u i Av_{i} = \sqrt{\lambda_{i}}u_{i} =\delta_{i}u_{i} Avi=λi ui=δiui
注:
∣ A v i ∣ 2 = < A v i , A v i > = λ i v i T v i = λ i |Av_{i}|^{2}=<Av_{i},Av_{i}>=\lambda_{i}v^{T}_{i}v_{i} = \lambda_{i} ∣Avi∣2=<Avi,Avi>=λiviTvi=λi
将向量组 ( u 1 , u 2 , ⋅ ⋅ ⋅ , u r ) (u_{1},u_{2},\cdot\cdot\cdot,u_{r}) (u1,u2,⋅⋅⋅,ur)扩充为 F m F^{m} Fm中的标准正交基 ( u 1 , u 2 , ⋅ ⋅ ⋅ , u r , ⋅ ⋅ ⋅ , u m ) (u_{1},u_{2},\cdot\cdot\cdot,u_{r},\cdot\cdot\cdot,u_{m}) (u1,u2,⋅⋅⋅,ur,⋅⋅⋅,um)则:
A V = A ( v 1 , v 2 , ⋅ ⋅ ⋅ , v n ) = ( A v 1 , A v 2 , ⋅ ⋅ ⋅ , A v r , 0 , ⋅ ⋅ ⋅ , 0 ) = ( δ 1 u 1 , δ 2 u 2 , ⋅ ⋅ ⋅ , δ r u r , 0 , ⋅ ⋅ ⋅ , 0 ) = U Λ AV = A(v_{1},v_{2},\cdot\cdot\cdot, v_{n}) = (Av_{1},Av_{2},\cdot\cdot\cdot,Av_{r},0,\cdot\cdot\cdot,0) = (\delta_{1}u_{1},\delta_{2}u_{2},\cdot\cdot\cdot,\delta_{r}u_{r},0,\cdot\cdot\cdot,0) = U\Lambda AV=A(v1,v2,⋅⋅⋅,vn)=(Av1,Av2,⋅⋅⋅,Avr,0,⋅⋅⋅,0)=(δ1u1,δ2u2,⋅⋅⋅,δrur,0,⋅⋅⋅,0)=UΛ
由此,可以得到奇异值分解的形式
A = U Λ V T A= U\Lambda V^{T} A=UΛVT
5.2 奇异值分解的求解
我们现在已经知道了奇异值分解的具体形式,那么奇异值和奇异向量到底怎样求解呢?
5.2.1奇异值的计算
对于较小维度的矩阵,我们可以从奇异值分解的推导中看出,奇异值 δ i = λ i \delta_{i} = \sqrt{\lambda_{i}} δi=λi 。于是可以通过求解原矩阵的转置与其自身相乘得到的矩阵的特征值,再对该特征值求平方根的方法求得矩阵的奇异值
在奇异值分解中,有一个十分重要的推论,那就是在式 A = U Λ V T A= U\Lambda V^{T} A=UΛVT里,U的列向量为 A A T AA^{T} AAT的特征向量,V的列向量为 A T A A^{T}A ATA的特征向量。知道这一推论,我们在计算出特征值之后就可以较为方便的求出矩阵的特征向量
A = δ 1 u 1 v 1 T + δ 2 u 2 v 2 T + ⋅ ⋅ ⋅ + δ r u r v r T A = \delta_{1}u_{1}v^{T}_{1} + \delta_{2}u_{2}v^{T}_{2} + \cdot\cdot\cdot + \delta_{r}u_{r}v^{T}_{r} A=δ1u1v1T+δ2u2v2T+⋅⋅⋅+δrurvrT
我们知道,矩阵的奇异值一般按照降序排列即
λ 1 > = λ 2 > = ⋅ ⋅ ⋅ > = λ m i n ( n , m ) > 0 \lambda_{1} >= \lambda_{2} >= \cdot\cdot\cdot >= \lambda_{min(n,m)} >0 λ1>=λ2>=⋅⋅⋅>=λmin(n,m)>0
A = [ u 1 u 2 ] [ 3 0 0 1 ] [ v 1 T v 2 T ] A = \begin{bmatrix}u_{1} & u_{2} \end{bmatrix}\begin{bmatrix}3 &0 \\ 0 & 1\end{bmatrix}\begin{bmatrix}v^{T}_{1}\\ v^{T}_{2}\end{bmatrix} A=[u1u2][3001][v1Tv2T]
其中 u 1 , u 2 , v 1 , v 2 u_{1},u_{2},v_{1},v_{2} u1,u2,v1,v2 是二维平面的向量,根据奇异值分解的性质, u 1 , u 2 u_{1},u_{2} u1,u2 线性无关, v 1 , v 2 v_{1},v_{2} v1,v2线性无关。那么对二维平面上任意的向量 x x x,都可以表示为: x = a 1 v 1 + a 2 v 2 x = a_{1}v_{1} + a_{2}v_{2} x=a1v1+a2v2,当A作用在 x x x上时
y = A x = A [ v 1 v 2 ] [ a 1 T a 2 T ] = [ u 1 u 2 ] [ 3 0 0 1 ] [ v 1 T v 2 T ] [ v 1 v 2 ] [ a 1 T a 2 T ] = 3 a 1 u 1 + a 2 u 2 y = Ax = A\begin{bmatrix}v_{1} & v_{2} \end{bmatrix} \begin{bmatrix}a^{T}_{1}\\ a^{T}_{2}\end{bmatrix} = \begin{bmatrix}u_{1} & u_{2} \end{bmatrix}\begin{bmatrix}3 &0 \\ 0 & 1\end{bmatrix}\begin{bmatrix}v^{T}_{1}\\ v^{T}_{2}\end{bmatrix}\begin{bmatrix}v_{1} & v_{2} \end{bmatrix} \begin{bmatrix}a^{T}_{1}\\ a^{T}_{2}\end{bmatrix} = 3a_{1}u_{1} + a_{2}u_{2} y=Ax=A[v1v2][a1Ta2T]=[u1u2][3001][v1Tv2T][v1v2][a1Ta2T]=3a1u1+a2u2
令 η 1 = 3 a 1 , η 2 = a 2 \eta_1=3a_1,~\eta_2=a_2 η1=3a1,η2=a2,我们可以得出结论:如果 x x x是在单位圆 a i 1 2 + a i 2 2 = 1 ai_1^2+ai_2^2=1 ai12+ai22=1上,那么 y y y正好在椭圆 η 1 2 / 3 2 + η 2 2 / 1 2 = 1 \eta_1^2/3^2+\eta_2^2/1^2=1 η12/32+η22/12=1上。这表明:矩阵A将二维平面中单位圆变换成椭圆,而两个奇异值正好是椭圆的两个半轴长,长轴所在的直线是 s p a n { u 1 } {\rm span}\{u_1\} span{ u1},短轴所在的直线是 s p a n { u 2 } {\rm span}\{u_2\} span{ u2}
推广到一般情形:一般矩阵A将单位球 ∥ x ∥ 2 = 1 \|x\|_2=1 ∥x∥2=1变换为超椭球面 E m = { y ∈ F m : y = A x , x ∈ F n , ∥ x ∥ 2 = 1 } E_m=\{y\in {\bf F}^m:~y=Ax,~x\in{\bf F}^n,~\|x\|_2=1\} Em={ y∈Fm:y=Ax,x∈Fn,∥x∥2=1},那么矩阵A的每个奇异值恰好就是超椭球的每条半轴长度。
参考文献:
特征值分解部分 [1] http://www.cnblogs.com/LeftNotEasy/archive/2011/01/19/svd-and-applications.html [2] http://blog.csdn.net/jinshengtao/article/details/18448355 [3] https://wenku.baidu.com/view/3ec0a4ddaeaad1f346933f42.html [4] http://www.doc88.com/p-9009713157157.html [5] https://wenku.baidu.com/view/f14c18215901020207409c97.html [6] https://www.zhihu.com/question/22548386 [7] https://github.com/LiangjunFeng/Machine-Learning/blob/master/8.PCA.py [8] Bhushan Datta K. Linear system theory and design, by Chi‐Tsong Chen, Oxford University Press, New York, 1999, 334 pages, ISBN 0‐19‐511777‐8[J]. International Journal of Robust & Nonlinear Control, 2015, 10(15):1360-1362.