
Differentiation on Matrix Manifold

Author: Dr. Jack Yansong Li
Affiliation: Liii Network
Email: yansong@liii.pro

Question

How do we calculate the derivative of $f(X) = \log \det(X)$, where $X \in \mathbb{S}^n$ is an $n \times n$ real symmetric positive-definite matrix? Or of $g(x) = x^{\top} A x$, where $x \in \mathbb{R}^n$ and $A \in \mathbb{S}^n$?

Definitions

Scalar field: A mapping $\psi : \mathcal{M} \rightarrow \mathbb{R}$. The set of all scalar fields on $\mathcal{M}$ is denoted by $\mathcal{F}_{\mathcal{M}}$.

Example: The functions $f$ and $g$ in the question are scalar fields.

Taylor expansion for a scalar field $f$:

$$f(x + \Delta x) = f(x) + \mathrm{D} f(x) [\Delta x] + \frac{1}{2} \mathrm{D}^2 f(x) [\Delta x, \Delta x] + \text{h.o.t.},$$

where $\mathrm{D} f$ maps $\mathcal{M} \rightarrow \mathcal{L}(\mathcal{M}, \mathbb{R})$ and $\mathrm{D}^2 f$ maps $\mathcal{M} \rightarrow \operatorname{BL}(\mathcal{M}, \mathbb{R})$.

Notation:

  • $\mathcal{L}(\mathcal{M}, \mathbb{R})$: set of linear maps $\mathcal{M} \to \mathbb{R}$
  • $\operatorname{BL}(\mathcal{M}, \mathbb{R})$: set of bilinear maps $\mathcal{M} \times \mathcal{M} \to \mathbb{R}$

Remark: $\mathrm{D}^2 f = \mathrm{D}(\mathrm{D} f)$ and $\mathcal{L}(\mathcal{M}, \mathcal{L}(\mathcal{M}, \mathbb{R})) \cong \operatorname{BL}(\mathcal{M}, \mathbb{R})$.

Vector at $x \in \mathcal{M}$: a linear map $v : \mathcal{F}_{\mathcal{M}} \rightarrow \mathbb{R}$.

Example: The perturbation $\Delta x$ acts as a vector at $x$ via:

$$\Delta x(f) \triangleq \mathrm{D} f(x) [\Delta x].$$
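The action of a perturbation on a scalar field can be approximated numerically. The sketch below is only an illustration: the field `psi`, the helper `directional_derivative`, and the step size `eps` are hypothetical choices, not part of the original text. It estimates $\mathrm{D} f(x)[\Delta x]$ with a central difference and compares it with the value computed by hand.

```python
import numpy as np

def directional_derivative(f, x, dx, eps=1e-6):
    """Central-difference estimate of D f(x)[dx]."""
    return (f(x + eps * dx) - f(x - eps * dx)) / (2.0 * eps)

# Hypothetical scalar field on R^3, chosen only for illustration.
def psi(x):
    return np.sin(x[0]) + x[1] * x[2]

x = np.array([0.3, -1.2, 2.0])
dx = np.array([1.0, 0.5, -2.0])

numeric = directional_derivative(psi, x, dx)
# Hand-computed D psi(x)[dx] = cos(x0)*dx0 + x2*dx1 + x1*dx2
exact = np.cos(x[0]) * dx[0] + x[2] * dx[1] + x[1] * dx[2]
print(numeric, exact)
```

The estimate is linear in `dx` up to rounding error, which matches the requirement that $\mathrm{D} f(x)$ be a linear map.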

Derivative of $g(x) = x^{\top} A x$

Expand:

$$g(x + \Delta x) = (x + \Delta x)^{\top} A (x + \Delta x) = x^{\top} A x + x^{\top} A \Delta x + \Delta x^{\top} A x + \text{h.o.t.}$$

Using symmetry of $A$ ($\Delta x^{\top} A x = x^{\top} A \Delta x$):

$$g(x + \Delta x) \approx g(x) + 2 x^{\top} A \Delta x.$$

By definition of Taylor expansion:

$$\mathrm{D} g(x) [\Delta x] = 2 x^{\top} A \Delta x. \tag{1}$$

$\mathrm{D} g(x) \in \mathcal{L}(\mathbb{R}^n, \mathbb{R})$. By the Riesz Representation Theorem, there exists a gradient $\nabla g(x)$ such that:

$$\mathrm{D} g(x) [\Delta x] = \langle \nabla g(x), \Delta x \rangle_{\mathbb{R}^n}.$$

Comparing with (1) gives $\langle \nabla g(x), \Delta x \rangle = 2 x^{\top} A \Delta x = \langle 2 A^{\top} x, \Delta x \rangle$ for every $\Delta x$. Since $A$ is symmetric, $A^{\top} = A$, so:

$$\boxed{\nabla g(x) = 2 A x}.$$
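As a sanity check, the boxed gradient can be compared with a finite-difference estimate of the directional derivative. This is a minimal sketch, assuming NumPy; the random test data, the seed, and the step size `eps` are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4
B = rng.standard_normal((n, n))
A = (B + B.T) / 2.0            # symmetrize so that A lies in S^n
x = rng.standard_normal(n)
dx = rng.standard_normal(n)    # arbitrary direction

def g(x):
    return x @ A @ x

grad = 2.0 * A @ x             # claimed gradient of g at x

# Central difference approximates D g(x)[dx]
eps = 1e-6
fd = (g(x + eps * dx) - g(x - eps * dx)) / (2.0 * eps)

print(fd, grad @ dx)           # the two numbers should agree closely
```

Because $g$ is quadratic, the central difference has no truncation error here, so any gap between the two printed values comes from floating-point rounding alone.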

Derivative of $f(X) = \log \det(X)$

Expand:

$$f(X + \Delta X) = \log \det(X + \Delta X).$$

Factor $X = X^{1/2} X^{1/2}$:

$$f(X + \Delta X) = \log \det\left(X^{1/2} (I + X^{-1/2} \Delta X X^{-1/2}) X^{1/2}\right).$$

Using $\det(AB) = \det(A)\det(B)$:

$$= \log \det(X) + \log \det(I + X^{-1/2} \Delta X X^{-1/2}).$$

Let $\lambda_i$ be the eigenvalues of $X^{-1/2} \Delta X X^{-1/2}$. The eigenvalues of $I + X^{-1/2} \Delta X X^{-1/2}$ are then $1 + \lambda_i$, and the determinant is the product of the eigenvalues, so:

$$f(X + \Delta X) = \log \det(X) + \sum_{i=1}^n \log(1 + \lambda_i).$$

For small $\lambda_i$, $\log(1 + \lambda_i) \approx \lambda_i$, so:

$$f(X + \Delta X) \approx \log \det(X) + \sum_{i=1}^n \lambda_i.$$
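The eigenvalue step can be checked numerically. The sketch below assumes NumPy and uses a randomly generated positive-definite `X` and a small symmetric `dX`; these inputs and the helper `logdet` are illustrative assumptions. It compares the exact change in $\log\det$ with $\sum_i \log(1+\lambda_i)$ and with the first-order approximation $\sum_i \lambda_i$.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 4
B = rng.standard_normal((n, n))
X = B @ B.T + n * np.eye(n)            # symmetric positive definite
S = rng.standard_normal((n, n))
dX = 1e-3 * (S + S.T) / 2.0            # small symmetric perturbation

# X^{-1/2} from the eigendecomposition X = V diag(w) V^T
w, V = np.linalg.eigh(X)
X_inv_sqrt = V @ np.diag(w ** -0.5) @ V.T

# Eigenvalues of X^{-1/2} dX X^{-1/2}
lam = np.linalg.eigvalsh(X_inv_sqrt @ dX @ X_inv_sqrt)

def logdet(M):
    sign, val = np.linalg.slogdet(M)   # val = log|det(M)|
    return val

change = logdet(X + dX) - logdet(X)
exact_sum = np.sum(np.log1p(lam))      # sum of log(1 + lambda_i)
first_order = np.sum(lam)              # sum of lambda_i

print(change, exact_sum, first_order)
```

The first two values should match to rounding error, while the third differs by a term of second order in the size of `dX`.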

Since $\sum_{i=1}^n \lambda_i = \operatorname{tr}(X^{-1/2} \Delta X X^{-1/2})$ and $\operatorname{tr}(AB) = \operatorname{tr}(BA)$:

$$\sum_{i=1}^n \lambda_i = \operatorname{tr}(X^{-1} \Delta X).$$

Thus:

$$f(X + \Delta X) \approx f(X) + \operatorname{tr}(X^{-1} \Delta X).$$

Hence:

$$\mathrm{D} f(X) [\Delta X] = \operatorname{tr}(X^{-1} \Delta X).$$

On $\mathbb{S}^n$, the inner product is $\langle A, B \rangle = \operatorname{tr}(A^{\top} B)$. Since $X^{-1}$ is symmetric:

$$\operatorname{tr}(X^{-1} \Delta X) = \langle X^{-1}, \Delta X \rangle_{\mathbb{S}^n}.$$

Therefore:

$$\boxed{\nabla f(X) = X^{-1}}.$$
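The result can be checked against a finite difference in the same way as for $g$. This is a rough sketch, assuming NumPy; the test matrix, the symmetric direction `dX`, and the step size `eps` are arbitrary choices made for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 4
B = rng.standard_normal((n, n))
X = B @ B.T + n * np.eye(n)            # symmetric positive definite
S = rng.standard_normal((n, n))
dX = (S + S.T) / 2.0                   # symmetric direction in S^n

def f(M):
    sign, val = np.linalg.slogdet(M)   # val = log|det(M)|
    return val

grad = np.linalg.inv(X)                # claimed gradient of f at X

# Central difference approximates D f(X)[dX]
eps = 1e-6
fd = (f(X + eps * dX) - f(X - eps * dX)) / (2.0 * eps)

# Inner product on S^n: <A, B> = tr(A^T B)
inner = np.trace(grad.T @ dX)

print(fd, inner)                       # the two numbers should agree closely
```

`np.linalg.slogdet` is used instead of `np.log(np.linalg.det(...))` because it avoids overflow or underflow of the determinant for larger matrices.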

Useful Identities

  1. $\det(AB) = \det(A)\det(B)$
  2. $\operatorname{tr}(AB) = \operatorname{tr}(BA)$
  3. $\operatorname{tr}(X) = \sum_{i=1}^n \lambda_i$, where $\lambda_i$ are the eigenvalues of $X$.
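The three identities above are easy to spot-check on random matrices. The snippet is a minimal sketch assuming NumPy. The third identity is tested on a symmetric matrix so that the eigenvalues are real and `eigvalsh` applies.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 3
A = rng.standard_normal((n, n))
B = rng.standard_normal((n, n))
X = (A + A.T) / 2.0                    # symmetric, so eigenvalues are real

# 1. det(AB) = det(A) det(B)
id1 = np.isclose(np.linalg.det(A @ B), np.linalg.det(A) * np.linalg.det(B))

# 2. tr(AB) = tr(BA)
id2 = np.isclose(np.trace(A @ B), np.trace(B @ A))

# 3. tr(X) = sum of the eigenvalues of X
id3 = np.isclose(np.trace(X), np.sum(np.linalg.eigvalsh(X)))

print(id1, id2, id3)
```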

Exercise

Calculate the second derivative operator of $f(X) = \log \det(X)$ using the expansion method.

Hint 1: Treat $\mathrm{D} f(\cdot)[\Delta X]$ as a scalar field and expand $\mathrm{D} f(X + \delta X)[\Delta X]$.

Hint 2: For small $A$, $(I + A)^{-1} \approx I - A$.

Hint 3: The representation of $\mathrm{D}^2 f$ is a tensor, not necessarily a matrix.
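The approximation in Hint 2 can be checked numerically without solving the exercise. The sketch below assumes NumPy and a hypothetical small random matrix `A`. It measures the error of $I - A$ against the true inverse and compares it with the bound $\|A\|^2 / (1 - \|A\|)$, which follows from $(I + A)^{-1} - (I - A) = A^2 (I + A)^{-1}$ when the spectral norm satisfies $\|A\| < 1$.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 4
A = 1e-2 * rng.standard_normal((n, n))     # small perturbation
I = np.eye(n)

exact_inv = np.linalg.inv(I + A)
approx_inv = I - A                          # first-order approximation

norm_A = np.linalg.norm(A, 2)               # spectral norm of A
err = np.linalg.norm(exact_inv - approx_inv, 2)
bound = norm_A ** 2 / (1.0 - norm_A)        # second-order error bound

print(norm_A, err, bound)
```

The error is of second order in $\|A\|$, which is why the approximation is enough for a first-order expansion.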


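Download the problem file

📥 Download the problem file "001_matrix_calculus.tmu"

Submission requirements:

  1. Write your answer at the end of the file "001_matrix_calculus.tmu"
  2. Rename the file to: 001_YourName_School.tmu
  3. Send it to: yansong@liii.pro
  4. Deadline: this Sunday, 23:59

Prize: a custom Liii STEM T-shirt

How to participate:

  1. Join QQ group 934456971 to get the prerequisite materials
  2. Download the problem file and read it carefully
  3. Complete the problem and submit it as required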
