Useful Matrix Formulas

Overview

This article organizes useful formulas for matrix computations, drawing on the “Matrix Cookbook.” Only the formulas I personally found necessary are included, so it does not aim to be comprehensive.

The original paper can be accessed at the following link:

Here, I summarize the formulas I consider particularly important, grouped by topic so they are easy to look up.


Basic Concepts of Matrix Computations

Matrix computations are at the core of machine learning.

They are widely used in numerical computations and deep learning.

Let the input vector be $\mathbf{x}$, and consider a linear transformation $\mathbf{A}$.

A simple transformation like $\mathbf{y} = \mathbf{A}\mathbf{x}$ is frequently used.

Here, $\mathbf{A} \in \mathbb{R}^{m \times n}$ can be regarded as a mapping from $\mathbb{R}^{n}$ to $\mathbb{R}^{m}$.

Basic operations include element-wise addition, scalar multiplication, transposition, inversion, and eigenvalue problems. Matrix differentiation, decomposition, and tensor computations are also crucial.

For large-scale matrix computations, numerical stability and computational cost become issues. Using Python libraries like NumPy and SciPy can be effective during implementation.
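
As a quick illustration, here is a minimal NumPy sketch (dimensions and random matrices chosen arbitrarily) of the transformation $\mathbf{y} = \mathbf{A}\mathbf{x}$ and the basic operations mentioned above:

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 3, 4
A = rng.standard_normal((m, n))   # A maps R^n to R^m
x = rng.standard_normal(n)

y = A @ x                         # linear transformation y = Ax

# Basic operations on a square matrix
S = rng.standard_normal((n, n))
S_T = S.T                         # transpose
S_inv = np.linalg.inv(S)          # inverse (a random S is invertible with probability 1)
eigvals = np.linalg.eigvals(S)    # eigenvalues
```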

Below, I list useful formulas referenced from the Matrix Cookbook.


Inverses, Transposes, and Adjoint Matrices

$$ \begin{aligned} (\mathbf{A B})^{-1} & = \mathbf{B}^{-1} \mathbf{A}^{-1} \\ (\mathbf{A B C} \ldots)^{-1} & = \ldots \mathbf{C}^{-1} \mathbf{B}^{-1} \mathbf{A}^{-1} \\ \left(\mathbf{A}^T\right)^{-1} & = \left(\mathbf{A}^{-1}\right)^T \\ (\mathbf{A}+\mathbf{B})^T & = \mathbf{A}^T + \mathbf{B}^T \\ (\mathbf{A B})^T & = \mathbf{B}^T \mathbf{A}^T \\ (\mathbf{A B C} \ldots)^T & = \ldots \mathbf{C}^T \mathbf{B}^T \mathbf{A}^T \\ \left(\mathbf{A}^H\right)^{-1} & = \left(\mathbf{A}^{-1}\right)^H \\ (\mathbf{A}+\mathbf{B})^H & = \mathbf{A}^H + \mathbf{B}^H \\ (\mathbf{A B})^H & = \mathbf{B}^H \mathbf{A}^H \\ (\mathbf{A B C} \ldots)^H & = \ldots \mathbf{C}^H \mathbf{B}^H \mathbf{A}^H \end{aligned} $$
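
These identities are easy to sanity-check numerically. The sketch below (NumPy, with random matrices chosen only for illustration) verifies a few of them:

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((3, 3))
B = rng.standard_normal((3, 3))

# (AB)^{-1} = B^{-1} A^{-1}
assert np.allclose(np.linalg.inv(A @ B), np.linalg.inv(B) @ np.linalg.inv(A))
# (AB)^T = B^T A^T
assert np.allclose((A @ B).T, B.T @ A.T)
# (A^T)^{-1} = (A^{-1})^T
assert np.allclose(np.linalg.inv(A.T), np.linalg.inv(A).T)

# Conjugate-transpose (Hermitian) versions with complex matrices
C = rng.standard_normal((3, 3)) + 1j * rng.standard_normal((3, 3))
D = rng.standard_normal((3, 3)) + 1j * rng.standard_normal((3, 3))
assert np.allclose((C @ D).conj().T, D.conj().T @ C.conj().T)
```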


Trace and Basic Properties

$$ \begin{aligned} \operatorname{Tr}(\mathbf{A}) & = \sum_i A_{i i} \\ \operatorname{Tr}(\mathbf{A}) & = \sum_i \lambda_i, \quad \lambda_i = \operatorname{eig}(\mathbf{A}) \\ \operatorname{Tr}(\mathbf{A}) & = \operatorname{Tr}\left(\mathbf{A}^T\right) \\ \operatorname{Tr}(\mathbf{A B}) & = \operatorname{Tr}(\mathbf{B} \mathbf{A}) \\ \operatorname{Tr}(\mathbf{A} + \mathbf{B}) & = \operatorname{Tr}(\mathbf{A}) + \operatorname{Tr}(\mathbf{B}) \\ \operatorname{Tr}(\mathbf{A B C}) & = \operatorname{Tr}(\mathbf{B C A}) = \operatorname{Tr}(\mathbf{C A B}) \end{aligned} $$
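
A small numerical check of the trace properties (again a NumPy sketch with arbitrary random matrices):

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((4, 4))
B = rng.standard_normal((4, 4))
C = rng.standard_normal((4, 4))

# Tr(A) = sum of eigenvalues (imaginary parts cancel for a real matrix)
assert np.isclose(np.trace(A), np.sum(np.linalg.eigvals(A)).real)
# Tr(AB) = Tr(BA)
assert np.isclose(np.trace(A @ B), np.trace(B @ A))
# Cyclic property: Tr(ABC) = Tr(CAB)
assert np.isclose(np.trace(A @ B @ C), np.trace(C @ A @ B))
```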


Determinants

Basic Determinant Formulas

$$ \begin{aligned} \operatorname{det}(\mathbf{A}) & = \prod_i \lambda_i, \quad \lambda_i = \operatorname{eig}(\mathbf{A}) \\ \operatorname{det}(c \mathbf{A}) & = c^n \operatorname{det}(\mathbf{A}), \quad \text{if } \mathbf{A} \in \mathbb{R}^{n \times n} \\ \operatorname{det}\left(\mathbf{A}^T\right) & = \operatorname{det}(\mathbf{A}) \\ \operatorname{det}(\mathbf{A B}) & = \operatorname{det}(\mathbf{A}) \operatorname{det}(\mathbf{B}) \\ \operatorname{det}\left(\mathbf{A}^{-1}\right) & = 1 / \operatorname{det}(\mathbf{A}) \\ \operatorname{det}\left(\mathbf{A}^n\right) & = \operatorname{det}(\mathbf{A})^n \\ \operatorname{det}\left(\mathbf{I} + \mathbf{u v}^T\right) & = 1 + \mathbf{u}^T \mathbf{v} \end{aligned} $$
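
These rules can likewise be checked numerically; the sketch below (NumPy, arbitrary random matrices) includes the rank-one update $\operatorname{det}(\mathbf{I} + \mathbf{u}\mathbf{v}^T) = 1 + \mathbf{u}^T\mathbf{v}$:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 4
A = rng.standard_normal((n, n))
B = rng.standard_normal((n, n))
u = rng.standard_normal(n)
v = rng.standard_normal(n)
c = 2.5

# det(A) = product of eigenvalues
assert np.isclose(np.linalg.det(A), np.prod(np.linalg.eigvals(A)).real)
# det(cA) = c^n det(A)
assert np.isclose(np.linalg.det(c * A), c**n * np.linalg.det(A))
# det(AB) = det(A) det(B)
assert np.isclose(np.linalg.det(A @ B), np.linalg.det(A) * np.linalg.det(B))
# det(I + u v^T) = 1 + u^T v
assert np.isclose(np.linalg.det(np.eye(n) + np.outer(u, v)), 1 + u @ v)
```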

For $n=2$:

$$ \operatorname{det}(\mathbf{I} + \mathbf{A}) = 1 + \operatorname{det}(\mathbf{A}) + \operatorname{Tr}(\mathbf{A}) $$

For $n=3$:

$$ \operatorname{det}(\mathbf{I} + \mathbf{A}) = 1 + \operatorname{det}(\mathbf{A}) + \operatorname{Tr}(\mathbf{A}) + \frac{1}{2} \operatorname{Tr}(\mathbf{A})^2 - \frac{1}{2} \operatorname{Tr}\left(\mathbf{A}^2\right) $$

For $n=4$:

$$ \begin{aligned} \operatorname{det}(\mathbf{I} + \mathbf{A}) = & 1 + \operatorname{det}(\mathbf{A}) + \operatorname{Tr}(\mathbf{A}) \\ & + \frac{1}{2} \operatorname{Tr}(\mathbf{A})^2 - \frac{1}{2} \operatorname{Tr}\left(\mathbf{A}^2\right) \\ & + \frac{1}{6} \operatorname{Tr}(\mathbf{A})^3 - \frac{1}{2} \operatorname{Tr}(\mathbf{A}) \operatorname{Tr}\left(\mathbf{A}^2\right) + \frac{1}{3} \operatorname{Tr}\left(\mathbf{A}^3\right) \end{aligned} $$

For small $\varepsilon$, keeping terms up to second order:

$$ \operatorname{det}(\mathbf{I} + \varepsilon \mathbf{A}) \cong 1 + \varepsilon \operatorname{Tr}(\mathbf{A}) + \frac{1}{2} \varepsilon^2 \operatorname{Tr}(\mathbf{A})^2 - \frac{1}{2} \varepsilon^2 \operatorname{Tr}\left(\mathbf{A}^2\right) $$
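
The exact $n = 3$ identity and the small-$\varepsilon$ expansion can both be verified numerically (a NumPy sketch with an arbitrary random matrix):

```python
import numpy as np

rng = np.random.default_rng(4)
A = rng.standard_normal((3, 3))
I = np.eye(3)

# Exact identity for n = 3: det(I + A) = 1 + det(A) + Tr(A) + (Tr(A)^2 - Tr(A^2)) / 2
lhs = np.linalg.det(I + A)
rhs = 1 + np.linalg.det(A) + np.trace(A) + 0.5 * np.trace(A)**2 - 0.5 * np.trace(A @ A)
assert np.isclose(lhs, rhs)

# Small-epsilon expansion, keeping terms up to second order
eps = 1e-4
approx = 1 + eps * np.trace(A) + 0.5 * eps**2 * np.trace(A)**2 - 0.5 * eps**2 * np.trace(A @ A)
assert np.isclose(np.linalg.det(I + eps * A), approx, rtol=1e-9)
```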


Differentiation

For a matrix $\mathbf{Y}$ whose entries depend on a scalar $x$, the derivative of its inverse is:

$$ \frac{\partial \mathbf{Y}^{-1}}{\partial x} = -\mathbf{Y}^{-1} \frac{\partial \mathbf{Y}}{\partial x} \mathbf{Y}^{-1} $$

Proof Notes

  1. Use the property that $\mathbf{Y} \mathbf{Y}^{-1} = \mathbf{I}$ (identity matrix).

    Differentiating both sides with respect to $x$ yields: $$ \frac{\partial}{\partial x} (\mathbf{Y} \mathbf{Y}^{-1}) = \frac{\partial \mathbf{I}}{\partial x} = 0 $$

  2. Expand the left-hand side: $$ \frac{\partial \mathbf{Y}}{\partial x} \mathbf{Y}^{-1} + \mathbf{Y} \frac{\partial \mathbf{Y}^{-1}}{\partial x} = 0 $$

  3. Multiply both sides on the left by $\mathbf{Y}^{-1}$: $$ \mathbf{Y}^{-1} \frac{\partial \mathbf{Y}}{\partial x} \mathbf{Y}^{-1} + \frac{\partial \mathbf{Y}^{-1}}{\partial x} = 0 $$

  4. Finally, solve for $\frac{\partial \mathbf{Y}^{-1}}{\partial x}$: $$ \frac{\partial \mathbf{Y}^{-1}}{\partial x} = -\mathbf{Y}^{-1} \frac{\partial \mathbf{Y}}{\partial x} \mathbf{Y}^{-1} $$
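
A quick finite-difference check of this derivative formula. The parameterized matrix $\mathbf{Y}(x)$ below is a hypothetical example chosen only for the test:

```python
import numpy as np

def Y(x):
    # Hypothetical matrix-valued function of a scalar x (for the check only)
    return np.array([[2.0 + x,   0.5 * x],
                     [np.sin(x), 3.0 + x**2]])

def dY(x):
    # Elementwise derivative of Y with respect to x
    return np.array([[1.0,       0.5],
                     [np.cos(x), 2.0 * x]])

x0, h = 0.7, 1e-6
Y_inv = np.linalg.inv(Y(x0))
analytic = -Y_inv @ dY(x0) @ Y_inv                                          # -Y^{-1} dY/dx Y^{-1}
numeric = (np.linalg.inv(Y(x0 + h)) - np.linalg.inv(Y(x0 - h))) / (2 * h)   # central difference
assert np.allclose(analytic, numeric, atol=1e-6)
```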


Inverse Matrices

Woodbury Formula

$$ \begin{aligned} \left(\mathbf{A} + \mathbf{C B C}^T\right)^{-1} & = \mathbf{A}^{-1} - \mathbf{A}^{-1} \mathbf{C} \left(\mathbf{B}^{-1} + \mathbf{C}^T \mathbf{A}^{-1} \mathbf{C}\right)^{-1} \mathbf{C}^T \mathbf{A}^{-1} \\ (\mathbf{A} + \mathbf{U B V})^{-1} & = \mathbf{A}^{-1} - \mathbf{A}^{-1} \mathbf{U} \left(\mathbf{B}^{-1} + \mathbf{V A}^{-1} \mathbf{U}\right)^{-1} \mathbf{V} \mathbf{A}^{-1} \end{aligned} $$
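
A numerical check of the first (symmetric) form of the Woodbury identity. The sketch assumes $\mathbf{A}$ and $\mathbf{B}$ are symmetric positive definite so that every inverse involved exists:

```python
import numpy as np

rng = np.random.default_rng(5)
n, k = 5, 2
A = rng.standard_normal((n, n)); A = A @ A.T + n * np.eye(n)   # symmetric positive definite
B = rng.standard_normal((k, k)); B = B @ B.T + k * np.eye(k)   # symmetric positive definite
C = rng.standard_normal((n, k))

A_inv, B_inv = np.linalg.inv(A), np.linalg.inv(B)

# (A + C B C^T)^{-1} = A^{-1} - A^{-1} C (B^{-1} + C^T A^{-1} C)^{-1} C^T A^{-1}
lhs = np.linalg.inv(A + C @ B @ C.T)
rhs = A_inv - A_inv @ C @ np.linalg.inv(B_inv + C.T @ A_inv @ C) @ C.T @ A_inv
assert np.allclose(lhs, rhs)
```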

When $\mathbf{P}$ and $\mathbf{R}$ Are Positive Definite Matrices

The following formula is used in applications such as Gaussian conditional distributions and Kalman filters:

$$ \left(\mathbf{P}^{-1} + \mathbf{B}^T \mathbf{R}^{-1} \mathbf{B}\right)^{-1} \mathbf{B}^T \mathbf{R}^{-1} = \mathbf{P B}^T \left(\mathbf{B P B}^T + \mathbf{R}\right)^{-1} $$
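
This identity connects the two common expressions for the Kalman gain. A small numerical check (NumPy, with randomly generated positive definite $\mathbf{P}$ and $\mathbf{R}$):

```python
import numpy as np

rng = np.random.default_rng(6)
n, m = 4, 2
P = rng.standard_normal((n, n)); P = P @ P.T + np.eye(n)   # positive definite (state covariance)
R = rng.standard_normal((m, m)); R = R @ R.T + np.eye(m)   # positive definite (noise covariance)
B = rng.standard_normal((m, n))                            # observation matrix, R^n -> R^m

lhs = np.linalg.inv(np.linalg.inv(P) + B.T @ np.linalg.inv(R) @ B) @ B.T @ np.linalg.inv(R)
rhs = P @ B.T @ np.linalg.inv(B @ P @ B.T + R)
assert np.allclose(lhs, rhs)   # both expressions give the same n x m gain matrix
```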


Approximations

I used these formulas frequently when writing papers. Note that the series below converge only when all eigenvalues of $\mathbf{A}$ have absolute value less than 1.

$$ (\mathbf{I} - \mathbf{A})^{-1} = \sum_{n=0}^\infty \mathbf{A}^n $$

$$ (\mathbf{I} + \mathbf{A})^{-1} = \sum_{n=0}^\infty (-1)^n \mathbf{A}^n $$

When All Eigenvalues Have Absolute Values Less Than 1

$$ \begin{aligned} (\mathbf{I} - \mathbf{A})^{-1} & \cong \mathbf{I} + \mathbf{A} + \mathbf{A}^2 \\ (\mathbf{I} + \mathbf{A})^{-1} & \cong \mathbf{I} - \mathbf{A} + \mathbf{A}^2 \end{aligned} $$
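
A numerical check of the Neumann series and its second-order truncation, using a random matrix rescaled so that its spectral radius is well below 1 (an assumption of this sketch):

```python
import numpy as np

rng = np.random.default_rng(7)
A = rng.standard_normal((4, 4))
A *= 0.3 / np.max(np.abs(np.linalg.eigvals(A)))   # rescale so the spectral radius is 0.3
I = np.eye(4)

# Truncated Neumann series converges to the exact inverse
exact = np.linalg.inv(I - A)
series = sum(np.linalg.matrix_power(A, k) for k in range(30))
assert np.allclose(exact, series)

# Second-order approximation from the text; the error is O(spectral radius^3)
approx = I + A + A @ A
print(np.max(np.abs(exact - approx)))
```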

When $\mathbf{A}$ Is Very Large

$$ \mathbf{A} - \mathbf{A} (\mathbf{I} + \mathbf{A})^{-1} \mathbf{A} \cong \mathbf{I} - \mathbf{A}^{-1} $$
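
A quick check of this large-$\mathbf{A}$ approximation; the sketch assumes a symmetric positive definite $\mathbf{A}$ with large eigenvalues, so that $\mathbf{A}^{-1}$ has small norm:

```python
import numpy as np

rng = np.random.default_rng(8)
n = 4
A = rng.standard_normal((n, n)); A = A @ A.T + 100.0 * np.eye(n)   # "large" positive definite A
I = np.eye(n)

lhs = A - A @ np.linalg.inv(I + A) @ A
rhs = I - np.linalg.inv(A)
print(np.max(np.abs(lhs - rhs)))   # small: the error is on the order of ||A^{-1}||^2
```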


Exponentials and Logarithms

These formulas were also quite useful in past research.

$$ \begin{aligned} e^{\mathbf{A}} & \equiv \sum_{n=0}^\infty \frac{1}{n!} \mathbf{A}^n = \mathbf{I} + \mathbf{A} + \frac{1}{2} \mathbf{A}^2 + \ldots \\ e^{-\mathbf{A}} & \equiv \sum_{n=0}^\infty \frac{1}{n!} (-1)^n \mathbf{A}^n = \mathbf{I} - \mathbf{A} + \frac{1}{2} \mathbf{A}^2 - \ldots \\ e^{t \mathbf{A}} & \equiv \sum_{n=0}^\infty \frac{1}{n!} (t \mathbf{A})^n = \mathbf{I} + t \mathbf{A} + \frac{1}{2} t^2 \mathbf{A}^2 + \ldots \\ \ln (\mathbf{I} + \mathbf{A}) & \equiv \sum_{n=1}^\infty \frac{(-1)^{n-1}}{n} \mathbf{A}^n = \mathbf{A} - \frac{1}{2} \mathbf{A}^2 + \frac{1}{3} \mathbf{A}^3 - \ldots \end{aligned} $$
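
For actual computation, SciPy provides `scipy.linalg.expm` and `scipy.linalg.logm`. The sketch below compares them against truncated versions of the series above (the matrix is scaled small so both series converge quickly):

```python
import numpy as np
from math import factorial
from scipy.linalg import expm, logm

rng = np.random.default_rng(9)
A = 0.1 * rng.standard_normal((3, 3))   # small norm so the truncated series are accurate
I = np.eye(3)

# e^A: truncated power series vs. scipy
exp_series = sum(np.linalg.matrix_power(A, k) / factorial(k) for k in range(20))
assert np.allclose(exp_series, expm(A))

# ln(I + A): truncated alternating series vs. scipy
log_series = sum((-1)**(k - 1) / k * np.linalg.matrix_power(A, k) for k in range(1, 20))
assert np.allclose(log_series, logm(I + A))
```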


Conclusion

The Matrix Cookbook contains many more formulas beyond those listed here. I plan to add additional formulas as needed in the future.