Useful Matrix Formulas
Overview
This article organizes useful formulas for matrix computations, referencing the “Matrix Cookbook.” Only the formulas I personally found necessary are extracted, so this does not cover all possible formulas.
The original text can be accessed at the following link:
Here, I summarize the formulas I consider particularly important. They are grouped by topic, which makes the collection convenient to reference.
Basic Concepts of Matrix Computations
Matrix computations are at the core of machine learning and are used throughout numerical computing and deep learning.
Let the input vector be $\mathbf{x}$, and consider a linear transformation $\mathbf{A}$.
A simple transformation like $\mathbf{y} = \mathbf{A}\mathbf{x}$ is frequently used.
Here, $\mathbf{A} \in \mathbb{R}^{m \times n}$ can be regarded as a linear mapping from $\mathbb{R}^n$ to $\mathbb{R}^m$.
Basic operations include element-wise addition, scalar multiplication, transposition, inversion, and eigenvalue problems. Matrix differentiation, decomposition, and tensor computations are also crucial.
For large-scale matrix computations, numerical stability and computational cost become issues. Using Python libraries like NumPy and SciPy can be effective during implementation.
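As a minimal sketch of these basics in NumPy (the matrix and vector below are arbitrary examples of my own, not taken from the Matrix Cookbook):

```python
import numpy as np

# A 2x3 matrix maps vectors in R^3 to vectors in R^2.
A = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0]])
x = np.array([1.0, 0.0, -1.0])

y = A @ x  # the linear transformation y = Ax
print(y)   # [-2. -2.]

# A few of the basic operations mentioned above.
B = np.diag([1.0, 2.0, 3.0])
print(A.T)                   # transpose
print(np.linalg.inv(B))      # inverse (requires B square and nonsingular)
print(np.linalg.eigvals(B))  # eigenvalues
```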
Below, I list useful formulas referenced from the Matrix Cookbook.
Inverse, Transpose, and Adjoint Matrices
$$ \begin{aligned} (\mathbf{A B})^{-1} & = \mathbf{B}^{-1} \mathbf{A}^{-1} \\ (\mathbf{A B C} \ldots)^{-1} & = \ldots \mathbf{C}^{-1} \mathbf{B}^{-1} \mathbf{A}^{-1} \\ \left(\mathbf{A}^T\right)^{-1} & = \left(\mathbf{A}^{-1}\right)^T \\ (\mathbf{A}+\mathbf{B})^T & = \mathbf{A}^T + \mathbf{B}^T \\ (\mathbf{A B})^T & = \mathbf{B}^T \mathbf{A}^T \\ (\mathbf{A B C} \ldots)^T & = \ldots \mathbf{C}^T \mathbf{B}^T \mathbf{A}^T \\ \left(\mathbf{A}^H\right)^{-1} & = \left(\mathbf{A}^{-1}\right)^H \\ (\mathbf{A}+\mathbf{B})^H & = \mathbf{A}^H + \mathbf{B}^H \\ (\mathbf{A B})^H & = \mathbf{B}^H \mathbf{A}^H \\ (\mathbf{A B C} \ldots)^H & = \ldots \mathbf{C}^H \mathbf{B}^H \mathbf{A}^H \end{aligned} $$
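These identities are easy to sanity-check numerically. A quick NumPy sketch (the sizes and seed are arbitrary; Gaussian random matrices are invertible almost surely):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4))
B = rng.standard_normal((4, 4))

# (AB)^T = B^T A^T
assert np.allclose((A @ B).T, B.T @ A.T)

# (AB)^{-1} = B^{-1} A^{-1}
assert np.allclose(np.linalg.inv(A @ B),
                   np.linalg.inv(B) @ np.linalg.inv(A))

# (A^T)^{-1} = (A^{-1})^T
assert np.allclose(np.linalg.inv(A.T), np.linalg.inv(A).T)

# Complex case: (CD)^H = D^H C^H, where ^H is the conjugate transpose
C = A + 1j * B
D = B - 1j * A
assert np.allclose((C @ D).conj().T, D.conj().T @ C.conj().T)
```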
Trace and Basic Properties
$$ \begin{aligned} \operatorname{Tr}(\mathbf{A}) & = \sum_i A_{i i} \\ \operatorname{Tr}(\mathbf{A}) & = \sum_i \lambda_i, \quad \lambda_i = \operatorname{eig}(\mathbf{A}) \\ \operatorname{Tr}(\mathbf{A}) & = \operatorname{Tr}\left(\mathbf{A}^T\right) \\ \operatorname{Tr}(\mathbf{A B}) & = \operatorname{Tr}(\mathbf{B} \mathbf{A}) \\ \operatorname{Tr}(\mathbf{A} + \mathbf{B}) & = \operatorname{Tr}(\mathbf{A}) + \operatorname{Tr}(\mathbf{B}) \\ \operatorname{Tr}(\mathbf{A B C}) & = \operatorname{Tr}(\mathbf{B C A}) = \operatorname{Tr}(\mathbf{C A B}) \end{aligned} $$
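A similar numerical check for the trace identities (again a sketch with arbitrary random matrices):

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((5, 5))
B = rng.standard_normal((5, 5))
C = rng.standard_normal((5, 5))

# Tr(A) = sum of eigenvalues (complex pairs cancel, leaving a real sum)
assert np.allclose(np.trace(A), np.linalg.eigvals(A).sum())

# Tr(AB) = Tr(BA)
assert np.allclose(np.trace(A @ B), np.trace(B @ A))

# Cyclic property: Tr(ABC) = Tr(BCA) = Tr(CAB)
t = np.trace(A @ B @ C)
assert np.allclose(t, np.trace(B @ C @ A))
assert np.allclose(t, np.trace(C @ A @ B))
```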
Determinants
Basic Determinant Formulas
$$ \begin{aligned} \operatorname{det}(\mathbf{A}) & = \prod_i \lambda_i, \quad \lambda_i = \operatorname{eig}(\mathbf{A}) \\ \operatorname{det}(c \mathbf{A}) & = c^n \operatorname{det}(\mathbf{A}), \quad \text{if } \mathbf{A} \in \mathbb{R}^{n \times n} \\ \operatorname{det}\left(\mathbf{A}^T\right) & = \operatorname{det}(\mathbf{A}) \\ \operatorname{det}(\mathbf{A B}) & = \operatorname{det}(\mathbf{A}) \operatorname{det}(\mathbf{B}) \\ \operatorname{det}\left(\mathbf{A}^{-1}\right) & = 1 / \operatorname{det}(\mathbf{A}) \\ \operatorname{det}\left(\mathbf{A}^n\right) & = \operatorname{det}(\mathbf{A})^n \\ \operatorname{det}\left(\mathbf{I} + \mathbf{u v}^T\right) & = 1 + \mathbf{u}^T \mathbf{v} \end{aligned} $$
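The same kind of sketch works for the determinant identities (dimensions, scalar, and seed are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 4
A = rng.standard_normal((n, n))
B = rng.standard_normal((n, n))
u = rng.standard_normal(n)
v = rng.standard_normal(n)
c = 2.5

# det(A) = product of eigenvalues
assert np.allclose(np.linalg.det(A), np.prod(np.linalg.eigvals(A)))

# det(cA) = c^n det(A)
assert np.allclose(np.linalg.det(c * A), c**n * np.linalg.det(A))

# det(AB) = det(A) det(B)
assert np.allclose(np.linalg.det(A @ B),
                   np.linalg.det(A) * np.linalg.det(B))

# Matrix determinant lemma: det(I + u v^T) = 1 + u^T v
assert np.allclose(np.linalg.det(np.eye(n) + np.outer(u, v)), 1 + u @ v)
```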
For $n=2$:
$$ \operatorname{det}(\mathbf{I} + \mathbf{A}) = 1 + \operatorname{det}(\mathbf{A}) + \operatorname{Tr}(\mathbf{A}) $$
For $n=3$:
$$ \operatorname{det}(\mathbf{I} + \mathbf{A}) = 1 + \operatorname{det}(\mathbf{A}) + \operatorname{Tr}(\mathbf{A}) + \frac{1}{2} \operatorname{Tr}(\mathbf{A})^2 - \frac{1}{2} \operatorname{Tr}\left(\mathbf{A}^2\right) $$
For $n=4$:
$$ \begin{aligned} \operatorname{det}(\mathbf{I} + \mathbf{A}) = & 1 + \operatorname{det}(\mathbf{A}) + \operatorname{Tr}(\mathbf{A}) \\ & + \frac{1}{2} \operatorname{Tr}(\mathbf{A})^2 - \frac{1}{2} \operatorname{Tr}\left(\mathbf{A}^2\right) \\ & + \frac{1}{6} \operatorname{Tr}(\mathbf{A})^3 - \frac{1}{2} \operatorname{Tr}(\mathbf{A}) \operatorname{Tr}\left(\mathbf{A}^2\right) + \frac{1}{3} \operatorname{Tr}\left(\mathbf{A}^3\right) \end{aligned} $$
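These expansions can also be verified numerically; a sketch for the $n=2$ and $n=3$ cases (the random matrices are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(3)

# n = 2: det(I + A) = 1 + det(A) + Tr(A)
A2 = rng.standard_normal((2, 2))
lhs = np.linalg.det(np.eye(2) + A2)
rhs = 1 + np.linalg.det(A2) + np.trace(A2)
assert np.allclose(lhs, rhs)

# n = 3: det(I + A) = 1 + det(A) + Tr(A) + Tr(A)^2/2 - Tr(A^2)/2
A3 = rng.standard_normal((3, 3))
lhs = np.linalg.det(np.eye(3) + A3)
rhs = (1 + np.linalg.det(A3) + np.trace(A3)
       + 0.5 * np.trace(A3)**2 - 0.5 * np.trace(A3 @ A3))
assert np.allclose(lhs, rhs)
```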
For small $\varepsilon$ and $\mathbf{A} \in \mathbb{R}^{n \times n}$:
$$ \operatorname{det}(\mathbf{I} + \varepsilon \mathbf{A}) \cong 1 + \operatorname{det}(\mathbf{A})\, \varepsilon^n + \varepsilon \operatorname{Tr}(\mathbf{A}) + \frac{1}{2} \varepsilon^2 \operatorname{Tr}(\mathbf{A})^2 - \frac{1}{2} \varepsilon^2 \operatorname{Tr}\left(\mathbf{A}^2\right) $$
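A numerical check of this approximation ($\varepsilon$, the dimension, and the seed are arbitrary choices; the residual should be on the order of $\varepsilon^3$):

```python
import numpy as np

rng = np.random.default_rng(4)
n = 3
A = rng.standard_normal((n, n))
eps = 1e-3

exact = np.linalg.det(np.eye(n) + eps * A)
approx = (1 + np.linalg.det(A) * eps**n + eps * np.trace(A)
          + 0.5 * eps**2 * np.trace(A)**2
          - 0.5 * eps**2 * np.trace(A @ A))
print(abs(exact - approx))  # on the order of eps^3
```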
Differentiation
$$ \frac{\partial \mathbf{Y}^{-1}}{\partial x} = -\mathbf{Y}^{-1} \frac{\partial \mathbf{Y}}{\partial x} \mathbf{Y}^{-1} $$
Proof Notes
Use the property that $\mathbf{Y} \mathbf{Y}^{-1} = \mathbf{I}$ (identity matrix).
Differentiating both sides with respect to $x$ yields: $$ \frac{\partial}{\partial x} (\mathbf{Y} \mathbf{Y}^{-1}) = \frac{\partial \mathbf{I}}{\partial x} = 0 $$
Expand the left-hand side: $$ \frac{\partial \mathbf{Y}}{\partial x} \mathbf{Y}^{-1} + \mathbf{Y} \frac{\partial \mathbf{Y}^{-1}}{\partial x} = 0 $$
Multiplying both sides on the left by $\mathbf{Y}^{-1}$ gives: $$ \mathbf{Y}^{-1} \frac{\partial \mathbf{Y}}{\partial x} \mathbf{Y}^{-1} + \frac{\partial \mathbf{Y}^{-1}}{\partial x} = 0 $$
Finally, solve for $\frac{\partial \mathbf{Y}^{-1}}{\partial x}$: $$ \frac{\partial \mathbf{Y}^{-1}}{\partial x} = -\mathbf{Y}^{-1} \frac{\partial \mathbf{Y}}{\partial x} \mathbf{Y}^{-1} $$
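The result can be confirmed with a finite-difference check. A sketch assuming $\mathbf{Y}(x) = \mathbf{Y}_0 + x\mathbf{D}$ (this parameterization and the diagonal shift that keeps $\mathbf{Y}$ well-conditioned are my own arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(5)
n = 3
Y0 = rng.standard_normal((n, n)) + 5 * np.eye(n)  # shift for conditioning
D = rng.standard_normal((n, n))                   # dY/dx for Y(x) = Y0 + x D

def Yinv(x):
    return np.linalg.inv(Y0 + x * D)

h = 1e-6
fd = (Yinv(h) - Yinv(-h)) / (2 * h)     # central finite difference at x = 0
analytic = -Yinv(0.0) @ D @ Yinv(0.0)   # -Y^{-1} (dY/dx) Y^{-1}
print(np.max(np.abs(fd - analytic)))    # tiny; the two agree
```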
Inverse Matrices
Woodbury Formula
$$ \begin{aligned} \left(\mathbf{A} + \mathbf{C B C}^T\right)^{-1} & = \mathbf{A}^{-1} - \mathbf{A}^{-1} \mathbf{C} \left(\mathbf{B}^{-1} + \mathbf{C}^T \mathbf{A}^{-1} \mathbf{C}\right)^{-1} \mathbf{C}^T \mathbf{A}^{-1} \\ (\mathbf{A} + \mathbf{U B V})^{-1} & = \mathbf{A}^{-1} - \mathbf{A}^{-1} \mathbf{U} \left(\mathbf{B}^{-1} + \mathbf{V A}^{-1} \mathbf{U}\right)^{-1} \mathbf{V} \mathbf{A}^{-1} \end{aligned} $$
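A numerical sanity check of the second (general) form; the shapes and the diagonal shifts that keep $\mathbf{A}$ and $\mathbf{B}$ invertible are arbitrary choices:

```python
import numpy as np

rng = np.random.default_rng(6)
n, k = 5, 2
A = rng.standard_normal((n, n)) + 5 * np.eye(n)  # keep A invertible
B = rng.standard_normal((k, k)) + 5 * np.eye(k)  # keep B invertible
U = rng.standard_normal((n, k))
V = rng.standard_normal((k, n))

Ainv = np.linalg.inv(A)
lhs = np.linalg.inv(A + U @ B @ V)
rhs = Ainv - Ainv @ U @ np.linalg.inv(
    np.linalg.inv(B) + V @ Ainv @ U) @ V @ Ainv
assert np.allclose(lhs, rhs)
```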
When $\mathbf{P}$ and $\mathbf{R}$ Are Positive Definite Matrices
The following formula is used in applications such as Gaussian conditional distributions and Kalman filters:
$$ \left(\mathbf{P}^{-1} + \mathbf{B}^T \mathbf{R}^{-1} \mathbf{B}\right)^{-1} \mathbf{B}^T \mathbf{R}^{-1} = \mathbf{P B}^T \left(\mathbf{B P B}^T + \mathbf{R}\right)^{-1} $$
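Checking this identity numerically requires symmetric positive definite $\mathbf{P}$ and $\mathbf{R}$; the construction below ($\mathbf{L}\mathbf{L}^T$ plus a diagonal shift) is one arbitrary way to generate them:

```python
import numpy as np

rng = np.random.default_rng(7)
n, m = 4, 3
Lp = rng.standard_normal((n, n))
P = Lp @ Lp.T + n * np.eye(n)   # symmetric positive definite
Lr = rng.standard_normal((m, m))
R = Lr @ Lr.T + m * np.eye(m)   # symmetric positive definite
B = rng.standard_normal((m, n))

lhs = np.linalg.inv(
    np.linalg.inv(P) + B.T @ np.linalg.inv(R) @ B
) @ B.T @ np.linalg.inv(R)
rhs = P @ B.T @ np.linalg.inv(B @ P @ B.T + R)  # the Kalman-gain form
assert np.allclose(lhs, rhs)
```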
Approximations
These formulas were frequently used during paper writing. Provided all eigenvalues of $\mathbf{A}$ have absolute values less than 1, the following series converge:
$$ (\mathbf{I} - \mathbf{A})^{-1} = \sum_{n=0}^\infty \mathbf{A}^n $$
$$ (\mathbf{I} + \mathbf{A})^{-1} = \sum_{n=0}^\infty (-1)^n \mathbf{A}^n $$
Second-Order Approximations
$$ \begin{aligned} (\mathbf{I} - \mathbf{A})^{-1} & \cong \mathbf{I} + \mathbf{A} + \mathbf{A}^2 \\ (\mathbf{I} + \mathbf{A})^{-1} & \cong \mathbf{I} - \mathbf{A} + \mathbf{A}^2 \end{aligned} $$
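A sketch of the second-order truncation; rescaling the random matrix to spectral radius 0.3 is an arbitrary choice that guarantees convergence:

```python
import numpy as np

rng = np.random.default_rng(8)
n = 4
A = rng.standard_normal((n, n))
A *= 0.3 / np.max(np.abs(np.linalg.eigvals(A)))  # spectral radius -> 0.3

exact = np.linalg.inv(np.eye(n) - A)
approx = np.eye(n) + A + A @ A                   # second-order truncation
print(np.max(np.abs(exact - approx)))  # small when spectral radius << 1
```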
When $\mathbf{A}$ Is Very Large
$$ \mathbf{A} - \mathbf{A} (\mathbf{I} + \mathbf{A})^{-1} \mathbf{A} \cong \mathbf{I} - \mathbf{A}^{-1} $$
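A numerical illustration (the particular "large" matrix below is an arbitrary choice; the discrepancy scales like $\|\mathbf{A}^{-1}\|^2$):

```python
import numpy as np

rng = np.random.default_rng(9)
n = 4
A = rng.standard_normal((n, n)) * 100 + 500 * np.eye(n)  # "large" A

lhs = A - A @ np.linalg.inv(np.eye(n) + A) @ A
rhs = np.eye(n) - np.linalg.inv(A)
print(np.max(np.abs(lhs - rhs)))  # small relative to the entries of I
```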
Exponentials
These formulas were also quite useful in past research.
$$ \begin{aligned} e^{\mathbf{A}} & \equiv \sum_{n=0}^\infty \frac{1}{n!} \mathbf{A}^n = \mathbf{I} + \mathbf{A} + \frac{1}{2} \mathbf{A}^2 + \ldots \\ e^{-\mathbf{A}} & \equiv \sum_{n=0}^\infty \frac{1}{n!} (-1)^n \mathbf{A}^n = \mathbf{I} - \mathbf{A} + \frac{1}{2} \mathbf{A}^2 - \ldots \\ e^{t \mathbf{A}} & \equiv \sum_{n=0}^\infty \frac{1}{n!} (t \mathbf{A})^n = \mathbf{I} + t \mathbf{A} + \frac{1}{2} t^2 \mathbf{A}^2 + \ldots \\ \ln (\mathbf{I} + \mathbf{A}) & \equiv \sum_{n=1}^\infty \frac{(-1)^{n-1}}{n} \mathbf{A}^n = \mathbf{A} - \frac{1}{2} \mathbf{A}^2 + \frac{1}{3} \mathbf{A}^3 - \ldots \end{aligned} $$
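In practice these series are rarely summed directly; SciPy provides `scipy.linalg.expm` and `scipy.linalg.logm`. A sketch comparing them against truncated series (the scaling of $\mathbf{A}$ is an arbitrary choice made so the logarithm series converges):

```python
import numpy as np
from scipy.linalg import expm, logm

rng = np.random.default_rng(10)
A = 0.05 * rng.standard_normal((3, 3))  # small norm for convergence

# Compare expm against the truncated power series sum_{k<=N} A^k / k!
N = 20
term = np.eye(3)
series = np.eye(3)
for k in range(1, N + 1):
    term = term @ A / k
    series += term
assert np.allclose(expm(A), series)

# ln(I + A) via SciPy vs. the first three terms of the series above
log_series = A - A @ A / 2 + A @ A @ A / 3
print(np.max(np.abs(logm(np.eye(3) + A) - log_series)))  # ~ ||A||^4
```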
Conclusion
The Matrix Cookbook contains many more formulas beyond those listed here. I plan to add additional formulas as needed in the future.