General post

Linear algebra basics

Originally posted 2025-07-30

Last updated 2025-07-31

Notation

I’ll try to be consistent with notation on the site. In the context of linear algebra, the following table captures the intended conventions. Most of these are widely used elsewhere:

| Item | Description of notation | Example |
| --- | --- | --- |
| Scalar variable | lower case, unbolded | c |
| Vector variable | lower case, bold | \mathbf{x} |
| Matrix variable | upper case, bold | \mathbf{A} |
| Product | implicit multiplication (no dot or “times” symbol) | \mathbf{Ax} |
| Sum | appears as normal addition | \mathbf{x} + \mathbf{b} |
| Hadamard product | circle-dot | \mathbf{A} \odot \mathbf{B} |
| Matrix transposition | superscript T | \mathbf{A}^{\mathsf{T}} |
| Matrix Hermitian conjugate (adjoint) | superscript dagger | \mathbf{A}^\dagger |
| Matrix power | superscript power | \mathbf{A}^n |
| Multiplicative matrix inverse | -1st power | \mathbf{A}^{-1} |
| Identity matrix (dimensions are context-dependent) | capital “I” | \mathbf{I} |
| n-ary identity matrix | as above, but with subscript | \mathbf{I}_n |
| Vector norm (magnitude) | absolute value bars | \vert\mathbf{x}\vert |
| Vector norm (magnitude), alternative | same variable name but rendered as a scalar | x |
| L^p norm | double bars with subscript | \Vert \mathbf{x} \Vert_p |
| Frobenius norm | double bars with subscript “F” | \Vert \mathbf{A} \Vert_F |
| Complex conjugate | overbar | \overline{c} |

Parentheses are avoided where possible because multiplication is associative.

Finally, “index notation” makes many calculations clearer and is very compact. For a vector \mathbf{x}, x_i refers to the ith atomic entry in the list (essentially always 1-based). For a matrix \mathbf{A}, A_{ij} refers to the atomic entry in the ith row and jth column. This is very powerful because it can be extended to more complicated matrix definitions; for example, (AB)_{ij} refers to the atom in the ith row and jth column of the product (output) matrix \mathbf{AB}.

Note that index notation uses a non-boldfaced font since it refers to an atomic value rather than a matrix (or vector) per se.
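
If you’d like to play with this numerically, here is a small sketch of my own using NumPy (an assumption on my part, not something this post relies on); note that NumPy indexing is 0-based while the math above is 1-based:

```python
import numpy as np

A = np.array([[1, 2],
              [3, 4],
              [5, 6]])   # a 3x2 matrix

# Math A_{31} (1-based: row 3, column 1) is A[2, 0] in 0-based NumPy.
print(A[2, 0])           # -> 5

x = np.array([7, 8, 9])  # a vector
print(x[0])              # math x_1 -> 7
```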

Definitions

  • A scalar is any single atomic value in your algebraic field. By default, you should assume this is a complex value unless we are talking about probabilities or random variables.
  • A vector is an ordered list of atomic values (scalars). It has a single dimensional index, i.e., it is a rank-1 tensor.
  • A matrix is a two-dimensional structured rectangular container of atomic values (scalars), i.e., a rank-2 tensor. As with vectors, the order (or more accurately, absolute position) of each cell matters. We say that a matrix is (or has dimensions) m \times n if it has m rows and n columns. Just as in index notation, the row comes first.

When rendering a matrix, we use parenthetical brackets consistently.1

For example, here is a 3 \times 2 matrix rendered with its cells explicitly labeled in index notation:

\mathbf{A} = \begin{pmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \\ A_{31} & A_{32} \end{pmatrix}

Note that we don’t use a comma to separate the indices when it’s unambiguous.2

Finally, we always assume that any vector is a column vector unless otherwise stated. That is, if \mathbf{x} is an n-dimensional vector, then when represented as a matrix (such as for multiplication), it has dimension n \times 1. The corresponding row vector is \mathbf{x}^\mathsf{T}, which has dimension 1 \times n.
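
To mirror the column-vector convention in code, one option (again my own NumPy sketch, not part of the notation) is to store an n-dimensional vector as an n×1 array so that transposition yields the 1×n row vector:

```python
import numpy as np

x = np.array([[1.0], [2.0], [3.0]])  # explicit 3x1 column vector
print(x.shape)     # (3, 1)
print(x.T.shape)   # (1, 3) -- the corresponding row vector x^T
```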

Basic operations

Transposition

To transpose a matrix, you just swap its elements index-wise. This turns an m \times n matrix into an n \times m matrix.

In index notation:

\left(A^{\mathsf{T}}\right)_{ij} = A_{ji}

And visually with an example:

\begin{pmatrix} 1 & 2 \\ 3 & 4 \end{pmatrix}^{\mathsf{T}} = \begin{pmatrix} 1 & 3 \\ 2 & 4 \end{pmatrix}

Note that by definition, this preserves the values along the diagonal of the matrix (all cells which can be indexed as A_{ii}).
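
As a quick illustration (a NumPy sketch of mine), the index-swap definition agrees with what the library’s transpose returns:

```python
import numpy as np

A = np.array([[1, 2],
              [3, 4]])
print(A.T)                   # [[1, 3], [2, 4]]
# (A^T)_{ij} = A_{ji}, checked for one pair of (0-based) indices:
print(A.T[0, 1] == A[1, 0])  # True
```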

Hermitian transposition

The Hermitian transpose—sometimes called the conjugate transpose or adjoint operation3—is the same as transposition, but you also take the complex conjugate of each cell.

In index notation: \left(A^\dagger\right)_{ij} = \overline{A_{ji}}
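
A minimal sketch of the same operation, assuming NumPy: transpose, then conjugate each cell.

```python
import numpy as np

A = np.array([[1 + 2j, 3],
              [4j,     5 - 1j]])
A_dagger = A.conj().T   # conjugate transpose (Hermitian adjoint)
print(A_dagger)
# [[1.-2.j 0.-4.j]
#  [3.-0.j 5.+1.j]]
```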

Addition (and subtraction)

For our purposes, you can only add a matrix (or vector) to another matrix (or vector) with the same dimensions. Addition operates element-wise:

(A + B)_{ij} = A_{ij} + B_{ij}

Matrix addition inherits commutativity from this definition.

Subtraction works analogously, but you take the additive inverse of each entry of the right-hand side:

(A - B)_{ij} = A_{ij} - B_{ij}
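
For completeness, a tiny NumPy sketch of mine showing the element-wise behavior:

```python
import numpy as np

A = np.array([[1, 2], [3, 4]])
B = np.array([[10, 20], [30, 40]])
print(A + B)   # [[11, 22], [33, 44]]
print(A - B)   # [[-9, -18], [-27, -36]]
```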

Scalar-matrix product

The scalar-matrix product is the result of multiplying a scalar with a matrix. In index notation, it is defined as (c \mathbf{A})_{ij} = cA_{ij}. In some sense, the scalar (c) “distributes” over the entire matrix (you multiply element-wise). Since atomic products commute, so do scalar-matrix products. Conventionally, however, you show the scalar on the left-hand side of a product.
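
Illustrated briefly in NumPy (my own example, not part of the definition):

```python
import numpy as np

c = 3
A = np.array([[1, 2], [3, 4]])
print(c * A)                         # [[3, 6], [9, 12]]
print(np.array_equal(c * A, A * c))  # True -- scalar-matrix products commute
```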

Dot product

The dot product, sometimes (confusingly, in this context4) referred to as the scalar product or vector-vector product, “multiplies” two vectors together and produces a scalar output.

In coordinate-free vector math, this is typically denoted by \mathbf{x} \cdot \mathbf{y}.

However, since we always consider vectors to be columnar matrices and use implicit multiplication, we instead use \mathbf{x}^{\mathsf{T}}\mathbf{y} to show a matrix-matrix (dot) product.

This operation is defined as5 the sum of matching coordinate-wise products as shown below. Note that both \mathbf{x} and \mathbf{y} must have the same dimension n:

\mathbf{x}^{\mathsf{T}}\mathbf{y} = \sum_{i=1}^n x_i y_i
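
A short NumPy sketch of mine comparing the built-in dot product against the sum-of-products definition:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0])
y = np.array([4.0, 5.0, 6.0])
print(np.dot(x, y))                              # 32.0
print(sum(x_i * y_i for x_i, y_i in zip(x, y)))  # 32.0, the definition spelled out
```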

Matrix-matrix product

The matrix-matrix product6 is defined to be the composite matrix you get by taking the dot product of each row of the left-hand matrix with each column of the right-hand matrix. Each such row-column pair determines a unique cell in the output. This definition implies that you can only take the product \mathbf{AB} if \mathbf{A} is m \times n and \mathbf{B} is n \times r, since the row vectors in \mathbf{A} must align with the column vectors in \mathbf{B}. The corresponding output is m \times r.

Finally, in index notation:7

(AB)_{ij} = \sum_{k=1}^n A_{ik} B_{kj}

Here’s a visual example. Define \mathbf{A} (2 \times 3) and \mathbf{B} (3 \times 2) as

\mathbf{A} = \begin{pmatrix} \colorbox{lightblue}{$A_{11}$} & \colorbox{lightblue}{$A_{12}$} & \colorbox{lightblue}{$A_{13}$} \\ \colorbox{lightgreen}{$A_{21}$} & \colorbox{lightgreen}{$A_{22}$} & \colorbox{lightgreen}{$A_{23}$} \end{pmatrix}, \mathbf{B} = \begin{pmatrix} \colorbox{pink}{$B_{11}$} & \colorbox{lightyellow}{$B_{12}$} \\ \colorbox{pink}{$B_{21}$} & \colorbox{lightyellow}{$B_{22}$} \\ \colorbox{pink}{$B_{31}$} & \colorbox{lightyellow}{$B_{32}$} \end{pmatrix}

This gives

\mathbf{A}\mathbf{B} = \begin{pmatrix} \colorbox{lightblue}{$A_{11}$}\colorbox{pink}{$B_{11}$} + \colorbox{lightblue}{$A_{12}$}\colorbox{pink}{$B_{21}$} + \colorbox{lightblue}{$A_{13}$}\colorbox{pink}{$B_{31}$} & \colorbox{lightblue}{$A_{11}$}\colorbox{lightyellow}{$B_{12}$} + \colorbox{lightblue}{$A_{12}$}\colorbox{lightyellow}{$B_{22}$} + \colorbox{lightblue}{$A_{13}$}\colorbox{lightyellow}{$B_{32}$} \\ \colorbox{lightgreen}{$A_{21}$}\colorbox{pink}{$B_{11}$} + \colorbox{lightgreen}{$A_{22}$}\colorbox{pink}{$B_{21}$} + \colorbox{lightgreen}{$A_{23}$}\colorbox{pink}{$B_{31}$} & \colorbox{lightgreen}{$A_{21}$}\colorbox{lightyellow}{$B_{12}$} + \colorbox{lightgreen}{$A_{22}$}\colorbox{lightyellow}{$B_{22}$} + \colorbox{lightgreen}{$A_{23}$}\colorbox{lightyellow}{$B_{32}$} \end{pmatrix},

where the final output is 2 \times 2.
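
To make the definition concrete as code, here is a sketch of mine that spells out the loops plus inner sum and compares the result against NumPy’s built-in product (the `@` operator); the helper name `matmul_by_hand` is just something I made up for illustration:

```python
import numpy as np

def matmul_by_hand(A, B):
    """Direct translation of (AB)_{ij} = sum_k A_{ik} B_{kj} (0-based here)."""
    m, n = A.shape
    n2, r = B.shape
    assert n == n2, "inner dimensions must align"
    out = np.zeros((m, r))
    for i in range(m):
        for j in range(r):
            out[i, j] = sum(A[i, k] * B[k, j] for k in range(n))
    return out

A = np.array([[1., 2., 3.],
              [4., 5., 6.]])    # 2x3
B = np.array([[7., 8.],
              [9., 10.],
              [11., 12.]])      # 3x2
print(matmul_by_hand(A, B))                       # 2x2 output
print(np.allclose(matmul_by_hand(A, B), A @ B))   # True
```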

Hadamard product

The last (and least-used8) product type is the Hadamard product, which multiplies corresponding cells element-wise. Like addition, this requires both input matrices to have the same dimensions:

(A \odot B)_{ij} = A_{ij} B_{ij}
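
In NumPy (my own sketch), this is the plain `*` operator on same-shaped arrays:

```python
import numpy as np

A = np.array([[1, 2], [3, 4]])
B = np.array([[5, 6], [7, 8]])
print(A * B)   # [[5, 12], [21, 32]] -- element-wise, not a matrix-matrix product
```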

Core matrix-matrix product properties

Not commutative

The matrix-matrix product in general is not commutative. Here’s a counter-example:

\begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix} \begin{pmatrix} 0 & 1 \\ 0 & 1 \end{pmatrix} = \begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix}

But

\begin{pmatrix} 0 & 1 \\ 0 & 1 \end{pmatrix} \begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix} = \begin{pmatrix} 0 & 0 \\ 0 & 0 \end{pmatrix}

You can immediately tell that the results are not even simple transpositions or rotations of each other because they have entirely different values.

This is a somewhat pathological example because both matrices are singular.9 However, the operation still does not in general commute even among non-singular matrices.
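
You can reproduce the counterexample above directly; this is a small NumPy sketch of mine using the same two matrices:

```python
import numpy as np

A = np.array([[1, 0], [0, 0]])
B = np.array([[0, 1], [0, 1]])
print(A @ B)   # [[0, 1], [0, 0]]
print(B @ A)   # [[0, 0], [0, 0]]
```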

Associative

Let \mathbf{A} be m \times n, \mathbf{B} be n \times r, and \mathbf{C} be r \times p.

Consider the product (\mathbf{AB})\mathbf{C}:

\begin{aligned} ((AB)C)_{ij} &= \sum_{k=1}^{r} (AB)_{ik} C_{kj} \\ &= \sum_{k=1}^{r} \left( \sum_{\ell=1}^{n} A_{i \ell} B_{\ell k} \right) C_{kj} \\ &= \sum_{k=1}^{r} \sum_{\ell=1}^{n} A_{i \ell} B_{\ell k} C_{kj} \end{aligned}

Now consider \mathbf{A}(\mathbf{BC}):

\begin{aligned} (A(BC))_{ij} &= \sum_{\ell=1}^{n} A_{i \ell} (BC)_{\ell j} \\ &= \sum_{\ell=1}^{n} A_{i \ell} \left( \sum_{k=1}^{r} B_{\ell k} C_{k j} \right) \\ &= \sum_{\ell=1}^{n} \sum_{k=1}^{r} A_{i \ell} B_{\ell k} C_{k j} \\ &= \sum_{k=1}^{r} \sum_{\ell=1}^{n} A_{i \ell} B_{\ell k} C_{k j} \end{aligned}

So (\mathbf{AB})\mathbf{C} = \mathbf{A}(\mathbf{BC}). We almost always drop the parentheses.
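
A quick numerical spot-check of associativity (my own sketch with random matrices; `np.allclose` absorbs floating-point rounding):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 3))   # m x n
B = rng.standard_normal((3, 5))   # n x r
C = rng.standard_normal((5, 2))   # r x p
print(np.allclose((A @ B) @ C, A @ (B @ C)))   # True
```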

Distributive

Consider \mathbf{A}(\mathbf{B} + \mathbf{C}). Suppose that \mathbf{A} is m \times n and \mathbf{B} and \mathbf{C} are n \times r (recall that addition requires dimensions to match). In index notation, we have

\begin{aligned} (A(B + C))_{ij} &= \sum_{k=1}^n A_{ik} (B + C)_{kj} \\ &= \sum_{k=1}^n A_{ik} (B_{kj} + C_{kj}) \\ &= \left(\sum_{k=1}^n A_{ik} B_{kj}\right) + \left(\sum_{k=1}^n A_{ik} C_{kj}\right) \\ &= (AB)_{ij} + (AC)_{ij} \end{aligned}

In other words, \mathbf{A}(\mathbf{B} + \mathbf{C}) = \mathbf{AB} + \mathbf{AC}. Left-multiplication distributes over matrix addition. You can similarly show that right-multiplication distributes over addition.

Be careful when applying this! Multiplication still does not commute, so if you want to factor an expression such as (\mathbf{AC} + \mathbf{BC}), you must write it as (\mathbf{A} + \mathbf{B})\mathbf{C} rather than the unrelated \mathbf{C}(\mathbf{A} + \mathbf{B}). For similar reasons, you cannot factor \mathbf{AC} + \mathbf{CB}, since the common factor \mathbf{C} is a left-multiplicand in one term and a right-multiplicand in the other.
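
And a matching spot-check for distributivity (again a sketch of mine, assuming NumPy):

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((4, 3))   # m x n
B = rng.standard_normal((3, 5))   # n x r
C = rng.standard_normal((3, 5))   # n x r (same shape as B)
print(np.allclose(A @ (B + C), A @ B + A @ C))   # True
```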

The identity matrix and inversion

Identity matrix

Consider the n \times n (square) diagonal matrix

\mathbf{I}_n = \begin{pmatrix} 1 & 0 & \dots & 0 \\ 0 & 1 & \dots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \dots & 1 \end{pmatrix}

In index notation, I_{ii} = 1 and I_{ij} = 0 if i \neq j.10

Now consider some arbitrary m \times n matrix \mathbf{A}. Then

(AI)_{ij} = \sum_{k=1}^n A_{ik} I_{kj} = A_{ij}

In other words, \mathbf{AI} = \mathbf{A}. Similarly, \mathbf{IA} = \mathbf{A}. For this reason, \mathbf{I} (or \mathbf{I}_n if we want to be explicit about size) is called the identity matrix.

Note that if \mathbf{A} is square (m = n), then the left and right identities are the same. Otherwise, the left and right identities necessarily have different dimensions (to make the matrix product work out).
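
In NumPy (my own sketch), `np.eye(n)` builds the identity, and for a non-square matrix you can see that the left and right identities have different sizes:

```python
import numpy as np

A = np.ones((2, 3))                     # 2x3
print(np.allclose(A @ np.eye(3), A))    # True -- right identity is 3x3
print(np.allclose(np.eye(2) @ A, A))    # True -- left identity is 2x2
```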

Inversion

The inverse of a matrix \mathbf{A} is denoted \mathbf{A}^{-1} and is defined to be the unique matrix such that \mathbf{A}^{-1}\mathbf{A} = \mathbf{I}, if such a matrix exists. For such a matrix to exist, \mathbf{A} must be square. We do not further discuss the existence criteria here.

Suppose that \mathbf{A} has a left inverse as defined above. Call that \mathbf{B}. Now suppose that it also has a right inverse \mathbf{C}. Then we have \mathbf{BA} = \mathbf{I} and \mathbf{AC} = \mathbf{I}. Now right-multiply the first equation by \mathbf{C} to get \mathbf{BAC} = \mathbf{IC} = \mathbf{C}. But we can now simplify the left-hand side since \mathbf{AC} = \mathbf{I} and multiplication is associative: \mathbf{B(AC)} = \mathbf{BI} = \mathbf{B}. In other words, \mathbf{B} = \mathbf{C}, so if an inverse \mathbf{A}^{-1} exists, it is both the right and left inverse. It is also unique because you can recover either side from the other.11
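
A short numerical sketch of mine (using a 2×2 matrix I know is invertible) confirming that the inverse works from both sides:

```python
import numpy as np

A = np.array([[2., 1.],
              [1., 3.]])               # invertible 2x2 matrix
A_inv = np.linalg.inv(A)
print(np.allclose(A_inv @ A, np.eye(2)))   # True (left inverse)
print(np.allclose(A @ A_inv, np.eye(2)))   # True (right inverse)
```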

Transposition of a product

We want to find an “expanded” form of \left(\mathbf{AB}\right)^{\mathsf{T}}:

\begin{aligned} \left((AB)^{\mathsf{T}}\right)_{ij} &= (AB)_{ji} \\ &= \sum_{k=1}^{n} A_{jk} B_{ki} \\ &= \sum_{k=1}^{n} \left(A^{\mathsf{T}}\right)_{kj} \left(B^{\mathsf{T}}\right)_{ik} \\ &= \sum_{k=1}^{n} \left(B^{\mathsf{T}}\right)_{ik} \left(A^{\mathsf{T}}\right)_{kj} \\ &= \left(B^{\mathsf{T}} A^{\mathsf{T}}\right)_{ij} \end{aligned}

So

\left(\mathbf{AB}\right)^{\mathsf{T}} = \mathbf{B}^{\mathsf{T}} \mathbf{A}^{\mathsf{T}}

You can expand this iteratively to get

\left(\mathbf{A}_1 \mathbf{A}_2 \cdots \mathbf{A}_n\right)^{\mathsf{T}} = \mathbf{A}_n^{\mathsf{T}} \mathbf{A}_{n-1}^{\mathsf{T}} \cdots \mathbf{A}_{1}^{\mathsf{T}}
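
Spot-checking the reversal rule numerically (my own NumPy sketch):

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((4, 3))
B = rng.standard_normal((3, 5))
print(np.allclose((A @ B).T, B.T @ A.T))   # True
```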

Inverse of a product

We can similarly find an expansion for the inverse of a matrix product. In this case, it’s easiest to guess a form similar to the transposition expansion and then verify it.

First, note that the matrix inverse of a product can exist only if its matrix factors are invertible.12 This further implies that both factors must be square (and have the same dimensions).

Let’s guess that (\mathbf{AB})^{-1} = \mathbf{B}^{-1}\mathbf{A}^{-1}. Since the left and right inverses must be equal, this would imply

(\mathbf{AB})^{-1} \mathbf{AB} = \mathbf{I}

and

\mathbf{AB} (\mathbf{AB})^{-1} = \mathbf{I}.

Indeed, by expanding and associating as needed we have

\begin{aligned} \mathbf{B}^{-1}\mathbf{A}^{-1}\mathbf{AB} &= \mathbf{B}^{-1}\left(\mathbf{A}^{-1}\mathbf{A}\right)\mathbf{B} \\ &= \mathbf{B}^{-1}\mathbf{I}\mathbf{B} \\ &= \mathbf{B}^{-1}\mathbf{B} \\ &= \mathbf{I}, \end{aligned}

as desired. The right inverse can be confirmed analogously. This also shows that the product inverse necessarily exists if its factors are invertible.

As before, you can iterate this expansion to find the general case:

\left(\mathbf{A}_{1}\mathbf{A}_2\cdots\mathbf{A}_{n}\right)^{-1} = \mathbf{A}_{n}^{-1}\mathbf{A}_{n-1}^{-1}\cdots\mathbf{A}_{1}^{-1}
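
And the analogous spot-check for the product inverse (my own sketch; random square Gaussian matrices are invertible with probability 1, so this is safe in practice):

```python
import numpy as np

rng = np.random.default_rng(3)
A = rng.standard_normal((4, 4))
B = rng.standard_normal((4, 4))
lhs = np.linalg.inv(A @ B)
rhs = np.linalg.inv(B) @ np.linalg.inv(A)
print(np.allclose(lhs, rhs))   # True
```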

Footnotes

  1. When whiteboarding, I typically use square brackets because they’re easier to draw.

  2. This shorthand nearly always works because we’re indexing with variables anyway.

  3. Even less often, this is called the adjugate operation, but I would advise against that term due to ambiguity with the cofactor transposition.

  4. In vector math/physics, the vector product refers to the cross product (or sometimes a generalization of this), whose input is two vectors but whose output is another vector (rather than a scalar).

  5. For anybody who attended the probability session where we first introduced the Cauchy-Schwarz inequality, I originally introduced the dot product through a physical interpretation of “project-then-multiply” and showed that these two definitions are equivalent if you assume an orthonormal basis. I can go over that again for anybody interested.

  6. I haven’t bothered spelling out a matrix-vector product here because, as noted before, we always consider a vector to be a columnar matrix. This is just a special case of the matrix-matrix product where the output is a new column vector.

  7. I recommend always explicitly showing the summation bounds when representing a matrix-matrix product in index notation to clarify the aligned axes. This is increasingly important when more than 2 matrices are involved.

  8. Typically used for operations like efficient masking in deep learning, etc.

  9. That is, they have no inverses, which implies that they are not space-preserving.

  10. Or in the mostly-useless-but-compact Kronecker delta notation: I_{ij} = \delta_{ij}.

  11. If it’s not clear why, start with \mathbf{AB} = \mathbf{AC} = \mathbf{I} as we established above. Then assume that \mathbf{B} \ne \mathbf{C} and see what happens when you cross-multiply.

  12. We will cover this elsewhere, but the gist is that a matrix is invertible if and only if its action “preserves space” (i.e., does not collapse the dimensionality of its inputs into a strict subspace). Because matrix-matrix products compose into a new virtual matrix, we know the final composition cannot be space-preserving if either constituent factor collapses space in the middle.