General post
Linear algebra basics
Originally posted 2025-07-30
Last updated 2025-07-31
Notation
I’ll try to be consistent with notation on the site. In the context of linear algebra, the following table captures the intended conventions. Most of these are widely used elsewhere:
Item | Description of notation | Example |
---|---|---|
Scalar variable | lower case, unbolded | $x$ |
Vector variable | lower case, bold | $\mathbf{v}$ |
Matrix variable | upper case, bold | $\mathbf{A}$ |
Product | implicit multiplication (no dot or “times” symbol) | $\mathbf{A}\mathbf{B}$ |
Sum | Appears as normal addition | $\mathbf{A} + \mathbf{B}$ |
Hadamard product | circle-dot | $\mathbf{A} \odot \mathbf{B}$ |
Matrix transposition | superscript T | $\mathbf{A}^T$ |
Matrix Hermitian conjugate (adjoint) | superscript dagger | $\mathbf{A}^\dagger$ |
Matrix power | superscript power | $\mathbf{A}^n$ |
Multiplicative matrix inverse | $-1$st power | $\mathbf{A}^{-1}$ |
Identity matrix (dimensions are context-dependent) | capital “I” | $\mathbf{I}$ |
$n$-ary identity matrix | as above, but with subscript | $\mathbf{I}_n$ |
Vector norm (magnitude) | absolute value bars | $\lvert\mathbf{v}\rvert$ |
Vector norm (magnitude), alternative | same variable name but rendered as a scalar | $v$ |
$L^p$ norm | double bars with subscript | $\lVert\mathbf{v}\rVert_p$ |
Frobenius norm | double bars with subscript “F” | $\lVert\mathbf{A}\rVert_F$ |
Complex conjugate | overbar | $\bar{z}$ |
Parentheses are avoided where possible because multiplication is associative.
Finally, “index notation” makes many calculations clearer and is very compact. For a vector $\mathbf{v}$, $v_i$ refers to the $i$th atomic entry in the list (essentially always 1-based). For a matrix $\mathbf{A}$, $A_{ij}$ refers to the atomic entry in the $i$th row and $j$th column. This is very powerful because it can be extended to more complicated matrix definitions; for example, $(\mathbf{A}\mathbf{B})_{ij}$ refers to the atom in the $i$th row and $j$th column of the product (output) matrix $\mathbf{A}\mathbf{B}$.
Note that index notation uses a non-boldfaced font since it describes an atomic value rather than a matrix (or vector) per se.
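For anyone following along in code, here is a minimal NumPy sketch (with arbitrary example values) of how 1-based index notation maps onto 0-based array indexing:

```python
import numpy as np

# A is a 2x3 matrix; index notation A_ij is 1-based,
# while NumPy indexing is 0-based.
A = np.array([[1, 2, 3],
              [4, 5, 6]])

i, j = 2, 3             # "row 2, column 3" in index notation
print(A[i - 1, j - 1])  # 6 -- shift each index down by one for NumPy
```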
Definitions
- A scalar is any single atomic value in your algebraic field. By default, you should assume this is a complex value unless we are talking about probabilities or random variables.
- A vector is an ordered list of atomic values (scalars). It has a single dimensional index, i.e., it is a rank-1 tensor.
- A matrix is a two-dimensional structured rectangular container of atomic values (scalars), i.e., a rank-2 tensor. As with vectors, the order (or more accurately, absolute position) of each cell matters. We say that a matrix is $m \times n$ (or has dimensions $m \times n$) if it has $m$ rows and $n$ columns. Just as in index notation, the row comes first.
When rendering a matrix, we use parenthetical brackets consistently.1
For example, here is an $m \times n$ matrix rendered with its cells explicitly labeled in index notation:

$$\mathbf{A} = \begin{pmatrix} A_{11} & A_{12} & \cdots & A_{1n} \\ A_{21} & A_{22} & \cdots & A_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ A_{m1} & A_{m2} & \cdots & A_{mn} \end{pmatrix}$$
Note that we don’t use a comma to separate the indices when it’s unambiguous.2
Finally, we always assume that any vector is a column vector unless otherwise stated. That is, if $\mathbf{v}$ is an $n$-dimensional vector, then when represented as a matrix (such as for multiplication), it has dimension $n \times 1$. The corresponding row vector is $\mathbf{v}^T$, which has dimension $1 \times n$.
Basic operations
Transposition
To transpose a matrix, you just swap its elements index-wise. This turns an $m \times n$ matrix into an $n \times m$ matrix.
In index notation:

$$(\mathbf{A}^T)_{ij} = A_{ji}$$
And visually with an example:

$$\begin{pmatrix} A_{11} & A_{12} & A_{13} \\ A_{21} & A_{22} & A_{23} \end{pmatrix}^T = \begin{pmatrix} A_{11} & A_{21} \\ A_{12} & A_{22} \\ A_{13} & A_{23} \end{pmatrix}$$
Note that by definition, this preserves the values along the diagonal of the matrix (all cells which can be indexed as $A_{ii}$).
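If you want to sanity-check this numerically, here is a small NumPy sketch (the example matrix is arbitrary):

```python
import numpy as np

A = np.array([[1, 2, 3],
              [4, 5, 6]])                  # 2x3

At = A.T                                    # 3x2: (A^T)_ij == A_ji
print(At.shape)                             # (3, 2)
print(A[0, 2] == At[2, 0])                  # True -- indices swap
print(np.allclose(np.diag(A), np.diag(At))) # True -- diagonal is preserved
```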
Hermitian transposition
The Hermitian transpose—sometimes called the conjugate transpose or adjoint operation3—is the same as transposition, but you also take the complex conjugate of each cell.
In index notation:

$$(\mathbf{A}^\dagger)_{ij} = \overline{A_{ji}}$$
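A short NumPy check of the conjugate-transpose rule, using an arbitrary complex matrix:

```python
import numpy as np

A = np.array([[1 + 2j, 3 - 1j],
              [0 + 1j, 4 + 0j]])

A_dag = A.conj().T                     # conjugate each cell, then transpose
print(np.allclose(A_dag[0, 1], -1j))   # True -- the conjugate of A[1, 0] == 1j
print(np.allclose(A_dag, A.T.conj()))  # True -- the two steps commute
```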
Addition (and subtraction)
For our purposes, you can only add a matrix (or vector) to another matrix (or vector) with the same dimensions. It operates element-wise:

$$(\mathbf{A} + \mathbf{B})_{ij} = A_{ij} + B_{ij}$$
Matrix addition inherits commutativity from this definition.
Subtraction works analogously, but you take the additive inverse of each entry of the right-hand side:

$$(\mathbf{A} - \mathbf{B})_{ij} = A_{ij} - B_{ij}$$
Scalar-matrix product
The scalar-matrix product is the result of multiplying a scalar with a matrix. In index notation, it is defined as $(c\mathbf{A})_{ij} = cA_{ij}$. In some sense, the scalar ($c$) “distributes” over the entire matrix (you multiply element-wise). Since atomic products commute, so do scalar-matrix products. Conventionally, however, you show the scalar on the left-hand side of a product.
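Both element-wise rules above (addition/subtraction and the scalar-matrix product) fall straight out of NumPy broadcasting; a small illustrative check with arbitrary values:

```python
import numpy as np

A = np.array([[1, 2], [3, 4]])
B = np.array([[5, 6], [7, 8]])
c = 3

print(np.allclose(A + B, B + A))  # True -- matrix addition is commutative
print((A - B)[0, 0])              # -4 == A[0, 0] - B[0, 0]
print(np.allclose(c * A, A * c))  # True -- the scalar-matrix product commutes
```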
Dot product
The dot product, sometimes (confusingly, in this context4) referred to as the scalar product or vector-vector product, “multiplies” two vectors together and produces a scalar output.
In coordinate-free vector math, this is typically denoted by $\mathbf{a} \cdot \mathbf{b}$.
However, since we always consider vectors to be columnar matrices and use implicit multiplication, we instead use $\mathbf{a}^T\mathbf{b}$ to show a matrix-matrix (dot) product.
This operation is defined as5 the sum of matching coordinate-wise products as shown below. Note that both $\mathbf{a}$ and $\mathbf{b}$ must have the same dimension $n$:

$$\mathbf{a}^T\mathbf{b} = \sum_{i=1}^{n} a_i b_i$$
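Here is a NumPy sketch of the same definition, treating the vectors as column matrices per the convention above (the values are arbitrary):

```python
import numpy as np

a = np.array([1.0, 2.0, 3.0]).reshape(-1, 1)  # 3x1 column vector
b = np.array([4.0, 5.0, 6.0]).reshape(-1, 1)  # 3x1 column vector

dot_as_matrices = (a.T @ b)[0, 0]   # a^T b is a 1x1 matrix; pull out the scalar
dot_as_sum = float(np.sum(a * b))   # sum of coordinate-wise products

print(dot_as_matrices, dot_as_sum)  # 32.0 32.0
```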
Matrix-matrix product
The matrix-matrix product6 is defined to be the composite matrix you get by taking the dot product of each row of the left-hand matrix with each column of the right-hand matrix. Each such row-column pair determines a unique cell in the output. This definition implies that you can only take the product $\mathbf{A}\mathbf{B}$ if $\mathbf{A}$ is $m \times n$ and $\mathbf{B}$ is $n \times p$, since the row vectors in $\mathbf{A}$ must align with the column vectors in $\mathbf{B}$. The corresponding output is $m \times p$.
Finally, in index notation:7

$$(\mathbf{A}\mathbf{B})_{ij} = \sum_{k=1}^{n} A_{ik} B_{kj}$$
Here’s a visual example. Define $\mathbf{A}$ ($2 \times 3$) and $\mathbf{B}$ ($3 \times 2$) as

$$\mathbf{A} = \begin{pmatrix} A_{11} & A_{12} & A_{13} \\ A_{21} & A_{22} & A_{23} \end{pmatrix}, \qquad \mathbf{B} = \begin{pmatrix} B_{11} & B_{12} \\ B_{21} & B_{22} \\ B_{31} & B_{32} \end{pmatrix}$$

This gives

$$\mathbf{A}\mathbf{B} = \begin{pmatrix} A_{11}B_{11} + A_{12}B_{21} + A_{13}B_{31} & A_{11}B_{12} + A_{12}B_{22} + A_{13}B_{32} \\ A_{21}B_{11} + A_{22}B_{21} + A_{23}B_{31} & A_{21}B_{12} + A_{22}B_{22} + A_{23}B_{32} \end{pmatrix}$$

where the final output is $2 \times 2$.
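To make the index formula concrete, here is a short NumPy sketch (with arbitrary example values) that builds the product with explicit loops and compares it against the built-in `@` operator:

```python
import numpy as np

A = np.array([[1, 2, 3],
              [4, 5, 6]])     # 2x3
B = np.array([[1, 2],
              [3, 4],
              [5, 6]])        # 3x2

m, n = A.shape
n2, p = B.shape
assert n == n2                # inner dimensions must align

C = np.zeros((m, p))
for i in range(m):
    for j in range(p):
        # (AB)_ij = sum over k of A_ik * B_kj
        C[i, j] = sum(A[i, k] * B[k, j] for k in range(n))

print(np.allclose(C, A @ B))  # True
print((A @ B).shape)          # (2, 2)
```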
Hadamard product
The last (and least-used8) product type is the Hadamard product, which multiplies corresponding cells element-wise. Like addition, this requires both input matrices to have the same dimensions:

$$(\mathbf{A} \odot \mathbf{B})_{ij} = A_{ij} B_{ij}$$
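In NumPy, the Hadamard product is just the element-wise `*` operator; a tiny check with arbitrary values:

```python
import numpy as np

A = np.array([[1, 2], [3, 4]])
B = np.array([[10, 20], [30, 40]])

H = A * B                     # element-wise (Hadamard) product
print(H[0, 0], H[1, 1])       # 10 160
print(np.allclose(H, B * A))  # True -- commutes, unlike the matrix-matrix product
```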
Core matrix-matrix product properties
Not commutative
The matrix-matrix product in general is not commutative. Here’s a counter-example:

$$\mathbf{A} = \begin{pmatrix} 1 & 1 \\ 0 & 0 \end{pmatrix}, \qquad \mathbf{B} = \begin{pmatrix} 1 & 0 \\ 1 & 0 \end{pmatrix}, \qquad \mathbf{A}\mathbf{B} = \begin{pmatrix} 2 & 0 \\ 0 & 0 \end{pmatrix}$$

But

$$\mathbf{B}\mathbf{A} = \begin{pmatrix} 1 & 1 \\ 1 & 1 \end{pmatrix}$$
You can immediately tell that the results are not even simple transpositions or rotations of each other because they have entirely different values.
This is a somewhat pathological example because both matrices are singular.9 However, the operation still does not in general commute even among non-singular matrices.
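You can reproduce the counter-example above in a couple of lines of NumPy, and try non-singular (e.g., random) matrices as well:

```python
import numpy as np

A = np.array([[1, 1],
              [0, 0]])
B = np.array([[1, 0],
              [1, 0]])

print(A @ B)                      # [[2 0], [0 0]]
print(B @ A)                      # [[1 1], [1 1]]
print(np.allclose(A @ B, B @ A))  # False

# Generic random matrices (almost surely non-singular) also fail to commute:
rng = np.random.default_rng(0)
X, Y = rng.random((2, 2)), rng.random((2, 2))
print(np.allclose(X @ Y, Y @ X))  # almost certainly False
```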
Associative
Let $\mathbf{A}$ be $m \times n$, $\mathbf{B}$ be $n \times p$, and $\mathbf{C}$ be $p \times q$.
Consider the product $(\mathbf{A}\mathbf{B})\mathbf{C}$:

$$\left((\mathbf{A}\mathbf{B})\mathbf{C}\right)_{ij} = \sum_{l=1}^{p} (\mathbf{A}\mathbf{B})_{il} C_{lj} = \sum_{l=1}^{p} \sum_{k=1}^{n} A_{ik} B_{kl} C_{lj}$$
Now consider $\mathbf{A}(\mathbf{B}\mathbf{C})$:

$$\left(\mathbf{A}(\mathbf{B}\mathbf{C})\right)_{ij} = \sum_{k=1}^{n} A_{ik} (\mathbf{B}\mathbf{C})_{kj} = \sum_{k=1}^{n} \sum_{l=1}^{p} A_{ik} B_{kl} C_{lj}$$
These are the same finite sums, just accumulated in a different order. So $(\mathbf{A}\mathbf{B})\mathbf{C} = \mathbf{A}(\mathbf{B}\mathbf{C})$. We almost always drop the parentheses.
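A quick numerical spot check of associativity with arbitrary rectangular shapes:

```python
import numpy as np

rng = np.random.default_rng(42)
A = rng.random((2, 3))  # m x n
B = rng.random((3, 4))  # n x p
C = rng.random((4, 5))  # p x q

print(np.allclose((A @ B) @ C, A @ (B @ C)))  # True (up to floating-point error)
```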
Distributive
Consider $\mathbf{A}(\mathbf{B} + \mathbf{C})$. Suppose that $\mathbf{A}$ is $m \times n$ and $\mathbf{B}$ and $\mathbf{C}$ are $n \times p$ (recall that addition requires dimensions to match). In index notation, we have

$$\left(\mathbf{A}(\mathbf{B} + \mathbf{C})\right)_{ij} = \sum_{k=1}^{n} A_{ik}(B_{kj} + C_{kj}) = \sum_{k=1}^{n} A_{ik}B_{kj} + \sum_{k=1}^{n} A_{ik}C_{kj} = (\mathbf{A}\mathbf{B})_{ij} + (\mathbf{A}\mathbf{C})_{ij}$$
In other words, $\mathbf{A}(\mathbf{B} + \mathbf{C}) = \mathbf{A}\mathbf{B} + \mathbf{A}\mathbf{C}$. Left-multiplication distributes over matrix addition. You can similarly show that right-multiplication distributes over addition.
Be careful when applying this! Multiplication still does not commute, so if you want to factor an expression such as $\mathbf{A}\mathbf{B} + \mathbf{A}\mathbf{C}$, you must write it as $\mathbf{A}(\mathbf{B} + \mathbf{C})$ rather than the unrelated $(\mathbf{B} + \mathbf{C})\mathbf{A}$. For similar reasons, you cannot factor $\mathbf{A}\mathbf{B} + \mathbf{C}\mathbf{A}$, since the common factor $\mathbf{A}$ is a left-multiplicand in one term and a right-multiplicand in the other.
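And the corresponding numerical checks for distributivity, again with arbitrary random values:

```python
import numpy as np

rng = np.random.default_rng(7)
A = rng.random((2, 3))
B, C = rng.random((3, 4)), rng.random((3, 4))
D, E = rng.random((2, 3)), rng.random((2, 3))
F = rng.random((3, 4))

print(np.allclose(A @ (B + C), A @ B + A @ C))  # True -- left-multiplication distributes
print(np.allclose((D + E) @ F, D @ F + E @ F))  # True -- right-multiplication distributes
```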
The identity matrix and inversion
Identity matrix
Consider the (square) diagonal matrix

$$\mathbf{I}_n = \begin{pmatrix} 1 & 0 & \cdots & 0 \\ 0 & 1 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & 1 \end{pmatrix}$$
In index notation, $I_{ii} = 1$ and $I_{ij} = 0$ if $i \neq j$.10
Now consider some arbitrary $m \times n$ matrix $\mathbf{A}$. Then

$$(\mathbf{I}_m \mathbf{A})_{ij} = \sum_{k=1}^{m} I_{ik} A_{kj} = A_{ij}$$

since only the $k = i$ term in the sum survives.
In other words, $\mathbf{I}_m\mathbf{A} = \mathbf{A}$. Similarly, $\mathbf{A}\mathbf{I}_n = \mathbf{A}$. For this reason, $\mathbf{I}$ (or $\mathbf{I}_n$ if we want to be explicit about size) is called the identity matrix.
Note that if $\mathbf{A}$ is square ($m = n$), then the left and right identities are the same. Otherwise, the left and right identities necessarily have different dimensions (to make the matrix product work out).
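NumPy’s `np.eye` builds the identity; here is a quick check of the left and right identities for a non-square example:

```python
import numpy as np

A = np.array([[1., 2., 3.],
              [4., 5., 6.]])  # 2x3

I2 = np.eye(2)                # left identity must be 2x2
I3 = np.eye(3)                # right identity must be 3x3

print(np.allclose(I2 @ A, A)) # True
print(np.allclose(A @ I3, A)) # True
```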
Inversion
The inverse of a matrix $\mathbf{A}$ is denoted $\mathbf{A}^{-1}$ and is defined to be the unique matrix such that $\mathbf{A}^{-1}\mathbf{A} = \mathbf{A}\mathbf{A}^{-1} = \mathbf{I}$, if such a matrix exists. For such a matrix to exist, $\mathbf{A}$ must be square. We do not further discuss the existence criteria here.
Suppose that $\mathbf{A}$ has a left inverse as defined above. Call that $\mathbf{L}$. Now suppose that it also has a right inverse $\mathbf{R}$. Then we have $\mathbf{L}\mathbf{A} = \mathbf{I}$ and $\mathbf{A}\mathbf{R} = \mathbf{I}$. Now right-multiply the first equation by $\mathbf{R}$ to get $\mathbf{L}\mathbf{A}\mathbf{R} = \mathbf{R}$. But we can now simplify the left-hand side since $\mathbf{A}\mathbf{R} = \mathbf{I}$ and multiplication is associative: $\mathbf{L}\mathbf{A}\mathbf{R} = \mathbf{L}(\mathbf{A}\mathbf{R}) = \mathbf{L}\mathbf{I} = \mathbf{L}$. In other words, $\mathbf{L} = \mathbf{R}$, so if an inverse exists, it is both the right and left inverse. It is also unique because you can recover either side from the other.11
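A small numerical illustration, assuming an invertible matrix (NumPy’s `np.linalg.inv` raises `LinAlgError` for a singular one):

```python
import numpy as np

A = np.array([[2., 1.],
              [1., 1.]])                  # determinant 1, so invertible

A_inv = np.linalg.inv(A)

print(np.allclose(A_inv @ A, np.eye(2)))  # True -- left inverse
print(np.allclose(A @ A_inv, np.eye(2)))  # True -- right inverse (same matrix)
```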
Transposition of a product
We want to find an “expanded” form of $(\mathbf{A}\mathbf{B})^T$:

$$\left((\mathbf{A}\mathbf{B})^T\right)_{ij} = (\mathbf{A}\mathbf{B})_{ji} = \sum_{k=1}^{n} A_{jk} B_{ki} = \sum_{k=1}^{n} (\mathbf{B}^T)_{ik} (\mathbf{A}^T)_{kj} = (\mathbf{B}^T\mathbf{A}^T)_{ij}$$
So

$$(\mathbf{A}\mathbf{B})^T = \mathbf{B}^T\mathbf{A}^T$$
You can expand this iteratively to get

$$(\mathbf{A}_1\mathbf{A}_2\cdots\mathbf{A}_k)^T = \mathbf{A}_k^T\cdots\mathbf{A}_2^T\mathbf{A}_1^T$$
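Numerically, with arbitrary rectangular factors:

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.random((2, 3))
B = rng.random((3, 4))

print(np.allclose((A @ B).T, B.T @ A.T))  # True
# Note that A.T @ B.T doesn't even have compatible shapes here (3x2 times 4x3).
```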
Inverse of a product
We can similarly find an expansion for the inverse of a matrix product. In this case, it’s easiest to guess a form similar to the transposition expansion and then verify it.
First, note that the matrix inverse of a product can exist only if its matrix factors are invertible.12 This further implies that both factors must be square (and have the same dimensions).
Let’s guess that $(\mathbf{A}\mathbf{B})^{-1} = \mathbf{B}^{-1}\mathbf{A}^{-1}$. Since the left and right inverses must be equal, this would imply

$$\mathbf{B}^{-1}\mathbf{A}^{-1}\mathbf{A}\mathbf{B} = \mathbf{I}$$

and

$$\mathbf{A}\mathbf{B}\mathbf{B}^{-1}\mathbf{A}^{-1} = \mathbf{I}$$

Indeed, by expanding and associating as needed we have

$$\mathbf{B}^{-1}\mathbf{A}^{-1}\mathbf{A}\mathbf{B} = \mathbf{B}^{-1}(\mathbf{A}^{-1}\mathbf{A})\mathbf{B} = \mathbf{B}^{-1}\mathbf{I}\mathbf{B} = \mathbf{B}^{-1}\mathbf{B} = \mathbf{I}$$
as desired. The right inverse can be confirmed analogously. This also shows that the product inverse necessarily exists if its factors are invertible.
As before, you can iterate this expansion to find the general case:

$$(\mathbf{A}_1\mathbf{A}_2\cdots\mathbf{A}_k)^{-1} = \mathbf{A}_k^{-1}\cdots\mathbf{A}_2^{-1}\mathbf{A}_1^{-1}$$
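And the matching numerical check for the inverse of a product, with both factors chosen (arbitrarily) to be invertible:

```python
import numpy as np

A = np.array([[2., 1.],
              [1., 1.]])
B = np.array([[1., 2.],
              [0., 1.]])      # both determinants are nonzero

lhs = np.linalg.inv(A @ B)
rhs = np.linalg.inv(B) @ np.linalg.inv(A)

print(np.allclose(lhs, rhs))  # True
```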
Footnotes
1. When whiteboarding, I typically use square brackets because they’re easier to draw.
2. This shorthand nearly always works because we’re indexing with variables anyway.
3. This is even less often called the adjugate operation, but I would advise against this due to ambiguity with the cofactor transposition.
4. In vector math/physics, the vector product refers to the cross product (or sometimes a generalization of this), whose input is two vectors but whose output is another vector (rather than a scalar).
5. For anybody who attended the probability session where we first introduced the Cauchy-Schwarz inequality, I originally introduced the dot product through a physical interpretation of “project-then-multiply” and showed that these two definitions are equivalent if you assume an orthonormal basis. I can go over that again for anybody interested.
6. I haven’t bothered spelling out a matrix-vector product here because, as noted before, we always consider a vector to be a columnar matrix. This is just a special case of the matrix-matrix product where the output is a new column vector.
7. I recommend always explicitly showing the summation bounds when representing a matrix-matrix product in index notation to clarify the aligned axes. This is increasingly important when more than 2 matrices are involved.
8. Typically used for operations like efficient masking in deep learning, etc.
9. That is, they have no inverses, which implies that they are not space-preserving.
10. Or in the mostly-useless-but-compact Kronecker delta notation: $I_{ij} = \delta_{ij}$.
11. If it’s not clear why, start with the fact that the left and right inverses are equal, as we established above. Then assume that a second inverse exists and see what happens when you cross-multiply.
12. We will cover this elsewhere, but the gist is that a matrix is invertible if and only if its action “preserves space” (i.e., does not collapse the dimensionality of its inputs into a strict subspace). Because matrix-matrix products compose into a new virtual matrix, we know the final composition cannot be space-preserving if either constituent factor collapses space in the middle.