Prerequisites (In Order Of Importance)
 Something to write on
 Knowing what linear functions are, for the most part.
 Vaguely remembering that matrices are representations of a linear function in terms of some basis.
 You’ve heard the word eigenvalue and you sort of fuzzily understand linear algebra. You wonder why these are important.
Thoughts On Explaining All This Stuff
Normally I like to write explanatory blog posts in this order:
 What is it?
 Why does it matter?
 How does it work?
 Why does it work?
But I don’t think that format is useful here. So we’re going to do something different.
We’re going to take apart one sentence.
Almost all linear operators are diagonal operators in some basis.
For that to make sense, we have to define:
 “almost all”
 “linear operators”
 “diagonal operators”
 “in some basis”
We’re going to do that out of order too.
Linear Operators
A linear operator is a linear function \(f\) that maps a vector space \(V\) into itself. They are represented by square matrices, for reasons that will hopefully become obvious soon.
We’re only working with finite dimensional spaces here.
Fundamental Decomposition Of Linear Algebra
I just made up that term, but I think it deserves the title because it reveals why bases are so important and why linear functions are so simple compared to other functions.
By the definition of a basis (which you should google if you don’t remember, right now), any vector can be written as a linear combination of basis elements.
Specifically, for any vector \(v \in V\), \(v = \sum_i a_ie_i\), where the \(e_i\) are a shorthand for the basis elements \(e_1 \dots e_n\), and the \(a_i\) are the necessary coefficients. I’m just going to refer to the index \(i\) as a shorthand for all the elements considered independently since we only ever look at them one at a time.
Anyway, let’s give an example of this decomposition. In terms of the basis \(e_1 = \begin{bmatrix} 1\\0 \end{bmatrix}, e_2 = \begin{bmatrix} 0 \\ 1 \end{bmatrix}\) (Also known as the standard basis.), \(v =\begin{bmatrix} 4 \\ 3 \end{bmatrix} = 4 \begin{bmatrix}1 \\ 0 \end{bmatrix} + 3 \begin{bmatrix}0 \\ 1 \end{bmatrix} = 4e_1 + 3e_2\).
That’s why you can write \(v\) as \(\begin{bmatrix} 4 \\ 3 \end{bmatrix}\). If our basis elements had their signs flipped, \(v\) would be \(\begin{bmatrix} 4 \\ 3 \end{bmatrix}\) in terms of the (flipped) basis elements.
Linear functions play nicely with this decomposition. Since you can pull out constants and split over addition, you can figure out how a linear function acts on any vector by just knowing how it acts on any basis.
For any linear function \(f\) and any vector \(v\) and any basis \(e_1 \dots e_n\), \(f(v) = f(\sum_i a_ie_i) = \sum_i f(a_ie_i) = \sum_i a_i f(e_i)\).
We already know all the \(a_i\), so we just need to figure out \(f(e_i)\), or in other words, all the hard work is in figuring out what the function does to each basis element.
I think this mathematically trivial point is hugely underrated because it underlies the entire field of computational linear algebra.
This is why matrices describe linear functions in the first place. Each column of the matrix of a linear function is the image of a basis vector in the domain in terms of a basis in the range. We pick 2 bases, one in the domain and one in the range.
This is where matrices implicitly use bases. Matrix multiplication is defined the weird way it is to represent function composition. The columns of matrices just carry out the information of what happens to each basis element under the linear function it’s representing.
But today we’re only concerned about functions that map a space into itself, so we’re going to use the same basis for the domain and range^{1}, because the domain and the range are equal for our purposes.
How Linear Functions Can Act On Vectors
Let’s narrow our focus onto \(f(e_i)\) for some fixed \(i\). Since we just sum over \(i\), we’re not missing the big picture, which is such a useful property that this is the 3rd time I’ve repeated it.
\(e_i\) is just a vector. What’s the simplest way \(f\) can act on it?
Scalar multiplication. Specifically, for some constant \(c\), \(f(e_i) = c \cdot e_i\). Nothing is easier than scaling.
What if \(f\) acts like scalar multiplication on all the \(e_i\), each with some scale factor \(c_i\)? Then \(f\) is really just a glorified sort of multiplication on \(n\) vectors.
Since linearity lets us split any function acting on a vector into the function acting on basis vectors independently, we’ve reduced one hard problem into \(n\)^{2} easy ones.
If \(f\) acts like this on the basis you’ve chosen, it’s called a diagonal operator.
The basis elements are then named eigenvectors and the scale factors \(c_i\) are called eigenvalues. I’ve deliberately avoided using the standard symbol \(\lambda\) for eigenvalues to not ruin the surprise.
Since \(f(e_i) = c_i \cdot e_i\), we can plug it into the decomposition above to get \(f(v) = \sum_i a_ic_i \cdot e_i\), which is just a sum of scalar multiplications against known vectors. That’s easy to compute, something I’m really trying to push here if you haven’t noticed.
So any diagonal operator is just a bunch of scalings in \(n\) linearly independent directions. The eigenvectors tell you where to scale, and the eigenvalues tell you how much to scale.
The prefix eigen is German for characteristic, which I think is a good way of intuitively summing up how these vectors and constants are characteristic of an operator.
The whole basis gets the name eigenbasis, which I think is a good name in light of the above. You can think of the eigenbasis as the best basis for an operator. There are infinitely many bases to pick from, but this is the most natural one.
Matrix of a Diagonal Operator
Each basis element is just scaled, so it’s a diagonal matrix in terms of that basis, which is where the “diagonal” in the name comes from.
But What About NonDiagonal Operators?
Even though there are infinitely many bases for a vector space, it’s not clear that any of them is an eigenbasis for \(f\). In fact, \(f\) sometimes does not have an eigenbasis.
But here’s the amazing thing^{3}. It turns out that almost all operators are diagonalizable, which means that there exists a basis in which they are diagonal. So by changing bases, we get all the goodies discussed above.
That’s my main sentence (almost) fully explained.
Since diagonal operators act like scaling on each of those special basis elements, you can use that to find the special basis in question. That’s where the eigenvalue equation, \(Av = \lambda v\) comes from.^{4}
The Motion of the Spheres
To illustrate what I’m saying with a (cool) example, we’re going to see how points on 2D and 3D spheres are affected by random linear operators. These operators will almost always be diagonalizable.
We’re going to look at spheres because they illustrate how every point is a sum of scalings along the directions given by the eigenvectors. Plus they’re pretty.
Almost all linear operators will transform a sphere into a nondegenerate ellipse^{5}. Nondegenerate just means that it looks like an ellipse. A line is technically an ellipse where one axis has length 0, but that’s cheating.
If the matrix is symmetric, the eigenvectors are orthogonal and are also the axes of the ellipse, and the stretch along the axes corresponds to the eigenvalues.
I used a Python script to generate these, which can be viewed here. It’s not documented at all, so have fun with that.
Pretty Pictures
“Almost All”
I keep saying this. In short, what it means is that if you randomly picked a linear operator out of the space of all linear operators, it would be diagonalizable in the same way that you’re pretty much guaranteed to get tails at least once if you keep flipping a coin forever. Those who want a technical definition can go on Wikipedia.
Credit Where It’s Due (Besides To Me)
I owe my friend Jonathan Shewchuk the title of this post from his paper An Introduction to the Conjugate Gradient Method Without the Agonizing Pain

Pedantic readers that feel the urge to point out that range and codomain aren’t always the same can go away. ↩

\(n\) is the dimension of the space. ↩

Which I’m not going to prove, since that’s not my goal. Just know that almost all operators are similar to a diagonal operator. ↩

In more technical terms, diagonalizable operators reduce to the direct sum of 1dimensional invariant subspaces, and the eigenvalue equation is just the definition of a 1dimensional invariant subspace. I could have explained this post in terms of invariant subspaces, but then all the pretty pictures would be a bit out of left field. ↩

The nondegenerate part is because almost all linear operators are invertible, an equally amazing fact. Linear operators are nice. ↩