## Why The Second Derivative Test Works In One Dimension

Whenever the issue is the

localbehavior of a function near a point, thefirsttechnique you should try is Taylor expansion at that point.

— *The Way of Analysis, Robert Strichartz*

Excellent advice. So we have some point \(x\) where a function \(f\) has
a first derivative of 0 and a *nonzero* second derivative (nonzero so
this test will actually work).

Let’s assume for the sake of this discussion that \(f\) is differentiable 3 times rather than just 2 so I can illustrate something.

For those of you who haven’t taken real analysis, \(\varepsilon\) is a
small number introduced whenever you want to take a limit. It’s usually
strictly positive, but here we’ll let it be any small *nonzero* value.

The Taylor expansion of \(f\) around \(x\) is \(f(x + \varepsilon) \approx f(x) + f'(x) \cdot \varepsilon + \frac{f''(x)}{2!} \cdot \varepsilon^2 + \frac{f'''(x)}{3!} \cdot \varepsilon^3\).

Let’s zoom in and look at values *really* close to \(x\) by making
\(\varepsilon\) really close to 0. Our approximation also becomes exact
as \(\varepsilon \to 0\). Since \(f'(x)\) was assumed to be 0
(remember?), the term \(f'(x) \cdot \varepsilon\) vanishes. As
\(\varepsilon \to 0\), \(\varepsilon^2 >> \varepsilon^3\). So when we’re
close to \(x\), we only have to look at the constant and quadratic
terms, namely \(f(x) + \frac{f''(x)}{2!} \cdot \varepsilon^2\).

Notice that if the sign of the second derivative at \(x\), \(f''(x)\), is positive, adding a small \(\varepsilon\) says that \(f(x + \varepsilon)\) is equal to \(f(x)\) plus some positive term. But then \(f(x + \varepsilon) > f(x)\). So moving a small amount in any in any direction increases the output of the function (since \(\varepsilon\) is squared, and \(\varepsilon^2\) is always positive), meaning we’re at a local minimum. If the sign of the second derivative is negative, then moving a small amount in any direction decreases the output, meaning that we’re at a local maximum.

Notice that this argument works for any even-power terms in the Taylor series, so there’s also a 4th derivative test, a 6th derivative test, and so on. Not that you really use those much.

This is the one-dimensional case. How do we extend it to higher dimensions?

## Eigen Tell You Saw This Coming

Eigenvalues.

Eigenvalues let you *slice* multi-dimensional problems into multiple
one-dimensional problems.

Because of the equality of mixed
partials,
the second derivative (AKA the Hessian) matrix \(H\) is **symmetric**.

Why does that matter?

## The Spectral Theorem

A linear operator has a symmetric matrix if and *only if* it has an
orthogonal eigenbasis with purely **real** eigenvalues.

This is cool. The fact that the eigenvalues are real numbers means that our one-dimensional case perfectly carries over.

This is one of those moments where things in math slot together so perfectly that I can believe that this all meant something in the end.

The eigenvalues each give the scaling along a direction (given by the corresponding eigenvectors). Each one acts the same way as the sign of the second derivative at a point, as described above.

If *all* the eigenvalues are positive, it’s like being at the bottom of
a multidimensional bowl. You’re in a local minimum. Going in any
direction will increase the function output because the eigenvalues
scale everything positively.

If *all* of them are negative, then going in any direction will decrease
your output, so you’re in a local maximum.

If you want a picture, imagine a bowl or something. Or look at the ones here.

If *any* eigenvalue has a sign different from *any* other, then you’re
in a saddle point. You’re
in a maximum if you were constrained to only go along some directions,
and a minimum if constrained to others. But with respect to *all* the
directions, you’re not in a minimum *or* a maximum. To increase or
decrease your output value, you can move along the directions of a
positive or negative eigenvalue.

This all fails if the Hessian has an eigenvalue of zero and is not invertible, but we’re assuming that it’s not. It’s the multidimensional equivalent of assuming the second derivative isn’t 0, because then the test is inconclusive.