Eigen Apply Myself When I Want To: Finding Local Minima


Why The Second Derivative Test Works In One Dimension

Whenever the issue is the local behavior of a function near a point, the first technique you should try is Taylor expansion at that point.

The Way of Analysis, Robert Strichartz

Excellent advice. So we have some point where a function has a first derivative of 0 and a nonzero second derivative (nonzero so this test will actually work).

Let’s assume for the sake of this discussion that is differentiable 3 times rather than just 2 so I can illustrate something.

For those of you who haven’t taken real analysis, is a small number introduced whenever you want to take a limit. It’s usually strictly positive, but here we’ll let it be any small nonzero value.

The Taylor expansion of around is .

Let’s zoom in and look at values really close to by making really close to 0. Our approximation also becomes exact as . Since was assumed to be 0 (remember?), the term vanishes. As , . So when we’re close to , we only have to look at the constant and quadratic terms, namely .

Notice that if the sign of the second derivative at , , is positive, adding a small says that is equal to plus some positive term. But then . So moving a small amount in any in any direction increases the output of the function (since is squared, and is always positive), meaning we’re at a local minimum. If the sign of the second derivative is negative, then moving a small amount in any direction decreases the output, meaning that we’re at a local maximum.

Notice that this argument works for any even-power terms in the Taylor series, so there’s also a 4th derivative test, a 6th derivative test, and so on. Not that you really use those much.

This is the one-dimensional case. How do we extend it to higher dimensions?

Eigen Tell You Saw This Coming

Eigenvalues.

Eigenvalues let you slice multi-dimensional problems into multiple one-dimensional problems.

Because of the equality of mixed partials, the second derivative (AKA the Hessian) matrix is symmetric.

Why does that matter?

The Spectral Theorem

A linear operator has a symmetric matrix if and only if it has an orthogonal eigenbasis with purely real eigenvalues.

This is cool. The fact that the eigenvalues are real numbers means that our one-dimensional case perfectly carries over.

This is one of those moments where things in math slot together so perfectly that I can believe that this all meant something in the end.

The eigenvalues each give the scaling along a direction (given by the corresponding eigenvectors). Each one acts the same way as the sign of the second derivative at a point, as described above.

If all the eigenvalues are positive, it’s like being at the bottom of a multidimensional bowl. You’re in a local minimum. Going in any direction will increase the function output because the eigenvalues scale everything positively.

If all of them are negative, then going in any direction will decrease your output, so you’re in a local maximum.

If you want a picture, imagine a bowl or something. Or look at the ones here.

If any eigenvalue has a sign different from any other, then you’re in a saddle point. You’re in a maximum if you were constrained to only go along some directions, and a minimum if constrained to others. But with respect to all the directions, you’re not in a minimum or a maximum. To increase or decrease your output value, you can move along the directions of a positive or negative eigenvalue.

This all fails if the Hessian has an eigenvalue of zero and is not invertible, but we’re assuming that it’s not. It’s the multidimensional equivalent of assuming the second derivative isn’t 0, because then the test is inconclusive.

Related Posts

How to change tabs to spaces

Another Way to Write Conditional Probability

Notes for May 9, 2019

A tool to look at random images

Safeway

e-ink reminisces

Edwin Edwards

Middle School by Bo Burnham

How to Disable Disqus Ads on your Blog

Derivation of Reservoir Sampling