Eigen Apply Myself When I Want To: Finding Local Minima


Why The Second Derivative Test Works In One Dimension

Whenever the issue is the local behavior of a function near a point, the first technique you should try is Taylor expansion at that point.

The Way of Analysis, Robert Strichartz

Excellent advice. So we have some point \(x\) where a function \(f\) has a first derivative of 0 and a nonzero second derivative (nonzero so this test will actually work).

Let’s assume for the sake of this discussion that \(f\) is differentiable 3 times rather than just 2, so I can illustrate something.

For those of you who haven’t taken real analysis, \(\varepsilon\) is a small number introduced whenever you want to take a limit. It’s usually strictly positive, but here we’ll let it be any small nonzero value.

The Taylor expansion of \(f\) around \(x\) is \(f(x + \varepsilon) \approx f(x) + f'(x) \cdot \varepsilon + \frac{f''(x)}{2!} \cdot \varepsilon^2 + \frac{f'''(x)}{3!} \cdot \varepsilon^3\).

Let’s zoom in and look at values really close to \(x\) by making \(\varepsilon\) really close to 0. The approximation also gets better and better as \(\varepsilon \to 0\). Since \(f'(x)\) was assumed to be 0 (remember?), the term \(f'(x) \cdot \varepsilon\) vanishes. And as \(\varepsilon \to 0\), \(\varepsilon^2 \gg |\varepsilon^3|\). So when we’re close to \(x\), we only have to look at the constant and quadratic terms, namely \(f(x) + \frac{f''(x)}{2!} \cdot \varepsilon^2\).
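To watch the quadratic term take over, here’s a quick numeric sanity check (my own throwaway sketch in Python): \(\cos\) has a critical point at 0 with second derivative \(-1\), so the expansion predicts \(\cos(\varepsilon) - \cos(0) \approx -\frac{1}{2}\varepsilon^2\).

```python
import numpy as np

# f(x) = cos(x): f'(0) = 0 and f''(0) = -1, so near 0 the expansion
# predicts f(0 + eps) - f(0) ≈ (f''(0) / 2!) * eps**2 = -eps**2 / 2.
for eps in [0.1, 0.01, 0.001]:
    actual = np.cos(eps) - 1.0    # f(x + eps) - f(x)
    predicted = -0.5 * eps**2     # the surviving quadratic term
    print(f"eps={eps:g}  actual={actual:.3e}  predicted={predicted:.3e}")
```

The two columns agree to more and more digits as \(\varepsilon\) shrinks, which is exactly why we’re allowed to ignore the cubic term.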

Notice that if the second derivative at \(x\), \(f''(x)\), is positive, then adding a small \(\varepsilon\) makes \(f(x + \varepsilon)\) equal to \(f(x)\) plus some positive term. But then \(f(x + \varepsilon) > f(x)\). So moving a small amount in either direction increases the output of the function (since \(\varepsilon\) is squared, and \(\varepsilon^2\) is always positive), meaning we’re at a local minimum. If the second derivative is negative, then moving a small amount in either direction decreases the output, meaning that we’re at a local maximum.

Notice that this argument works for any even-power term in the Taylor series, as long as all the lower-order derivatives vanish at \(x\): so there’s also a 4th derivative test, a 6th derivative test, and so on. Not that you really use those much.
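For a concrete example of the 4th derivative version (my own illustration): take \(f(x) = x^4\) at \(x = 0\). The first three derivatives all vanish there, and the expansion is exact:

\[
f(0 + \varepsilon) = \frac{f''''(0)}{4!} \cdot \varepsilon^4 = \varepsilon^4, \qquad f''''(0) = 24 > 0,
\]

so \(f(0 + \varepsilon) > f(0)\) for every nonzero \(\varepsilon\), and 0 is a local minimum, exactly as the positive 4th derivative predicts.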

This is the one-dimensional case. How do we extend it to higher dimensions?

Eigen Tell You Saw This Coming

Eigenvalues.

Eigenvalues let you slice multi-dimensional problems into multiple one-dimensional problems.

Because of the equality of mixed partials (which holds whenever \(f\) is twice continuously differentiable), the second derivative (AKA the Hessian) matrix \(H\) is symmetric.
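Concretely, for \(f : \mathbb{R}^n \to \mathbb{R}\), the Hessian at a point collects all the second partials:

\[
H_{ij} = \frac{\partial^2 f}{\partial x_i \, \partial x_j}, \qquad H_{ij} = H_{ji},
\]

where the second equation is the equality of mixed partials.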

Why does that matter?

The Spectral Theorem

A real linear operator has a symmetric matrix if and only if it has an orthonormal eigenbasis with real eigenvalues.

This is cool. The fact that the eigenvalues are real numbers means that our one-dimensional case perfectly carries over.
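You can watch the spectral theorem in action numerically. A minimal sketch in Python with numpy (my own example): `eigh` is designed specifically for symmetric matrices and hands back exactly the structure the theorem promises.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4))
H = (A + A.T) / 2                  # symmetrize: a stand-in for a Hessian

w, Q = np.linalg.eigh(H)           # real eigenvalues w; eigenvectors as Q's columns
print(np.allclose(Q.T @ Q, np.eye(4)))       # True: the eigenbasis is orthonormal
print(np.allclose(Q @ np.diag(w) @ Q.T, H))  # True: H = Q diag(w) Q^T
```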

This is one of those moments where things in math slot together so perfectly that I can believe that this all meant something in the end.

Each eigenvalue gives the scaling along one direction (its corresponding eigenvector), and its sign plays exactly the role the sign of the second derivative played above.
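To make that precise (a standard expansion, stated in the notation above): at a critical point \(x\), where the gradient vanishes, the multidimensional Taylor expansion along a unit vector \(v\) reads

\[
f(x + \varepsilon v) \approx f(x) + \frac{\varepsilon^2}{2} \, v^\top H v,
\]

and if \(v\) is a unit eigenvector of \(H\) with eigenvalue \(\lambda\), then \(v^\top H v = \lambda\). Along that direction, the function looks like \(f(x) + \frac{\lambda}{2} \varepsilon^2\): the one-dimensional picture with \(\lambda\) standing in for \(f''(x)\).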

If all the eigenvalues are positive, it’s like being at the bottom of a multidimensional bowl. You’re in a local minimum. Going in any direction will increase the function output because the eigenvalues scale everything positively.

If all of them are negative, then going in any direction will decrease your output, so you’re in a local maximum.

If you want a picture, imagine a bowl or something. Or look at the ones here.

If some eigenvalues are positive and others negative, you’re at a saddle point. Constrained to certain directions you’d be at a maximum, and constrained to others you’d be at a minimum, but with respect to all the directions you’re at neither. To increase or decrease your output value, move along an eigenvector with a positive or negative eigenvalue, respectively.

This all fails if the Hessian has an eigenvalue of 0 (equivalently, if it isn’t invertible), but we’re assuming it doesn’t. That’s the multidimensional equivalent of assuming the second derivative isn’t 0, since otherwise the test is inconclusive.
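Putting the whole test in one place: here’s a minimal sketch in Python with numpy (the function name, tolerance, and example are my own choices, not a standard API).

```python
import numpy as np

def classify_critical_point(H, tol=1e-8):
    """Classify a critical point from its (symmetric) Hessian H."""
    # eigvalsh is built for symmetric matrices and returns real
    # eigenvalues -- exactly what the spectral theorem guarantees.
    eigenvalues = np.linalg.eigvalsh(H)
    if np.any(np.abs(eigenvalues) < tol):
        return "inconclusive"       # a (near-)zero eigenvalue: test says nothing
    if np.all(eigenvalues > 0):
        return "local minimum"      # bottom of the multidimensional bowl
    if np.all(eigenvalues < 0):
        return "local maximum"      # top of the upside-down bowl
    return "saddle point"           # mixed signs: up some ways, down others

# f(x, y) = x**2 - y**2 has Hessian diag(2, -2) at the origin, a saddle:
print(classify_critical_point(np.array([[2.0, 0.0], [0.0, -2.0]])))
```

Running it prints `saddle point`, matching the mixed-sign case described above.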
