(I sat on this post for a while, but rather than let it rot in my drafts, I decided to publish it even though it’s lacking something.)

For me, Anki broke the myth that memory was something you either had or you didn’t.

Plenty of posts exist about how Anki helps you remember things. I’ll just say that if you put something into Anki, you will remember it. Memory becomes a choice.

That’s not what I want to talk about. Anki also makes you smarter. In the vein of my last post, mastery is largely compressing your knowledge about a topic.

Anki helps you compress and internalize your knowledge.


Learning definitions is to math as learning notes is to music. They’re part of the low-level groundwork you need before you can go on to higher things.

Memory is caching, and caching spares you the effort of dredging something up or going to Wikipedia every time. For things you don’t need often, like the exact technical conditions of the Stone-Weierstrass theorem (the subalgebra vanishes nowhere and separates points), Wikipedia is fine. On the other hand, doing probability without remembering that expectation is linear is just masochistic.

As you learn more, you start to chunk definitions. For example, here’s my card for the definition of a vector space:

An abelian group acted on by the multiplication of a field.

This relies on at least 4 other pieces of knowledge to understand. Just reading off the card, you need to know:

  • abelian
  • group
  • group action
  • field

I have cards for all these, and the basic pieces of knowledge they rely on, going all the way down to semigroups.
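One way to picture this chain of dependencies is as a small graph of cards. The sketch below is only illustrative: the card names and the edges in `PREREQS` are assumptions made for the example, not my actual deck, and `all_prereqs` and `shared_prereqs` are hypothetical helpers.

```python
# Hypothetical prerequisite graph: each card lists the cards it relies on.
# The names and edges are illustrative assumptions, not a real deck.
PREREQS = {
    "vector space": ["abelian", "group", "group action", "field"],
    "field": ["ring", "abelian", "group"],
    "ring": ["abelian", "group", "monoid"],
    "group action": ["group"],
    "abelian": ["group"],
    "group": ["monoid"],
    "monoid": ["semigroup"],
    "semigroup": [],
}


def all_prereqs(card, graph=PREREQS):
    """Return every card reachable from `card`, excluding `card` itself."""
    seen = set()
    stack = list(graph[card])
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(graph[current])
    return seen


def shared_prereqs(a, b, graph=PREREQS):
    """Cards that two concepts both build on: candidates for connections."""
    return all_prereqs(a, graph) & all_prereqs(b, graph)


print(sorted(all_prereqs("vector space")))
print(sorted(shared_prereqs("field", "group action")))
```

Walking the graph from "vector space" reaches everything down to "semigroup", and intersecting two walks shows which building blocks two concepts share.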

By breaking a definition into its simplest components, you can see what different concepts have in common, and use that to find new connections.

Canonical Examples

example of a group
set of all isomorphisms

Groups are meant to capture intuitions about symmetry. So the canonical example is the group of all symmetries. This card helps me internalize that lesson.

Repeat in several ways

I have 3 cards for the definition of variance:

  1. Directly in terms of expected squared distance from the expected value
  2. The reduced form of the above where you cancel the cross term
  3. In terms of the covariance.

Each definition is useful in different cases, and knowing all of them makes the algebra I do every day much easier. It also makes me prefer covariance for its superior algebraic properties.
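For reference, the three forms written out, assuming X and Y have finite second moments. The last line is an example of the algebra the covariance form makes easy.

```latex
\begin{align*}
\operatorname{Var}(X) &= \mathbb{E}\big[(X - \mathbb{E}[X])^2\big] \\
  &= \mathbb{E}[X^2] - 2\,\mathbb{E}[X]\,\mathbb{E}[X] + \mathbb{E}[X]^2
   = \mathbb{E}[X^2] - \mathbb{E}[X]^2 \\
  &= \operatorname{Cov}(X, X),
  \qquad \operatorname{Cov}(X, Y) = \mathbb{E}\big[(X - \mathbb{E}[X])(Y - \mathbb{E}[Y])\big] \\[4pt]
\operatorname{Var}(X + Y) &= \operatorname{Cov}(X + Y,\, X + Y)
   = \operatorname{Var}(X) + 2\operatorname{Cov}(X, Y) + \operatorname{Var}(Y)
\end{align*}
```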

Find Connections

I noticed that covariance had a lot of nice properties after a few Anki reviews. Those same reviews covered inner products. I noticed a bunch of commonalities.

Turns out covariance is almost an inner product (it’s not definite), but if you take a quotient, it can be made into one.
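A sketch of the argument, assuming we work on the space V of random variables with finite second moments. The names V and N are just notation for this sketch.

```latex
\begin{align*}
&\text{Symmetric:} && \operatorname{Cov}(X, Y) = \operatorname{Cov}(Y, X) \\
&\text{Bilinear:} && \operatorname{Cov}(aX + bY, Z) = a\operatorname{Cov}(X, Z) + b\operatorname{Cov}(Y, Z) \\
&\text{Positive semi-definite:} && \operatorname{Cov}(X, X) = \operatorname{Var}(X) \ge 0 \\
&\text{Not definite:} && \operatorname{Var}(X) = 0 \iff X \text{ is almost surely constant} \\[4pt]
&\text{Quotient:} && N = \{X \in V : X \text{ is a.s. constant}\}, \qquad
   \langle [X], [Y] \rangle := \operatorname{Cov}(X, Y) \text{ on } V / N
\end{align*}
```

The quotient form is well defined because adding constants changes nothing, Cov(X + c, Y + d) = Cov(X, Y), and it is definite because the only class with zero variance is the class of constants, which is the zero element of V / N.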

That connection caused me to finally learn how quotients worked, which led to me understanding gluing in topology and tensor products recently.

For that reason, I recommend keeping all your cards in one big deck.


A few years ago, whenever I felt a strong emotion, I got in the habit of describing it in a particular way. “I am afraid” became “I feel fear”. I realized that the word fear flashed through my head.

So a few months ago, I decided to see if I could trigger the thought after that.

[this video, embedded]

Now whenever I feel fear, that clip goes through my head.

Lack of understanding

When I see some paper and go “I could never write that”:

There is no magic. Just knowledge, more or less hidden.

Then I feel the urge to rise to the challenge rather than run from it.

I internalize what I want to be rather than what I am now.


One thing I put into Anki is quotes, like this one by Richard Bellman:

"Approximations may be made in two places - in the model and in its solution. It is not at all clear which is better."

In reality, approximations are made in both places. But one is often emphasized.

Because deep learning is in the pocket of Big Tensor, approximations are more common in the solution, with really complex models.

In classic robotics, the model is simple and sometimes analytically solvable. LQR assumes linear dynamics and quadratic cost, hence the acronym.
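For concreteness, here is the standard discrete-time, infinite-horizon form of LQR. The symbols A, B, Q, R, K, P are the usual textbook notation and do not refer to any particular system.

```latex
\begin{align*}
&\text{Linear dynamics:} && x_{t+1} = A x_t + B u_t \\
&\text{Quadratic cost:} && J = \sum_{t=0}^{\infty} \big( x_t^\top Q\, x_t + u_t^\top R\, u_t \big),
   \qquad Q \succeq 0,\; R \succ 0 \\
&\text{Optimal control:} && u_t = -K x_t, \qquad K = (R + B^\top P B)^{-1} B^\top P A \\
&\text{Riccati equation:} && P = Q + A^\top P A - A^\top P B\,(R + B^\top P B)^{-1} B^\top P A
\end{align*}
```

Here the approximation sits in the model: real dynamics are rarely linear, but once you accept that simplification the solution is exact.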

It's more important to do the right things than to do things right.

I remember that one whenever I get caught up in some micro-optimization these days.


November 2, 2023
