What is an autoencoder?

An autoencoder is an approximation of the identity function.

Unlike many approximations, it’s usually meant to be an imperfect one.

An autoencoder $A$ is usually the composition of two parts: an encoder $e$ and a decoder $d$. So $A := d \circ e$. Generally the encoder is surjective and the decoder is injective, but neither is bijective. In English, the encoder maps the original points into a lower-dimensional space, and the decoder maps them back into the original, higher-dimensional one. The lower-dimensional bottleneck $e(x)$ is where most of the interesting properties of an autoencoder come from.

This will make more sense if you've seen the standard picture of an autoencoder, which looks like a tipped-over hourglass: wide at the input and output, narrow at the bottleneck in the middle.
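To make the composition $A = d \circ e$ concrete, here is a minimal NumPy sketch. The dimensions (8 and 3) and the random, untrained linear maps are my own assumptions, chosen only to show the shapes; a real autoencoder would learn these weights and would usually include nonlinearities.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: code_dim < input_dim is the bottleneck.
input_dim, code_dim = 8, 3

# Untrained, randomly initialised linear maps (an assumption for illustration).
W_enc = rng.normal(size=(code_dim, input_dim))
W_dec = rng.normal(size=(input_dim, code_dim))

def e(x):
    """Encoder: maps an 8-dimensional point to a 3-dimensional code."""
    return W_enc @ x

def d(z):
    """Decoder: maps a 3-dimensional code back to 8 dimensions."""
    return W_dec @ z

def A(x):
    """Autoencoder: the composition d(e(x))."""
    return d(e(x))

x = rng.normal(size=input_dim)
z = e(x)          # the bottleneck representation
x_hat = A(x)      # the (imperfect) reconstruction of x

# Mean squared reconstruction error between x and A(x).
reconstruction_error = np.mean((x - x_hat) ** 2)
print(z.shape, x_hat.shape, reconstruction_error)
```

Because everything passes through a 3-dimensional code, the matrix `W_dec @ W_enc` has rank at most 3, so this $A$ cannot equal the identity on all of the 8-dimensional input space, no matter how the weights are chosen.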

Adding a sparsity regularizer, such as an $L_1$ norm penalty on the bottleneck activations $e(x)$, gives a sparse autoencoder.
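As a rough sketch of what that objective looks like, the function below adds an $L_1$ penalty on the code to a mean squared reconstruction error. The function name, the choice of mean squared error, and the `l1_weight` value are assumptions of mine, not something the chapter prescribes.

```python
import numpy as np

def sparse_autoencoder_loss(x, x_hat, z, l1_weight=0.1):
    """Reconstruction error plus an L1 penalty on the bottleneck code z.

    l1_weight is a hypothetical hyperparameter trading off the two terms.
    """
    reconstruction = np.mean((x - x_hat) ** 2)
    sparsity = np.sum(np.abs(z))  # L1 norm of the code
    return reconstruction + l1_weight * sparsity

# Two codes with the same L2 norm (1.0) but different L1 norms.
x = np.array([1.0, 2.0, 3.0])
x_hat = x.copy()  # pretend the reconstruction is perfect
dense_code = np.array([0.5, -0.5, 0.5, -0.5])   # L1 norm 2.0
sparse_code = np.array([1.0, 0.0, 0.0, 0.0])    # L1 norm 1.0

dense_loss = sparse_autoencoder_loss(x, x_hat, dense_code)
sparse_loss = sparse_autoencoder_loss(x, x_hat, sparse_code)
print(dense_loss, sparse_loss)
```

With equal reconstruction quality, the code that concentrates its mass in one coordinate gets the lower loss, which is the pressure toward sparsity.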

Instead of learning the map $x \mapsto x$, we can add some noise to the input and train the autoencoder to ignore that noise by giving it the clean input as the label. This gives a denoising autoencoder.

In math, we learn $x + \varepsilon \mapsto x$, where $\varepsilon$ is the noise.
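The training pair $(x + \varepsilon, x)$ can be sketched as follows. Gaussian noise, the `noise_std` value, and the helper names are assumptions for illustration; other kinds of corruption would fit the same pattern.

```python
import numpy as np

def make_denoising_pair(x, rng, noise_std=0.1):
    """Return (noisy input, clean target), i.e. (x + eps, x)."""
    eps = rng.normal(scale=noise_std, size=x.shape)  # assumed Gaussian noise
    return x + eps, x

def denoising_loss(autoencoder, x, rng, noise_std=0.1):
    """Mean squared error between the output on the noisy input and the clean x."""
    noisy_x, target = make_denoising_pair(x, rng, noise_std)
    return np.mean((autoencoder(noisy_x) - target) ** 2)

rng = np.random.default_rng(0)
x = np.array([1.0, 2.0, 3.0, 4.0])

# The plain identity function just passes the noise through.
identity_loss = denoising_loss(lambda noisy: noisy, x, rng)

# A hypothetical perfect denoiser for this one point always outputs x.
perfect_loss = denoising_loss(lambda noisy: x, x, rng)

print(identity_loss, perfect_loss)
```

Under this objective the exact identity function is penalized, since it reproduces the noise along with the signal.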

The chapter has loads of other stuff, but it didn’t feel interesting enough to me to write down.