## What is an autoencoder?

An autoencoder is an approximation of the identity function.

Unlike many approximations, it’s usually meant to be an imperfect one.

An autoencoder \(A\) is usually the composition of two parts: an encoder
\(e\) and a decoder \(d\). So \(A := d \circ e\). Generally the encoder
is surjective and the decoder is injective, but neither is bijective.
In English, the encoder maps the original points into a
lower-dimensional space, and the decoder maps them back into a
higher-dimensional one. This lower-dimensional *bottleneck* \(e(x)\) is
where most of the interesting properties of an autoencoder come from.

The standard picture of an autoencoder, which looks like a tipped-over hourglass, makes this easier to see: the layers narrow toward the bottleneck and then widen again.
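Here is a minimal sketch of the \(A = d \circ e\) structure in NumPy, assuming a purely linear encoder and decoder built from the top principal directions of some made-up data. The names `encode`, `decode`, `autoencode`, the data `X`, and the bottleneck size `k` are illustrative choices, not anything from the chapter, and real autoencoders are usually nonlinear and trained by gradient descent.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))      # 200 made-up points in R^5
X = X - X.mean(axis=0)             # center the data

# Assumption: a linear autoencoder whose weights are the top-k right
# singular vectors of X (the best k-dimensional linear subspace in the
# least-squares sense).
k = 2
_, _, Vt = np.linalg.svd(X, full_matrices=False)
W = Vt[:k].T                       # shape (5, 2), orthonormal columns

def encode(x):
    """e: R^5 -> R^2, the bottleneck."""
    return x @ W

def decode(z):
    """d: R^2 -> R^5."""
    return z @ W.T

def autoencode(x):
    """A = d . e, an imperfect approximation of the identity."""
    return decode(encode(x))

codes = encode(X)
recon = autoencode(X)
mse = float(np.mean((X - recon) ** 2))   # nonzero: information is lost
```

In this sketch, applying `autoencode` twice gives the same result as applying it once, because \(d \circ e\) is a projection here: whatever lies off the 2-dimensional subspace is dropped at the bottleneck.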

Adding a sparsity regularizer such as the \(L_1\) norm penalty on the bottleneck layer gives a sparse autoencoder.
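As a rough sketch of what that objective might look like, the function below adds an \(L_1\) penalty on the bottleneck activations to a squared reconstruction error. The ReLU bottleneck, the weight shapes, the name `sparse_ae_loss`, and the weight `lam` are all assumptions made for illustration; the chapter may set things up differently.

```python
import numpy as np

def sparse_ae_loss(x, W_enc, W_dec, lam):
    """Reconstruction error plus lam times the L1 norm of the bottleneck."""
    z = np.maximum(x @ W_enc, 0.0)        # bottleneck e(x); ReLU is an assumption
    x_hat = z @ W_dec                     # reconstruction d(e(x))
    recon = float(np.mean(np.sum((x - x_hat) ** 2, axis=1)))
    l1 = float(np.mean(np.sum(np.abs(z), axis=1)))
    return recon + lam * l1, recon, l1

# Made-up data and small random weights, only to exercise the function.
rng = np.random.default_rng(0)
x = rng.normal(size=(8, 6))
W_enc = 0.1 * rng.normal(size=(6, 3))
W_dec = 0.1 * rng.normal(size=(3, 6))

total, recon, l1 = sparse_ae_loss(x, W_enc, W_dec, lam=0.1)
```

Minimizing `total` instead of `recon` alone pushes many bottleneck activations toward zero, with `lam` trading reconstruction quality against sparsity.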

Instead of learning the map \(x \mapsto x\), we can add some noise to the input and train the autoencoder to ignore it by using the clean input as the target. This is a denoising autoencoder.

In math, the autoencoder learns \(x + \varepsilon \mapsto x\), where \(\varepsilon\) is the noise.
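The \(x + \varepsilon \mapsto x\) setup can be sketched as a pair of small helpers: one builds the (noisy input, clean target) pair, the other scores any candidate autoencoder on it. Gaussian noise, the noise scale `sigma`, and the names `make_denoising_pair` and `denoising_loss` are assumptions for illustration only.

```python
import numpy as np

def make_denoising_pair(x, sigma, rng):
    """Return (x + eps, x): the noisy input and the clean target."""
    eps = rng.normal(scale=sigma, size=x.shape)   # Gaussian noise is an assumption
    return x + eps, x

def denoising_loss(autoencoder, x, sigma, rng):
    """Mean squared error between autoencoder(x + eps) and the clean x."""
    x_noisy, target = make_denoising_pair(x, sigma, rng)
    return float(np.mean((autoencoder(x_noisy) - target) ** 2))

rng = np.random.default_rng(0)
x = rng.normal(size=(1000, 4))                    # made-up data

def identity(v):
    return v

# The exact identity map is no longer a perfect solution once noise is added.
loss_identity = denoising_loss(identity, x, sigma=0.5, rng=rng)
loss_clean = denoising_loss(identity, x, sigma=0.0, rng=rng)
```

With noise, simply copying the input scores roughly `sigma ** 2` rather than zero, so the autoencoder has to learn something about the data to do better.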

The chapter has loads of other stuff, but it didn’t feel interesting enough to me to write down.