Started shallow, going deep.

— Matt and Kim

## Representations

Deep learning is not about neural nets. Deep learning is about
hierarchical representations of concepts. More complicated concepts are
defined as the *compositions* of simpler ones.

Neural nets with lots of hidden layers just happen to be a natural fit for that idea. Since the book focuses on them for the most part, I’ll pretty much just refer to neural nets.

Picking the right representation is hard. There’s a trade-off: simple representations tend to generalize well, while more complex ones can be more efficient for a specific problem.

## My Thoughts And Speculations

I think a lot of problems can be thought of as looking for a good representation. Maxwell’s equations are a simple and powerful representation of how electricity works. However, finding that representation took a lot of time. Lots of fields, like economics, lack representations of equal power.

Under that interpretation, representations and data compression have strong parallels.

Rather than finding the representation (a.k.a. feature engineering), we
can *learn* it. An autoencoder is a good example of this. It learns a
compression and decompression function, and by combining them learns an
approximate representation of the identity function. We can tweak its
architecture to have it learn things like sparseness.
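Here’s a minimal sketch of that idea in plain Python. It isn’t from the book: the toy data, the function names, and the hyperparameters are all my own assumptions. The data lies on a line in 2-D, so a single number is enough to describe each point. The encoder squeezes a point down to that number, the decoder expands it back, and gradient descent pushes the pair towards the identity function on this data.

```python
# Toy linear autoencoder: 2-D inputs, 1-D code.
# Everything here (data, names, step count, learning rate) is illustrative.

# Points on the line x2 = 2 * x1, so one number can describe each point.
data = [(t, 2.0 * t) for t in (-1.0, -0.5, 0.0, 0.5, 1.0)]


def encode(w, x):
    """Compress a 2-D point to a single number."""
    return w[0] * x[0] + w[1] * x[1]


def decode(v, z):
    """Expand the 1-D code back into a 2-D point."""
    return (v[0] * z, v[1] * z)


def loss(w, v):
    """Mean squared reconstruction error over the data set."""
    total = 0.0
    for x in data:
        x_hat = decode(v, encode(w, x))
        total += (x_hat[0] - x[0]) ** 2 + (x_hat[1] - x[1]) ** 2
    return total / len(data)


def train(steps=2000, lr=0.02):
    """Plain gradient descent on the reconstruction error."""
    w = [0.1, 0.1]  # encoder weights
    v = [0.1, 0.1]  # decoder weights
    for _ in range(steps):
        grad_w = [0.0, 0.0]
        grad_v = [0.0, 0.0]
        for x in data:
            z = encode(w, x)
            x_hat = decode(v, z)
            err = (x_hat[0] - x[0], x_hat[1] - x[1])
            err_dot_v = err[0] * v[0] + err[1] * v[1]
            for i in range(2):
                # d(loss)/dv_i = 2 * err_i * z
                grad_v[i] += 2.0 * err[i] * z / len(data)
                # d(loss)/dw_i = 2 * (err . v) * x_i
                grad_w[i] += 2.0 * err_dot_v * x[i] / len(data)
        for i in range(2):
            w[i] -= lr * grad_w[i]
            v[i] -= lr * grad_v[i]
    return w, v


initial_loss = loss([0.1, 0.1], [0.1, 0.1])
w, v = train()
final_loss = loss(w, v)
print(initial_loss, final_loss)
```

A real autoencoder would use nonlinear layers and a proper library, but the shape of the idea is the same: the bottleneck forces the model to learn a representation instead of just copying its input.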

Representations can also be thought of as learning a particular form of a function. Formally, a function is defined by its inputs and outputs, not by a formula. But the formula is usually the thing you actually care about. For example, \(\sin^2 x + \cos^2 x\), \(f(x) = 1\), and \(\sum\limits_{n=1}^{\infty} \frac{1}{2^n}\) are all the same function, but with wildly different representations that come in handy in different situations.
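To make that concrete, here’s a small sketch with three Python functions (the names are mine) that use those three formulas. The series is truncated to a finite number of terms, so it’s only an approximation of the infinite sum, and everything is compared up to floating-point tolerance.

```python
import math


def trig(x):
    """sin^2(x) + cos^2(x)."""
    return math.sin(x) ** 2 + math.cos(x) ** 2


def constant(x):
    """f(x) = 1."""
    return 1.0


def series(x, terms=60):
    """Partial sum of 1/2^n for n = 1..terms (ignores x, like the constant)."""
    return sum(1.0 / 2 ** n for n in range(1, terms + 1))


# Same inputs, same outputs (up to rounding), three very different formulas.
for x in (-3.0, 0.0, 0.5, 10.0):
    assert math.isclose(trig(x), constant(x))
    assert math.isclose(series(x), constant(x))
```

From the outside you can’t tell these apart, which is exactly why the choice of representation is about what’s convenient for you, not about the function itself.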

The ability to *learn* functions makes machine learning (and deep
learning in particular) a potentially useful tool for simulation. I
dream that one day, physics simulators won’t need complicated analytical
models of dynamics that took years to figure out, but instead *learn*
the dynamics of some system.

Basically, could data plus a machine learning model equal the effort of the best years of a grad student’s life?

This is still a dream. Observing dynamics is hard, and the state space
is *yuuuuge*. Finding representative samples in that space isn’t easy.

But this is a blog, not a dissertation, and speculation is fun.

At the least, it’d be interesting to use things like system dynamics as units in a computational graph.

That interpretation makes un-blackboxing deep learning especially important because our understanding doesn’t advance very much when we have some opaque oracle function.

Another speculation: could you use known dynamics as a *starting
point*/prior for the new learned ones? Essentially, could you discover
quantum electrodynamics from Maxwell’s equations, a neural network, and
a *lot* of data? (This brings up a lot of issues too, but live a little
and dream.)

## Historical Trends

“The past is never dead. It’s not even past.”

That quote sums up this section.

## Distributed representations

When representing data, modular representations are often better.

The example the book gives is learning color and object identity for \(n\) colors and \(n\) objects. Rather than learning every combination of color and object identity, which is an \(O(n^2)\) task, we can learn each individually and combine them, which takes only \(2n\) concepts, i.e. \(O(n)\). It also generalizes far better.
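A quick way to see the difference is to count units. The sketch below uses made-up color and object lists (my own, not the book’s) and compares a one-unit-per-combination encoding with a factored one.

```python
# Hypothetical vocabularies, chosen only for illustration.
colors = ["red", "green", "blue"]
objects = ["car", "bird", "truck"]


def encode_joint(color, obj):
    """One unit per (color, object) pair: n * n units in total."""
    vec = [0] * (len(colors) * len(objects))
    vec[colors.index(color) * len(objects) + objects.index(obj)] = 1
    return vec


def encode_distributed(color, obj):
    """One unit per color plus one unit per object: n + n units in total."""
    vec = [0] * (len(colors) + len(objects))
    vec[colors.index(color)] = 1
    vec[len(colors) + objects.index(obj)] = 1
    return vec


print(len(encode_joint("red", "car")))        # 9 units
print(len(encode_distributed("red", "car")))  # 6 units
```

In the factored version, a green truck reuses the "green" unit from green cars and the "truck" unit from red trucks. In the joint version, every combination gets its own unit and shares nothing with the others.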

The lesson here is one that any programmer should be familiar with: keep separate things separate. It’ll save you a lot of pain in the long run.