Deep Learning Book Chapter 1 Notes


Started shallow, going deep.

— Matt and Kim

Representations

Deep learning is not about neural nets. Deep learning is about hierarchical representations of concepts. More complicated concepts are defined as the compositions of simpler ones.

Neural nets with lots of hidden layers just happen to be a natural fit for that idea. Since the book focuses on them for the most part, I’ll pretty much just refer to neural nets.

Picking the right representation is hard. There’s a trade-off between simpler representations, which tend to generalize better, and more complex ones, which can be more efficient.

My Thoughts And Speculations

I think a lot of problems can be thought of as looking for a good representation. Maxwell’s equations are a simple and powerful representation of how electromagnetism works. However, finding that representation took a lot of time. Lots of fields, like economics, lack representations of equal power.

Representations and data compression have strong parallels under that interpretation.

Rather than finding the representation (a.k.a. feature engineering), we can learn it. An autoencoder is a good example of this. It learns a compression and decompression function, and by combining them learns an approximate representation of the identity function. We can tweak its architecture to have it learn things like sparseness.
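Here’s a minimal sketch of that idea, assuming PyTorch; the layer sizes, learning rate, and the L1 penalty used to encourage sparseness are all illustrative choices of mine, not from the book.

```python
import torch
import torch.nn as nn

# A tiny autoencoder: the encoder compresses, the decoder reconstructs.
# Training it to reproduce its input makes decoder(encoder(x)) an
# approximation of the identity function, and the bottleneck code
# becomes the learned representation.
class Autoencoder(nn.Module):
    def __init__(self, input_dim=784, hidden_dim=32):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(input_dim, hidden_dim), nn.ReLU())
        self.decoder = nn.Linear(hidden_dim, input_dim)

    def forward(self, x):
        code = self.encoder(x)           # compressed representation
        return self.decoder(code), code

model = Autoencoder()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
x = torch.rand(64, 784)                  # stand-in batch of data

for _ in range(100):
    recon, code = model(x)
    # Reconstruction loss plus an L1 penalty on the code is one common
    # way to nudge the learned representation toward sparseness.
    loss = nn.functional.mse_loss(recon, x) + 1e-3 * code.abs().mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```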

Representations can also be thought of as learning a particular form of a function. Formally, a function is defined by its inputs and outputs, not by a formula. But the formula is usually the thing you actually care about. For example, \(\sin^2 x + \cos^2 x\), \(f(x) = 1\), and \(\sum\limits_{n=1}^{\infty} \frac{1}{2^n}\) are all the same function, but with wildly different representations that come in handy in different situations.
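Concretely, all three pick out the constant function whose value is 1:

\[
\sin^2 x + \cos^2 x \;=\; 1 \;=\; \sum_{n=1}^{\infty} \frac{1}{2^n} \qquad \text{for all } x.
\]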

The ability to learn functions makes machine learning (and deep learning in particular) a potentially useful tool for simulation. I dream that one day, physics simulators won’t need complicated analytical models of dynamics that took years to figure out, but will instead learn the dynamics of the system.

Basically, could data plus a machine learning model equal the effort of the best years of a grad student’s life?
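To make the shape of that concrete, here’s a toy sketch (mine, not the book’s): fit a small network to the one-step dynamics \(x_{t+1} = f(x_t)\) from observed state transitions. The damped-pendulum “true” system and all the sizes are made up for illustration.

```python
import torch
import torch.nn as nn

# A "true" system we pretend not to know: a damped pendulum,
# state = (angle, angular velocity), advanced by one Euler step.
def true_step(state, dt=0.05):
    theta, omega = state[:, 0], state[:, 1]
    domega = -9.8 * torch.sin(theta) - 0.1 * omega
    return torch.stack([theta + dt * omega, omega + dt * domega], dim=1)

states = torch.rand(4096, 2) * 2 - 1     # observed states
next_states = true_step(states)          # their observed successors

# A small MLP learns the one-step dynamics x_{t+1} = f(x_t).
dynamics = nn.Sequential(nn.Linear(2, 64), nn.Tanh(), nn.Linear(64, 2))
optimizer = torch.optim.Adam(dynamics.parameters(), lr=1e-3)

for _ in range(2000):
    loss = nn.functional.mse_loss(dynamics(states), next_states)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

# Roll the learned model forward: simulation without the analytical model.
with torch.no_grad():
    state = torch.tensor([[1.0, 0.0]])
    for _ in range(100):
        state = dynamics(state)
```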

This is still a dream. Observing dynamics is hard, and the state space is yuuuuge. Finding representative samples in that space isn’t easy.

But this is a blog, not a dissertation, and speculation is fun.

At the least, it’d be interesting to use things like system dynamics as units in a computational graph.

That interpretation makes un-blackboxing deep learning especially important because our understanding doesn’t advance very much when we have some opaque oracle function.

Another speculation: could you use known dynamics as a starting point/prior for the new learned ones? Essentially, could you discover quantum electrodynamics from Maxwell’s equations, a neural network, and a lot of data? (This brings up a lot of issues too, but live a little and dream.)

“The past is never dead. It’s not even past.” (Faulkner)

That quote sums up this section.

Distributed Representations

When representing data, modular representations are often better.

The example the book gives is learning color and object identity for \(n\) colors and \(n\) objects. Rather than learning every combination of color and object, which is an \(O(n^2)\) task, we can learn each attribute individually and combine them, which is \(O(2n)\). It also generalizes far better.
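A quick way to see the scaling (my own illustration, not the book’s code): compare one classifier head over all \(n \times n\) (color, object) pairs with two separate \(n\)-way heads sharing the same features.

```python
import torch.nn as nn

n = 10          # n colors and n objects (illustrative)
features = 128  # size of some shared feature vector

# Joint representation: one output per (color, object) pair -> O(n^2) outputs.
joint_head = nn.Linear(features, n * n)

# Distributed representation: one head per factor -> O(2n) outputs,
# and a (color, object) pair never seen together can still be predicted.
color_head = nn.Linear(features, n)
object_head = nn.Linear(features, n)

print(sum(p.numel() for p in joint_head.parameters()))   # grows like features * n^2
print(sum(p.numel() for p in [*color_head.parameters(),
                              *object_head.parameters()]))  # grows like features * 2n
```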

The lesson here is one that any programmer should be familiar with: keep separate things separate. It’ll save you a lot of pain in the long run.
