Random stuff

I just made the slugs for these URLs random since I’m sick of caring about them. Search is good enough that I don’t have to.

OpenAI paper on compute from this month (which I couldn’t easily find the link for, so I gave up).

The ‘info extractor’ metaphor for large nets nicely explains why lower-level data can be more useful than higher-level data: the low level subsumes the high level and contains at least as much info.

I wonder how starting with higher-level info and annealing toward lower-level representations of the same data over training would work. The data can be kept consistently formatted by encoding the high level in the low (like compiling to machine code).
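A minimal sketch of what that anneal could look like, assuming you have paired high-level and low-level encodings of each datum (both the schedule and the pairing are hypothetical, not from any paper):

```python
import random

def low_level_fraction(step, total_steps):
    """Linear anneal: start fully high-level, end fully low-level."""
    return min(1.0, step / total_steps)

def sample_encoding(step, total_steps, high, low):
    """Pick the low-level encoding of an example with increasing probability.

    `high` and `low` are hypothetical paired encodings of the same datum,
    e.g. source code vs. the machine code it compiles to.
    """
    p = low_level_fraction(step, total_steps)
    return low if random.random() < p else high

# Early in training the model mostly sees the high-level form;
# by the end it sees only the low-level form.
```

A smoother (e.g. cosine) schedule or per-example mixing would work the same way; the point is just that the two forms stay interchangeable because one encodes the other.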

The compute budget stuff is cool too but I’m not going over that here.

It does seem to imply that the transformer/attention is a better primitive than the conv because of what looks like better scaling behavior. “An Image is Worth 16x16 Words” seems to lend evidence to that: worse perf than convs initially, but a higher ceiling if you have the compute for it. Dunno if it’s good enough for bigger stuff.

This really impresses on me the trade-off between specialization / faster training / more cleverness required / less compute versus generality / higher performance / more compute. Yeah, that’s ugly to parse.

Panjabi MC still satisfies. Ashok Gill’s voice — goddamn.
