Mechanistic Interpretability and Lean 4


Proposal: implement https://www.arena.education/ in Lean.

Why?

My intuition is that theorem provers will end up in the final solution to AI safety, if there is one. And if one were devised in “not a theorem prover”, we’d soon want to put it in a theorem prover, because the stakes are so high.

For all its flaws, mechinterp is the main way I know of to “debug” neural nets at the low level.

I’ve already argued for theorem proving and Lean 4 specifically as an endpoint of programming languages. The recent work on agents and programmability seems to support that.

I don’t think it’s feasible to fully verify a network (based on Lawrence, Rajashree, and Jason Gross’s work). Even so, it is time to put deep learning into a theorem prover.

Davidad has his ideas for a big cool world alignment theorem prover that I’ve never seen. And if going with Lean is a mistake, it’s an iceberg I want to break upon. Beats waiting. Hopefully, I’ll use the AI to port Lean to your better thing.

There are serious issues with verifying neural nets, even probabilistically. But whatever CAN be, should be. As in, someone should do it and get some money to do it. Doing that requires putting it in Lean somehow and giving Lean a way to interpret it.

So here’s my proposal for a start. I think the first goal should be implementing the arena.education curriculum in Lean. This will require writing a numpy-like library, meant to be used by people and AI, though realistically more AI.
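To make that concrete, here is a minimal sketch of what one corner of such a library might look like: a tensor whose shape lives in its type. All names here (`Tensor`, `Tensor.zeros`) are my own placeholders, not an existing Lean API, and the proof obligations are discharged with `simp` as a plausible guess, not a tested build.

```lean
-- Hypothetical sketch: a shape-indexed tensor, so shape mismatches
-- become type errors caught by the elaborator rather than runtime bugs.
structure Tensor (shape : List Nat) where
  data : Array Float
  sizeEq : data.size = shape.foldl (· * ·) 1

-- A tensor of zeros for a given shape; the size equation is
-- (hopefully) closed by simp via the Array size lemmas.
def Tensor.zeros (shape : List Nat) : Tensor shape where
  data := Array.mkArray (shape.foldl (· * ·) 1) 0.0
  sizeEq := by simp
```

The design choice worth arguing about is exactly this: whether shapes belong in the type (so broadcasting rules become theorems) or stay runtime data as in numpy. A curriculum port would force that decision early.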

There are already efforts on floating point in Lean (including my own), and stuff around numerical computing. But programming languages people hate floats and love the discrete. Someone has to push through this ick.

The concrete benefits of this are more speculative. People look at me funny whenever I mention you can even run code in Lean. It wasn’t even obvious to Gwern! But I no longer want to wait. Things are moving so fast. This is a risk, a gap in my view.
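Since the point surprises people: running code in Lean 4 is one `#eval` away. A toy example of my own (nothing from ARENA):

```lean
-- Lean 4 is a real programming language; #eval compiles and runs this.
def dot (xs ys : Array Float) : Float :=
  (xs.zip ys).foldl (fun acc (x, y) => acc + x * y) 0.0

#eval dot #[1.0, 2.0, 3.0] #[4.0, 5.0, 6.0]
-- 1*4 + 2*5 + 3*6 = 32
```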

What to do

  • Don’t do it. Well yeah, but this isn’t the post for that. This hasn’t been done at all, and I sure don’t think a bit of money is a total waste.
  • Do it. Ok, in what?
    • Python: I argued that types are good, and AIs seem to benefit from them too. Also, the stakes are so high for AI safety that people want more assurance. If you believe in this stuff, do you really want to risk an errant bug? I have a lot of feelings about this option, but I also think it’s just an intellectual risk.
    • Rust: fast, but its types are a bit limited. I think you can get most of that perf from code gen, but it’s a risk. A serious alternative, but my intuition is that it’s wrong.
    • Lean: well what do you think I’m gonna put here.
    • C++: do you want to do that? All the complexity of types without final assurances, like the Twin Peaks secretary.
    • C: maybe, but I doubt it. People have enough problems with it and C++ to want Rust even before AI.
