Lossless Data Compression with Neural Networks by Fabrice Bellard


Unknown among my community, Fabrice Bellard is someone I aspire to be a third as productive as.

Just look at this section:

4 The NC library

In order to ensure that the model is exactly the same in the encoder and in the decoder, we developed a custom C library to implement the various operations needed by the models. It only depends on basic IEEE 754-2008 32 bit floating point operations (fused multiply-add, division, square root) with no reliance on external libraries to ensure we get deterministic results on every CPU and operating system.

He built an entire custom system to evaluate on. In 2019. Rather than go with any of a lot of frameworks available, including at least one in C (Dark).

On x86 CPUs, the AVX2 + FMA instructions are mandatory in order to get correct performance. The basic matrix operations were optimized to be fast with small matrices. It happens in our case because we use small batch sizes. General purpose BLAS routines are much slower in these cases.

He wrote something to beat BLAS.

The library represents the forward evaluation of the model as a bytecode. The bytecode to compute the gradient using backward propagation is automatically computed from this bytecode. Liveness analysis and dead code removal is done to reduce the memory usage and improve cache hits.

Why not add some liveness analysis in my custom system?

Since the decoder needs to evaluate the model incrementally at each timestep (but still does the training on a whole segment as the encoder), the library automatically converts the per-segment forward evaluation bytecode into a per-timestep bytecode. The resulting per-timestep forward evaluation bytecode usually runs slower because smaller matrix multiplications are involved.

And this bit in Section 7:

This experiment would not have been possible without our NC library which ensures deterministic evaluation and training with good performances on standard CPUs.

Wow. That’s dedication to reproducibility, and a lot of ambition.

Related Posts

Compactness of the Classical Groups

Derivative AT a Discontinuity

Just because 2 things are dual, doesn't mean they're just opposites

Boolean Algebra, Arithmetic POV

discontinuous linear functions

Continuous vs Bounded

Minimal Surfaces

November 2, 2023

NTK reparametrization

Kate from Vancouver, please email me