It’s interesting that a technology this transformative is only a few hundred lines of code (excluding underlying frameworks and such).

How big would you guess state-of-the-art models are, in terms of lines of code?




Llama2 inference can be implemented in 900-ish lines of dependency-free C89, with no code golfing[1]. More modern architectures (at least the dense, non-MoE models) aren't that much more complicated.

That code is CPU-only, uses float32 everywhere, and doesn't do any optimization, so it's not realistically usable for models beyond 100M params, but that's all it takes to run the core algorithm (a rough sketch of the kind of kernels involved is below).

[1] https://github.com/karpathy/llama2.c
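
To give a sense of what most of those ~900 lines look like, the hot path is a handful of small float32 kernels. A minimal sketch of two of them (illustrative only, not copied from the repo; names and signatures are my own):

  #include <math.h>

  /* RMSNorm: scale x by 1/rms(x), then by a learned per-channel weight. */
  static void rmsnorm(float *out, const float *x, const float *weight, int n)
  {
      int i;
      float ss = 0.0f;
      float scale;
      for (i = 0; i < n; i++) ss += x[i] * x[i];
      scale = 1.0f / (float)sqrt(ss / n + 1e-5f);
      for (i = 0; i < n; i++) out[i] = x[i] * scale * weight[i];
  }

  /* Naive float32 matrix-vector product: out (d) = W (d x n) * x (n).
     Essentially all of the inference time is spent here, which is why
     an unoptimized version is only practical for very small models. */
  static void matmul(float *out, const float *x, const float *w, int n, int d)
  {
      int i, j;
      float v;
      for (i = 0; i < d; i++) {
          v = 0.0f;
          for (j = 0; j < n; j++) v += w[i * n + j] * x[j];
          out[i] = v;
      }
  }

The rest of the file is mostly the transformer forward pass wiring these together, plus weight loading and sampling.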


A minimal hardcoded definition of the structure: probably a few hundred lines (roughly the config-plus-layer-stack sketched below).

The actual definition, including reusable components, optional features, and flexibility for experimentation: probably a few thousand.

The code needed to train the model, including all the data pipelines and management, training framework, optimization tricks, etc.: tens of thousands.

The whole codebase, including experiments, training/inference monitoring, modules that didn't make it into the final architecture, unit tests, and all custom code written to support everything mentioned so far: hundreds of thousands.
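
For the first bucket, the "minimal hardcoded definition" is largely a small set of hyperparameters plus one transformer block repeated n_layers times. A hypothetical C struct, just to illustrate the scale (field names are my own, not from any particular codebase):

  /* Hypothetical hyperparameters for a dense decoder-only transformer. */
  typedef struct {
      int dim;        /* residual stream / embedding width             */
      int hidden_dim; /* feed-forward inner width                      */
      int n_layers;   /* number of transformer blocks                  */
      int n_heads;    /* attention heads                               */
      int n_kv_heads; /* key/value heads (for grouped-query attention) */
      int vocab_size; /* tokenizer vocabulary size                     */
      int seq_len;    /* maximum context length                        */
  } ModelConfig;

Everything past that first bucket is what turns a few hundred lines into hundreds of thousands.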



