This is an excellent tool to realize how an LLM actually works from the ground u...

taliesinb · on Dec 4, 2023

Wow, I love the interactive whizzing around and the animation, very neat! Way more explanations should work like this.

I've recently finished an unorthodox kind of visualization / explanation of transformers. It's sadly not interactive, but it does have some maybe unique strengths.

First, it gives array axes semantic names, represented in the diagrams as colors (which this post also uses). So the sequence axis is red, the key feature dimension is green, the multihead axis is orange, etc. This helps you show quite complicated array circuits and get an immediate feeling for what is going on and how different arrays are being combined with each other. Here's a pic of the full multihead self-attention step, for example:

https://math.tali.link/raster/052n01bav6yvz_1smxhkus2qrik_07...
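For readers who think better in code than in diagrams, the named-axis idea can be loosely approximated with einsum subscripts, where each letter plays the role of one colored axis. This is only a rough sketch, not the notation from the linked post: the sizes, the weight names (`Wq`, `Wk`, `Wv`, `Wo`), and the letter-to-axis mapping are all assumptions made for illustration, and the causal mask is left out for brevity.

```python
import numpy as np

# Hypothetical axis names (one letter per "color"):
#   s = query position, t = key position, h = head,
#   k = key feature, v = value feature, d = model feature
S, H, K, V, D = 5, 3, 4, 4, 12  # illustrative sizes, not taken from the post
rng = np.random.default_rng(0)

x = rng.normal(size=(S, D))       # input sequence
Wq = rng.normal(size=(H, D, K))   # per-head query projection
Wk = rng.normal(size=(H, D, K))   # per-head key projection
Wv = rng.normal(size=(H, D, V))   # per-head value projection
Wo = rng.normal(size=(H, V, D))   # per-head output projection

# Project the input into queries, keys and values for every head.
q = np.einsum("sd,hdk->shk", x, Wq)
k = np.einsum("td,hdk->thk", x, Wk)
v = np.einsum("td,hdv->thv", x, Wv)

# Contract the key-feature axis to get one score per (head, query, key).
scores = np.einsum("shk,thk->hst", q, k) / np.sqrt(K)

# Softmax over the key-position axis t (no causal mask in this sketch).
scores = scores - scores.max(axis=-1, keepdims=True)
weights = np.exp(scores)
weights = weights / weights.sum(axis=-1, keepdims=True)

# Mix values per head, then contract the head and value axes away.
heads = np.einsum("hst,thv->shv", weights, v)
out = np.einsum("shv,hvd->sd", heads, Wo)

print(out.shape)
```

Every contraction names the axes it consumes, which is roughly what the colors do in the diagrams: you can see at a glance which axes are summed away and which survive.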

It also uses a kind of generalized tensor network diagrammatic notation -- if anyone remembers Penrose's tensor notation, it's like that but enriched with colors and some other ideas. Underneath, these diagrams are string diagrams in a particular category, though you don't need to know that (nor do I even explain it!).

Here's the main blog post introducing the formalism: https://math.tali.link/rainbow-array-algebra

Here's the section on perceptrons: https://math.tali.link/rainbow-array-algebra/#neural-network...

Here's the section on transformers: https://math.tali.link/rainbow-array-algebra/#transformers

jimmySixDOF · on Dec 4, 2023

You might also like this interactive 3D walkthrough explainer from PyTorch:

https://pytorch.org/blog/inside-the-matrix/

riemannzeta · on Dec 3, 2023

Are you referring specifically to line 141, which sets the number of embedding elements for gpt-nano to 48? That also seems to correspond to the Channel size C referenced in the explanation text?

https://github.com/karpathy/minGPT/blob/master/mingpt/model....
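To make the "48" concrete, here is a minimal sketch of what an embedding size of C = 48 means: each input token is looked up in a table and becomes a vector of 48 numbers, so a sequence of T tokens becomes a T x C array. Only the value 48 comes from the thread; the three-letter vocabulary, the example tokens, and the random initialization are assumptions made for illustration.

```python
import random

C = 48                   # embedding size for gpt-nano, per the comment above
vocab = ["A", "B", "C"]  # hypothetical tiny vocabulary

random.seed(0)
# Hypothetical token-embedding table: one length-C vector per vocabulary entry.
token_embedding = {
    tok: [random.gauss(0.0, 0.02) for _ in range(C)] for tok in vocab
}

tokens = ["C", "B", "A", "B", "B", "C"]     # example input sequence
x = [token_embedding[t] for t in tokens]    # one row of C numbers per token

T = len(x)
print(T, len(x[0]))  # prints: 6 48
```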

tomnipotent · on Dec 4, 2023

That matches the name of the default model selected in the right pane, "nano-gpt". I missed the "bigger picture" at first before I noticed the other models in the right pane header.

namocat · on Dec 3, 2023

Yes, thank you - it was unexplained, so I got stuck on "Why 48?", thinking I'd missed something right out of the gate.

zombiwoof · on Dec 3, 2023

I was thinking 42 ;-)

jayveeone · on Dec 4, 2023

Yes yes it was the 48 elements thing that got me stuck. Definitely not everything from the second the page loaded.