> there is a lot of evidence that many of the concepts ARC-AGI is (allegedly) measuring are innate in humans
I'd argue that "innate" here still includes a brain structure/nervous system that evolved on 3.5 billion years worth of data. Extensive pre-training of one kind or another currently seems the best way to achieve generality.
> Each new training from scratch is a perfect blank slate [...]?
I don't think training runs are done entirely from scratch.
Most training runs in practice will start from some pretrained weights or distill an existing model: taking a model pretrained on ImageNet or Common Crawl and fine-tuning it for a specific task.
But even when the weights are randomly initialized, the hyperparameters and architectural choices (skip connections, attention, ...) will have been copied from previous models/papers based on what performed well empirically, sometimes also informed by attempts to transfer our own intuitions (like stacking convolutional layers as a rough approximation of our visual system), and possibly refined/mutated through grid search or neural architecture search on data.
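The fine-tuning pattern described above can be sketched roughly as follows. This is a minimal illustration, not anyone's actual pipeline: the tiny backbone here is a stand-in for a network whose weights would in practice come from large-scale pretraining (e.g. a torchvision ResNet pretrained on ImageNet), and the class count is assumed.

```python
import torch
import torch.nn as nn

# Stand-in backbone: in a real run its weights would be loaded from a
# model pretrained on a large corpus, not initialized randomly here.
backbone = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.AdaptiveAvgPool2d(1),
    nn.Flatten(),
)

# Freeze the "pretrained" features; only the new task head gets trained.
for p in backbone.parameters():
    p.requires_grad = False

num_classes = 10  # assumed label-set size for the downstream task
head = nn.Linear(16, num_classes)
model = nn.Sequential(backbone, head)

# The optimizer only sees the head's parameters.
opt = torch.optim.Adam(head.parameters(), lr=1e-3)

x = torch.randn(4, 3, 32, 32)  # dummy batch of 4 RGB 32x32 images
targets = torch.randint(0, num_classes, (4,))
loss = nn.functional.cross_entropy(model(x), targets)
loss.backward()
opt.step()
```

Whether you freeze the backbone entirely or fine-tune it with a smaller learning rate is itself one of those hyperparameter choices inherited from prior empirical work.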
Sure, and LLMs are nothing of this sort. While they're an incredible feat of technology, they're just a building block for intelligence, an important building block I'd say.