Posted by Chollet himself:

> I don't think people really appreciate how simple ARC-AGI-1 was, and what solving it really means. It was designed as the simplest, most basic assessment of fluid intelligence possible. Failure to pass signifies a near-total inability to adapt or problem-solve in unfamiliar situations.

> Passing it means your system exhibits non-zero fluid intelligence -- you're finally looking at something that isn't pure memorized skill. But it says rather little about how intelligent your system is, or how close to human intelligence it is.

https://bsky.app/profile/fchollet.bsky.social/post/3les3izgd...


> Failure to pass signifies a near-total inability to adapt or problem-solve in unfamiliar situations.

Not necessarily. Ask a human to solve ARC-AGI when the problems are shown as a flat string: they'll perform badly. But that doesn't mean humans can't reason. It means that human reasoning doesn't have access to the non-reasoning building blocks it needs (things like concepts, words, or in this case spatially local, useful visual representations).
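
For concreteness, here's a minimal sketch (a hypothetical toy grid, not a real ARC task) of what that flat-string presentation does to the 2D structure:

    # Hypothetical 3x3 grid standing in for an ARC task. Serialized
    # row by row, cells that are vertical neighbors in 2D end up a
    # full row apart in the character stream.
    grid = [[0, 1, 0],
            [1, 1, 1],
            [0, 1, 0]]

    flat = "".join(str(cell) for row in grid for cell in row)
    print(flat)  # 010111010 -- cells (0,1) and (1,1) are 3 characters apart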

Humans have good resolution-invariant visual perception. For example, take an ARC-AGI problem and, for each square, duplicate it a few times, increasing its resolution from X*X to 2X*2X. To a human, the problem will be almost exactly as difficult. Not so for an LLM, which now has to deal with 4x as much context (see the sketch below). Maybe an LLM could cope if it could somehow reason over the output of a CNN, and if it were trained to do that the way humans are built to.
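
A rough sketch of that duplication in plain Python (the grid values and the helper name are made up for illustration):

    # Each cell becomes a factor x factor block, so an X*X grid grows
    # to (factor*X)*(factor*X). The picture is essentially unchanged
    # for a human, but a cell-per-token serialization now has factor^2
    # times as many cells for the model to attend over.
    def upscale(grid, factor=2):
        return [[cell for cell in row for _ in range(factor)]
                for row in grid for _ in range(factor)]

    small = [[0, 1],
             [2, 0]]
    big = upscale(small)
    # big == [[0, 0, 1, 1],
    #         [0, 0, 1, 1],
    #         [2, 2, 0, 0],
    #         [2, 2, 0, 0]]
    print(len(small)**2, len(big)**2)  # 4 16 -- 4x the context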


Excellent point. I'm not sure people are aware, but these are lifted straight from standard IQ tests, so they're definitely not all trivially solvable by humans.

I needed an official one for medical reasons a few years back.


ARC-AGI feels like it would fall to a higher-dimensional convolution rather than to reasoning.

Honestly, after that, I've tuned out completely on him and ARC-AGI. A nice minor side story at one point in time.

He's right that solving this doesn't mean solving every problem in the domain of human intelligence.

But the whole stunt, this whole time, was that this was the ARC-AGI benchmark.

The conceit was that the fact that LLMs couldn't do well on it proved they weren't intelligent, and that real researchers would step up to benchmark well on it, avoiding the ideological tarpit of LLMs, which could never be intelligent.

It's fine to turn around and say "My AGI benchmark says little about intelligence," but the level of conversation is decidedly more that of punters at the local stables than of rigorous analysis.
