Posted by Chollet himself:

> I don't think people really appreciate how simple ARC-AGI-1 was, and what solving it really means. It was designed as the simplest, most basic assessment of fluid intelligence possible. Failure to pass signifies a near-total inability to adapt or problem-solve in unfamiliar situations.

> Passing it means your system exhibits non-zero fluid intelligence -- you're finally looking at something that isn't pure memorized skill. But it says rather little about how intelligent your system is, or how close to human intelligence it is.

https://bsky.app/profile/fchollet.bsky.social/post/3les3izgd...


> Failure to pass signifies a near-total inability to adapt or problem-solve in unfamiliar situations.

Not necessarily. Ask a human to solve ARC-AGI when the problems are shown as a flat string: they'll perform badly. But that doesn't mean humans can't reason. It means that human reasoning doesn't have access to the non-reasoning building blocks it needs (things like concepts, words, or in this case spatially local, useful visual representations).
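
For concreteness, here's a minimal sketch (a hypothetical toy grid, not a real ARC task) of what that flat-string presentation does to the 2D structure:

    # Hypothetical 3x3 grid standing in for an ARC task. Serialized
    # row by row, cells that are vertical neighbors in 2D end up a
    # full row apart in the character stream.
    grid = [[0, 1, 0],
            [1, 1, 1],
            [0, 1, 0]]

    flat = "".join(str(cell) for row in grid for cell in row)
    print(flat)  # 010111010 -- cells (0,1) and (1,1) are 3 characters apart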

Humans have good resolution-invariant visual perception. For example, take an ARC-AGI problem and, for each square, duplicate it a few times, increasing its resolution from X*X to 2X*2X. To a human, the problem will be almost exactly as difficult. Not so for an LLM, which now has to deal with 4x as much context (see the sketch below). Maybe an LLM could cope if it could somehow reason over the output of a CNN, and if it were trained to do that the way humans are built to.
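
A rough sketch of that duplication in plain Python (the grid values and the helper name are made up for illustration):

    # Each cell becomes a factor x factor block, so an X*X grid grows
    # to (factor*X)*(factor*X). The picture is essentially unchanged
    # for a human, but a cell-per-token serialization now has factor^2
    # times as many cells for the model to attend over.
    def upscale(grid, factor=2):
        return [[cell for cell in row for _ in range(factor)]
                for row in grid for _ in range(factor)]

    small = [[0, 1],
             [2, 0]]
    big = upscale(small)
    # big == [[0, 0, 1, 1],
    #         [0, 0, 1, 1],
    #         [2, 2, 0, 0],
    #         [2, 2, 0, 0]]
    print(len(small)**2, len(big)**2)  # 4 16 -- 4x the context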


Excellent point. I'm not sure people are aware, but these are lifted straight from standard IQ tests, so they're definitely not all trivially solvable by humans.

I needed an official one for medical reasons a few years back.


ARC-AGI feels like it would fall to a higher-dimensional convolution rather than to reasoning.

Honestly, after that, I've tuned out completely on him and ARC-AGI. A nice minor side story at one point in time.

He's right that solving this doesn't mean solving every problem in the domain of human intelligence.

But the whole stunt, this whole time, was that this was the ARC-AGI benchmark.

The conceit was that the fact that LLMs couldn't do well on it proved they weren't intelligent, and that real researchers would step up to benchmark well on it, avoiding the ideological tarpit of LLMs, which could never be intelligent.

It's fine to turn around and say "My AGI benchmark says little about intelligence," but the level of conversation is decidedly more that of punters at the local stables than of rigorous analysis.
