In theory transformers are Turing-complete and LLMs can do anything computable. The more down-to-earth argument is that transformer LLMs aren't able to correct errors in the systematic way LeCun is describing: it's task-specific "whack-a-mole," involving either tailored synthetic data or expensive RLHF.
In particular, if you train an LLM to do Task A and Task B with acceptable accuracy, that does not guarantee it can combine the tasks in a common-sense way. "For each step of A, do B on the intermediate results" is a whole new Task C that likely needs to be fine-tuned. (This one actually does have some theoretical evidence coming from computational complexity, and it was the first thing I noticed in 2023 when testing chain-of-thought prompting. It's not that the LLM can't do Task C, it just takes extra training.)
It is not a high standard; I am sure you could train a chimp to pass this test[1]. If you know how to use a standard coffee maker and live in a typical American home, and the test is done in a typical American home with a standard coffee maker, you can definitely pass this test 100% of the time.
I understand that many people don't live in America and don't know how to use a coffee maker. That is 100% irrelevant. There is a frustrating tendency in AI circles to conflate domain knowledge with intelligence, in a way that invariably elevates AI and crushes human intelligence into something tiny.
[1] The hard part would be psychological (e.g. keeping the chimp focused), not cognitive. And of course the chimp would need to bring a chimp-sized ladder... It would be an unlawful experiment, but I suspect if you trained a chimp to use a specific coffee maker in another kitchen, forced the chimp to become addicted to coffee, and then put the animal in a totally different kitchen with a different coffee maker (but similar, i.e. not a French press), it would figure out what to do.
"locate the filters, locate the coffee mugs, locate a measuring spoon" in a random house in America is a very high standard. We’ll have to agree to disagree on that. If you teleport me into a random house, I’ll likely spend at least an hour trying and failing at that task, and most of their cabinets and drawers will be open by the end of it.
It also excludes corner cases like "what if they don't have any filters?" Should the robot go tearing through the house till it finds one, or do nothing? But what if there were some in the pantry — does that fail the test? There are all kinds of implicit assumptions here that make it quite hard.
And what if there's only a Nespresso machine, a Keurig machine, instant, a French press, a moka pot, or a cappuccino machine? (We can argue about whether an Americano is actually coffee, but if that's what the house has, and no drip machine plus accoutrements, you're not getting anything else.) Human or bot,
that's a lot of possibilities to deal with, but for a bold human unfamiliar with those machines, they're just a YouTube video away (multiple videos if it's a fancy cappuccino machine). Until AI can learn to make coffee or change an oil filter on a 1997 GMC from watching a YouTube video, it'd be hard to consider it human-grade, even if it has been trained on all of YouTube, which presumably Google has done. There are certainly things people do on YouTube that I couldn't do even after a lot of intense practice, so I'm not totally convinced that's the right standard. Still, it doesn't cost millions of hours and dollars of training and fine-tuning for me to, say, learn to tie a bow tie from a YouTube video, even if it does take me a couple of tries.
It probably shouldn't continue to surprise me how often people's "AI benchmarks" exclude a significant fraction of actual, living, humans from being "human-grade".
You can't honestly claim that it would take you an hour to accomplish such a high-probability task - have you never visited the house of a friend or family member and had to open a few cabinets to find a water glass or a bowl or a spoon?
As for the point of corner cases being hard - I mean that's the point here, isn't it?
How do you even compare those two things? And how do you separate technology from medicine? This statement seems like pure nonsense, hinging on a slippery definition of "technology" that in some contexts means "consumer gizmos," other contexts means "computers," and in yet other contexts means "Civilization tech tree."
This was very unclear in the article - typically artificial hearts have external pumps and motors, the internal component is basically a fancy valve to manage flow/oxygenation correctly. But this one has an internal pump and motor, like a real heart, but with an external battery pack. (The battery currently only lasts four hours...I am too absent-minded to be trusted with this!)
Long-term planning is necessary if you have biters enabled: typically you need to secure territory/resources and invest in defenses before the resources run low and while the biters are still manageable. Otherwise things can get badly out of hand.
Edit: IMO the biggest difference between Satisfactory and Factorio is that Satisfactory has no crises. If a Satisfactory base shuts down it is annoying, but you can dig another miner / build another plant / etc., entirely at your leisure. But in Factorio, a shutdown is an emergency with a ticking clock.
We were thinking of creating a minigame resembling a "tower-defense" setting, where waves of bugs get released and the agent needs to create appropriate defenses. It would be interesting to see if agents are capable of defending the base, and how many resources they would put towards defenses in a normal game where enemies are enabled.
I think the point of "The Main Bus" in guides is that it's easy for newer players, and thereby takes a lot of complexity off the table when you still haven't really figured out how petroleum works, or keep falling behind the biters because you underestimated red bullets' demand for steel, etc. Eventually you figure out trains; until then a main bus is an idiot-resistant way to carry resources across the entire base.
Even the example in the post seemed closely related to other advances in consumer-level computing:
> I re-created this system using an RPi5 compute module and a $20 camera sensor plugged into it. Within two hours I wrote my first machine learning [application], using the AI to assist me and got the camera on a RPi board to read levels of wine in wine bottles on my test rig. The original project took me six weeks solid!
Undoubtedly this would have taken longer without AI. But I imagine the Raspberry Pi + camera was easier to set up out-of-the-box than whatever they used 14 years ago, and it's definitely easier to set up a paint-by-numbers ML system in Python.
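Not the commenter's actual code, but a hypothetical sketch of how little glue the modern stack needs: grab a frame with OpenCV and estimate the fill level from a hand-picked region of interest. A trained model would replace the naive threshold heuristic below; the ROI coordinates are made up.

```python
# Hypothetical sketch: read a frame from the first attached camera and
# estimate how full a bottle is within a fixed region of interest.
import cv2

BOTTLE_ROI = (200, 100, 80, 300)  # x, y, w, h -- made up, set per test rig

def liquid_level_fraction(frame):
    x, y, w, h = BOTTLE_ROI
    roi = cv2.cvtColor(frame[y:y + h, x:x + w], cv2.COLOR_BGR2GRAY)
    dark_rows = roi.mean(axis=1) < 80   # dark rows ~ wine
    return dark_rows.sum() / h

cap = cv2.VideoCapture(0)
ok, frame = cap.read()
if ok:
    print(f"estimated fill level: {liquid_level_fraction(frame):.0%}")
cap.release()
```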
> You can say LLMs are fundamentally dumb because of their inherent linearity. Are they? Isn’t language by itself linear (more precisely, the presentation of it)?
Any linearity (or at least partial ordering) of intelligence comes from time and causality, not language - in fact the linearity of language is a limitation human cognition struggles to fight against.
I think this is where "chimpanzees are intelligent" comes to the rescue - AI has a nasty habit of focusing too much on humans. It is vacuous to think that chimpanzee intelligence can be reduced to a linear sequence of oohs-and-aahs, although I suspect a transformer trained on thousands of hours of chimp vocalizations could keep a real chimp busy for a long time. Ape cognition is much deeper and more mysterious: imperfect "axioms" and "algorithms" about space, time, numbers, object-ness, identifying other intelligences, etc., seem to be somehow built-in, and all apes seem to share deep cognitive tools like self-reflection, estimating the cognitive complexity of a task, robust quantitative reasoning, and so on.

Nor does it really make sense to hand-wave about "evolutionary training data" - there are stark micro- and macro-architectural differences between primate brains and squirrel brains. Not to mention that all species have had the exact same amount of data - if it were just about millions of years, why are bees and octopi uniquely intelligent among invertebrates? Why aren't there any chimpanzee-level squirrels? Rather than twisting into knots about "high quality evolutionary data," it makes a lot more sense to point towards evolution pressuring the development of different brain architectures with stronger cognitive abilities. (Especially considering how rapidly modern human intelligence seems to have evolved - much more easily explained by sudden favorable mutations than by stumbling into an East African data treasure trove.)
Human intelligence uses these "algorithms" + the more modern tool of language to reason about the world. I believe any AI system which starts with language and sensory input[1], then hopes to get causality/etc via Big Data is doomed to failure: it might be an exceptionally useful text generator/processor but there will be infinite families of text-based problems that toddlers can solve but the AI cannot.
[1] I also think sight-without-touch is doomed to failure, especially with video generation, but that's a different discussion. And AIs can somewhat cheat "touch" if they train extensively on a good video game engine (I see RDR2 is used a lot).
Newborns (and certainly toddlers) seem to understand the underlying concepts for these things when it comes to visual/haptic object identification and "folk physics":
A short list of abilities that cannot be performed by CompressARC includes:
Assigning two colors to each other (see puzzle 0d3d703e)
Repeating an operation in series many times (see puzzle 0a938d79)
Counting/numbers (see puzzle ce9e57f2)
Translation, rotation, reflections, rescaling, image duplication (see puzzles 0e206a2e, 5ad4f10b, and 2bcee788)
Detecting topological properties such as connectivity (see puzzle 7b6016b9)
Note: I am not saying newborns can solve the corresponding ARC problems! The point is there is a lot of evidence that many of the concepts ARC-AGI is (allegedly) measuring are innate in humans, and maybe most animals; e.g. cockroaches can quickly identify connected/disconnected components when it comes to pathfinding. Again, not saying cockroaches can solve ARC :) OTOH even if orcas were smarter than humans they would struggle with ARC - it would be way too baffling and obtuse if your culture doesn't have the concept of written standardized tests. (I have been solving state-mandated ARCish problems since elementary school.) This also applies to hunter-gatherers, and note the converse: if you plopped me down among the Khoisan in the Kalahari, they would think I was an ignorant moron. But scientifically it makes just as much sense to say "human-level intelligence" entails "human-level hunter-gathering" as it does "human-level IQ problems."
> there is a lot of evidence that many of the concepts ARC-AGI is (allegedly) measuring are innate in humans
I'd argue that "innate" here still includes a brain structure/nervous system that evolved on 3.5 billion years worth of data. Extensive pre-training of one kind or another currently seems the best way to achieve generality.
> Each new training from scratch is a perfect blank slate [...]?
I don't think training runs are done entirely from scratch.
Most training runs in practice will start from some pretrained weights or distill an existing model - taking some model pretrained on ImageNet or Common Crawl and fine-tuning it to a specific task.
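For concreteness, here is a minimal sketch of that workflow under stated assumptions (PyTorch/torchvision; the class count, batch, and learning rate are placeholders): take an ImageNet-pretrained ResNet, freeze the backbone, and fine-tune a fresh head.

```python
# Hypothetical fine-tuning sketch: start from ImageNet-pretrained weights,
# freeze the backbone, and train only a new task-specific head.
import torch
import torch.nn as nn
from torchvision import models

NUM_CLASSES = 10  # placeholder downstream task

model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
for p in model.parameters():
    p.requires_grad = False                               # keep pretrained features
model.fc = nn.Linear(model.fc.in_features, NUM_CLASSES)   # fresh head (trainable by default)

optimizer = torch.optim.Adam(model.fc.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

# one illustrative step on a fake batch
x, y = torch.randn(8, 3, 224, 224), torch.randint(0, NUM_CLASSES, (8,))
optimizer.zero_grad()
loss = loss_fn(model(x), y)
loss.backward()
optimizer.step()
```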
But even when the weights are randomly initialized, the hyperparameters and architectural choices (skip connections, attention, ...) will have been copied from previous models/papers based on what performed well empirically, sometimes also from trying to transfer our own intuition (like stacking convolutional layers as a rough approximation of our visual system), and possibly refined/mutated through some grid search or neural architecture search on data.
Sure, and LLMs are nothing of this sort. While they're an incredible feat of technology, they're just a building block for intelligence, an important building block I'd say.
Newborn brains aren't blank, they are complex beyond our current ability to understand. All mammals are born with a shocking amount of instinctual knowledge built right into their genome.
All organisms are born pre-trained because if you can't hide or survive the moment you're born, you get eaten.
> if you can't hide or survive the moment you're born, you get eaten.
uhhh... no, most newborns can't "hide or survive the moment they're born", no matter the species. I'm sure there are a few examples, but I seriously doubt it's the norm.
Many species survive by reproducing en masse, where it takes many (sometimes thousands of) eaten offspring for one to survive to adulthood.
In humans at least, they obviously lack the motor control to attempt to hide, gather, or hunt. But they do plenty of other stuff instinctively. With my latest, we learned that infants are inherently potty trained (defecation) if you pay attention to the cues… and I was surprised to find that it was true: the baby communicates the need to go and knows what's happening without any training at all. Amazed to have almost zero soiled diapers at one month.
Makes sense though; I'm pretty sure mammals don't do well with the insects and diseases that come with a waste-saturated bed.
The point clearly still stands: every species on the planet has a long list of attributes and behaviors directly attributable to evolution and “pretraining.” And many, many more based on education (the things a lioness teaches her cubs.)
I’m not sure we would call anyone intelligent today if they had no education. Intelligence relies on building blocks that are only learned, and the “masters” of certain fields are drawing on decades and decades of learnings about their direct experience.
So our best examples of intelligence include experience, training, knowledge, evolutionary factors, what have you — so we probably need to draw on that to create a general intelligence. How can we expect to have an intelligence in a certain field if it hasn’t spent a lot of time “ruminating on”/experiencing/learning about/practicing/evolving/whatever, on those types of problems?
Please have a baby and report back first-hand observations. Yes, they're far, far more sophisticated than most (all?) humans can comprehend, but they're also completely incapable for multiple months after birth. This isn't unexpected, human babies are born at what would still be mid-late gestation in almost any other mammal.
That quote about how "the only intuitive interface ever devised was the nipple"? Turns out there's still a fair bit of active training required all around to even get that going. There's no such thing as intuitive, only familiar.
> Newborns (and certainly toddlers) seem to understand the underlying concepts for these things when it comes to visual/haptic object identification and "folk physics"
Yes, they enjoy millions of years of pretraining thanks to evolution, ie. their pretrained base model has some natural propensity for visual, auditory, and tactile sensory modalities, and some natural propensity for spatial and causal reasoning.
It is vacuously true that a Turing machine can simulate a human mind - this is the quantum Church-Turing thesis. Since a Turing machine can solve any arbitrary system of Schrodinger equations, it can solve the system describing every atom in the human body.[1]
The problem is that this might take more energy than the Sun for any physical computer. What is far less obvious is whether there exist any computable higher-order abstractions of the human mind that can be more feasibly implemented. Lots of layers to this - is there an easily computable model of neurons that encapsulates cognition, or do we have to model every protein and mRNA?
It may be analogous to integration: we can numerically integrate almost anything, but most functions are not symbolically integrable and most differential equations lack closed-form solutions. Maybe the only way to model human intelligence is "numerical."
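To make "numerical" concrete, here is a toy sketch under stated assumptions (NumPy/SciPy; hbar = m = 1, arbitrary grid and potential): brute-force time evolution of one particle in a harmonic well via a discretized Hamiltonian.

```python
# Toy "numerical quantum mechanics": discretize the 1D Schrodinger equation
# on a grid and evolve a Gaussian wave packet with the matrix-exponential
# propagator exp(-i*H*dt). Units are arbitrary (hbar = m = 1).
import numpy as np
from scipy.linalg import expm

N, L, dt = 400, 20.0, 0.05
x = np.linspace(-L / 2, L / 2, N)
dx = x[1] - x[0]

# Finite-difference Laplacian for the kinetic term, plus a harmonic potential.
lap = (np.diag(np.ones(N - 1), -1) - 2 * np.eye(N) + np.diag(np.ones(N - 1), 1)) / dx**2
H = -0.5 * lap + np.diag(0.5 * x**2)

U = expm(-1j * H * dt)                            # one-step time-evolution operator
psi = np.exp(-((x - 2.0) ** 2)).astype(complex)   # off-center Gaussian packet
psi /= np.sqrt(np.sum(np.abs(psi) ** 2) * dx)

for _ in range(100):
    psi = U @ psi

print("norm:", np.sum(np.abs(psi) ** 2) * dx)      # stays ~1 (evolution is unitary)
print("<x>:", np.sum(x * np.abs(psi) ** 2) * dx)   # packet oscillates in the well
```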
In fact I suspect higher-order cognition is not Turing computable, though obviously I have no way of proving it. My issue is very general: Turing machines are symbolic, and one cannot define what a symbol actually is without using symbols - which means it cannot be defined at all. "Symbol" seems to be a primitive concept in humans, and I don't see how to transfer it to a Turing machine / ChatGPT reliably. Or, as a more minor point, our internal "common sense physics simulator" is qualitatively very powerful despite being quantitatively weak (the exact opposite of Sora/Veo/etc), which again does not seem amenable to a purely symbolic formulation: consider "if you blow the flame lightly it will flicker, if you blow hard it will go out." These symbols communicate the result without any insight into the computation.
[1] This doesn't have anything to do with Penrose's quantum consciousness stuff, it just assumes humans don't have metaphysical souls.
> It is vacuously true that a Turing machine can simulate a human mind - this is the quantum Church-Turing thesis. Since a Turing machine can solve any arbitrary system of Schrodinger equations, it can solve the system describing every atom in the human body. The problem is that this might take more energy than the Sun for any physical computer.
Feynman on "Simulating Physics with Classical Computers" [0] goes beyond that to posit that any classical simulation of quantum-mechanical properties would need exponential space in the number of particles to track the full state space; this very quickly exceeds the entire observable universe when dealing with mere hundreds of particles.
So while yes, the Turing machine model presupposes infinite tape, that is not realizable in practice.
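A back-of-the-envelope version of that blowup (the 10^80 figure is the usual rough order-of-magnitude estimate for atoms in the observable universe):

```python
# Storing the full state of n two-level particles exactly takes 2**n complex
# amplitudes; compare that count against a rough atom count for the
# observable universe.
ATOMS_IN_OBSERVABLE_UNIVERSE = 1e80  # rough standard estimate

for n in (50, 100, 300):
    amplitudes = 2.0 ** n
    verdict = "exceeds" if amplitudes > ATOMS_IN_OBSERVABLE_UNIVERSE else "fits within"
    print(f"n={n:3d}: {amplitudes:.2e} amplitudes ({verdict} the universe's atom count)")
```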
He actually goes further:
> Can a quantum system be probabilistically simulated by a classical (probabilistic, I'd assume) universal computer? In other words, a computer which will give the same probabilities as the quantum system does. If you take the computer to be the classical kind I've described so far, (not the quantum kind described in the last section) and there're no changes in any laws, and there's no hocus-pocus, the answer is certainly, No! This is called the hidden-variable problem: it is impossible to represent the results of quantum mechanics with a classical universal device.
In particular, he takes issue with our ability to classically simulate negative probabilities which give rise to quantum mechanical interference.
I believe Feynman was basically mistaken about the second point, though of course the field was brand new at the time - it is certainly possible to simulate the measurements of quantum mechanics to arbitrarily high accuracy on a classical computer with pseudorandom number generation; if you replace the pseudorandomness with a physical random number generator, then it might even be formally equivalent to a quantum computer (I think that's an open question, haven't tracked the developments in a while).
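A minimal sketch of what that looks like in practice, using a Bell state as a convenient example (assumes NumPy): keep the amplitudes in ordinary memory, square them into Born-rule probabilities, and draw outcomes with a pseudorandom generator.

```python
# Classically sampling measurement outcomes for the Bell state (|00> + |11>)/sqrt(2):
# the amplitudes live in ordinary memory, the Born rule turns them into
# probabilities, and a PRNG draws the outcomes.
import numpy as np

rng = np.random.default_rng(0)
state = np.array([1, 0, 0, 1], dtype=complex) / np.sqrt(2)  # amplitudes for |00>, |01>, |10>, |11>
probs = np.abs(state) ** 2                                  # Born rule

samples = rng.choice(4, size=10_000, p=probs)
counts = {f"{k:02b}": int((samples == k).sum()) for k in range(4)}
print(counts)  # ~5000 each for "00" and "11", zero for "01" and "10"
```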
"Negative probabilities" is not quite right - towards the end of his life Feynman wondered about generalizing probability but that was just about intermediate calculations: he declared physical events cannot have nonnegative probabilities (in the same sense that physically I can't have negative three apples, -3 is a nonphysical abstraction used to simplify accounting). Negative probabilities are not part of modern quantum mechanics, where probabilities are always nonnegative and sum to 1. Quantum states can have negative/complex amplitudes but the probabilities are positive (and classical computers are just as good/bad at complex arithmetic as they are any other).
The "hidden variables" comment makes me think Feynman was actually a bit confused about the philosophy of computation - a classical computer cannot simulate how a quantum particle "truly" evolves over time, but that's also the case for a classical particle! Ultimately it's just a bunch of assembly pushing electrons around, that has nothing to do with a ball rolling down a hill. Computers only have Schrodinger's equation or Newton's laws, which don't care how the motion "truly" works, they just care that the measurement at the end is correct. If a computer gets the correct measurements then by definition we say it simulates the phenomenon.
Edit: clarifying this last point, Newton's laws do have a known "hidden variables" theory in the sense that we know how an ensemble of high-temperature quantum particles can "average out" into Newton's laws, there is an electrostatic theory of mechanical contact, etc. This does not (and seemingly cannot) exist for quantum mechanics, but merely having a quantum computer wouldn't by itself help us figure out what's going on: the outputs of a quantum computer are the "visible" variables, aka the observables. The fact that quantum computers are truly using the non-observables, whatever those might be, seemingly cannot be experimentally distinguished from a sufficiently accurate classical computer doing numerical quantum mechanics. If it turned out there were a serious experimental difference between the results of quantum computers and classical qubit simulators, that would suggest an inadequacy in the foundations of QM.
"Negative probabilities" is being imprecise -- as you allude to, what we really mean are quantum mechanical amplitudes that are out-of-phase relative to each other, such that we get constructive and destructive interference when you convert them into concrete probabilities. (Feynman also acknowledges this lack of precision in terminology, but ultimately this text was not intended to be rigorous scientific proof but rather building intuition for this problem that he was deeply interested in.)
I believe Feynman's discussion of hidden variables is a reference to the EPR paradox (see Einstein's infamous quote that "God does not play dice") and the various Bell tests (which at that point in time had experimentally demonstrated that local hidden-variable theories were inadequate for describing QM). If you continue in the paper, he then goes on to describe one of those experiments involving entangled photons.
In particular, what we definitely can't do is generate random numbers for measurements of individual particles while assuming that they're independent from each other. So now we have to consider the ensemble of particles, and in particular we need to consider the relative phases between each of them. But now we're getting back to the same exponential blowup that caused us to run into problems when we tried to simulate the evolution of the wavefunction from first principles.