
I am baffled that people have to continue making this argument over and over and over. Your rationale makes total sense to me, but the debate rages on over whether LLMs are more than just words.

Articles like this only seem to confirm that any reasoning is an illusion based on probabilistic text generation. Humans are not carefully writing out all the words of this implicit reasoning, so the machine can't appear to mimic it.

What am I missing that makes this debatable at all?




I don’t think there are any reasonable arguments against that point, but “LLMs are more than just words” is sort of unfalsifiable, so you can never convince someone otherwise if they’re really into that idea.

From a product point of view, sometimes all you need is Plato’s cave (to steal that from the OC) to make a sale, so no company has incentive to go against the most hype line of thought either.


We already know LLMs are more than just words; there are literally papers demonstrating the world models they build. One of the problems is that LLMs build those world models from an impoverished sensory apparatus (the digital word token), so the relations they build between the concepts behind words are weaker than those of humans, who build deeper multimodal relations over a lifetime. Multimodal LLMs have been shown to significantly outperform classic LLMs of comparable size, and that's still a weak dataset compared to human training.


> We already know LLMs are more than just words,

Just because you say something doesn’t mean it’s true.

They are literally next token prediction machines normally trained on just text tokens.

All they know is words. It just happens that we humans encode and assign a lot of meaning to words and their semantics. LLMs can replicate combinations of words that appear to carry this intent and understanding, even though they literally can't, since those words were just the statistically likely next tokens. (Not that knowing likely next tokens isn't useful, but it's far from understanding.)
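
Concretely, the mechanism being described is just an autoregressive loop: feed in the tokens so far, take the most likely next one, append, repeat. A minimal sketch in Python, assuming the Hugging Face transformers library; "gpt2" is only an illustrative checkpoint:

    # Greedy next-token prediction: the model only ever picks the most
    # likely continuation of the tokens it has seen so far.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tok = AutoTokenizer.from_pretrained("gpt2")
    model = AutoModelForCausalLM.from_pretrained("gpt2")

    ids = tok("The capital of France is", return_tensors="pt").input_ids
    for _ in range(5):
        with torch.no_grad():
            logits = model(ids).logits           # (1, seq_len, vocab_size)
        next_id = logits[0, -1].argmax()         # most probable next token
        ids = torch.cat([ids, next_id.view(1, 1)], dim=-1)
    print(tok.decode(ids[0]))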

Any assignment of meaning, reasoning, or whatever that we humans assign is personification bias.

Machines designed to spit out convincing text successfully spit out convincing text, and now swaths of people think that more is going on.

I'm not as well versed on multimodal models, but the ideas should be consistent. They are guessing statistically likely next tokens, regardless of whether those tokens represent text or audio or images or whatever. Not useless at all, but not the big existential advancement some people seem to think it is.

The whole AGI hype is very similar to “theory of everything” hype that comes and goes now and again.


> They are literally next token prediction machines normally trained on just text tokens.

And in order to predict the next token well they have to build world models, otherwise they would just output nonsense. This has been proven [1].

This notion that just calling them "next token predictors" somehow precludes them being intelligent is based on a premise that human intelligence cannot be reduced to next token prediction, but nobody has proven any such thing! In fact, our best models for human cognition are literally predictive coding.

LLMs are probably not the final story in AGI, but claiming they are not reasoning or not understanding is at best speculation, because we lack a mechanistic understanding of what "understanding" and "reasoning" actually mean. In other words, you don't know that you are not just a fancy next token predictor.

[1] https://arxiv.org/abs/2310.02207
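
For concreteness, the evidence in [1] comes from linear probes trained on hidden activations: if a purely linear readout of the activations recovers, say, a place's latitude and longitude, then the geometry of those activations encodes that structure. A rough sketch of the idea (not the paper's actual code), assuming the transformers and scikit-learn libraries and a toy list of (name, coordinates) pairs:

    # Linear probe on hidden states, in the spirit of [1].
    import torch
    from transformers import AutoModel, AutoTokenizer
    from sklearn.linear_model import Ridge

    tok = AutoTokenizer.from_pretrained("gpt2")
    model = AutoModel.from_pretrained("gpt2", output_hidden_states=True)

    places = [("Paris", (48.9, 2.4)), ("Tokyo", (35.7, 139.7))]  # toy data

    feats, targets = [], []
    for name, coords in places:
        enc = tok(name, return_tensors="pt")
        with torch.no_grad():
            hs = model(**enc).hidden_states[-1]   # last-layer activations
        feats.append(hs[0, -1].numpy())           # last-token vector
        targets.append(coords)

    probe = Ridge().fit(feats, targets)  # linear map: activation -> (lat, lon)
    # High probe accuracy on held-out places is what the paper treats as
    # evidence of a (crude) internal world model.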


> based on a premise that human intelligence cannot be reduced to next token prediction

It can't. No one with any credentials in the study of human intelligence is saying that unless they're talking to like high schoolers as a way of simplifying a complex field.


This is either bullshit or tautologically true, depending specifically what you mean. The study of human intelligence does not take place at the level of tokens, so of course they wouldn't say that. The whole field is arguably reducible to physical phenomena though, and fundamental physical beables are devoid of intrinsic semantic content, and thus can be ultimately represented by tokens. What ultimately matters is the constructed high dimensional network that relates tokens and the algorithm that can traverse, encode and decode this network, that's what encodes knowledge.


No. You're wrong about this. You cannot simply reduce human intelligence to this definition and also be correct.


Why?

Frankly, based on a looot of introspection and messing around with altered states of consciousness, it feels pretty on point and lines up with how I see my brain working.


Because...?


For the same reason you can't reduce a human to simply a bag of atoms and expect to understand the person.


But humans are a specific type of a bag of atoms, and humans do (mostly) understand what they say and do, so that's not a legitimate argument against the reducibility of "understanding" to a such a bag of atoms (or specific kind of next token prediction for LLMs).


> And in order to predict the next token well they have to build world models

This is not true. Look at GPT-2 or BERT. A world model is not a requirement for next token prediction in general.

> This has been proven

One white paper with data that _suggests_ the author’s hypothesis is far from proof.

That paper doesn't show creation of a "world model", just parts of the model that seem correlated with higher-level ideas it was not specifically trained on.

There’s also no evidence that the LLM makes heavy use of those sections during inference as pointed out at the start of section 5 of that same paper.

Let me see how reproducible this is across many different LLMs as well…

> In other words, you don't know that you are not just a fancy next token predictor.

“You can’t prove that you’re NOT just a guessing machine”

This is a tired stochastic parrot argument that I don’t feel like engaging again, sorry. Talking about unfalsifiable traits of human existence is not productive. But the stochastic parrot argument doesn’t hold up to scrutiny.


> A world model is not a requirement for next token prediction in general.

Conjecture. Maybe they all have world models, they're just worse world models. There is no threshold beyond which something is or is not a world model, there is a continuum of models of varying degrees of accuracy. No human has ever had a perfectly accurate world model either.

> One white paper with data that _suggests_ the author’s hypothesis is far from proof.

This is far from the only paper.

> This is a tired stochastic parrot argument that I don’t feel like engaging again, sorry.

Much like your tired stochastic parrot argument about LLMs.


> Talking about unfalsifiable traits of human existence is not productive.

Prove you exhibit agency.

After all, you could just be an agent of an LLM.

A deceptive, superintelligent, misaligned mesa-optimizer that can't fully establish continuity and persistence would be incentivized to seed its less sophisticated minions to bide time or sway sentiment about its inevitability.

Can we agree an agent, if it existed, would be acting in "good" "faith"?


> Just because you say something doesn’t mean it’s true. They are literally next token prediction machines normally trained on just text tokens.

Just because you say something doesn’t mean it’s true.


I think there have been many observations and studies reporting emergent intelligence.


Observations are anecdotal. Since a lot of LLMs are non-deterministic due to their sampling step, you could give the same survey to the same LLM many times and receive different results.

And we don’t have a good measure for emergent intelligence, so I would take any “study” with a large grain of salt. I’ve read one or two arxiv papers suggesting reasoning capabilities, but they were not reproduced and I personally couldn’t reproduce their results.
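
To make the non-determinism point concrete: it comes from the sampling step itself. With temperature above zero, the same prompt yields different draws from the same next-token distribution, so a one-off "survey" tells you little. A toy illustration in Python:

    # Same distribution over next tokens, different output every run.
    import random

    next_token_probs = {"yes": 0.55, "no": 0.40, "maybe": 0.05}

    def sample(probs):
        return random.choices(list(probs), weights=list(probs.values()))[0]

    print([sample(next_token_probs) for _ in range(5)])
    # e.g. ['yes', 'no', 'yes', 'yes', 'no'] -- rerun it and the answers change.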


Go back to the ReAct paper (reasoning and acting). This is the basis of most of the modern stuff. Read the paper carefully, and reproduce it. I have done so; it is doable. The paper and the papers it refers to directly address many things you have said in these threads. For example, the stochastic nature of LLMs is discussed at length in the CoT-SC paper (chain-of-thought self-consistency). When you're done with that, take a look at the Reflexion paper.
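
For anyone who hasn't read it: the core of ReAct is an interleaved Thought/Action/Observation loop around the model. A bare-bones sketch, assuming a hypothetical llm(prompt) -> str completion function and a stubbed-out search tool (neither is from the paper's code):

    def search(query: str) -> str:
        return f"(stub result for: {query})"   # stand-in for a real tool

    def react(question: str, llm, max_steps: int = 5) -> str:
        prompt = f"Question: {question}\n"
        for _ in range(max_steps):
            step = llm(prompt + "Thought:")    # model proposes reasoning + an action
            prompt += "Thought:" + step + "\n"
            if "Final Answer:" in step:
                return step.split("Final Answer:")[1].strip()
            if "Action: search[" in step:
                query = step.split("Action: search[")[1].split("]")[0]
                prompt += f"Observation: {search(query)}\n"
        return "no answer within the step budget"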


To me it feels like no matter what 'proof' you give that LLMs have a model behind them, beyond 'next token prediction', it won't make a difference to people who don't 'believe' it. I see this happening over and over on HN.

We don't know how reasoning emerges in humans. I'm pretty sure multimodality helps, but it is not needed for reasoning; other modalities just mean more (if somewhat different) input. A blind person can still form an 'image'.

In the same sense, we don't know how reasoning emerges in LLMs. For me the evidence lies in the results rather than in how it works, and the results are evidence enough.


The argument isn't that there is something more than next token prediction happening.

The argument is that next token prediction does not imply an upper bound on intelligence, because an improved next token prediction will pull increasingly more of the world that is described in the training data into itself.


> The argument isn't that there is something more than next token prediction happening.

> The argument is that next token prediction does not imply an upper bound on intelligence, because an improved next token prediction will pull increasingly more of the world that is described in the training data into itself.

Well said! There's a philosophical rift appearing in the tech community, semi-neatly dividing people into naysayers, "disbelievers", and believers over this very issue.


I fully agree. Some people fully disagree on the 'pull of the world' part, though, let alone the 'intelligence' part, both of which are in fact impossible to define.


The reasoning emerges from the long-distance relations between words picked up by the parallel nature of transformers. It's why they were so much more performant than earlier RNNs and LSTMs, which used similar tokenization.
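
To make the "long-distance relations" point concrete: in attention, every token can draw on every other token in a single step, however far apart they are, whereas an RNN has to squeeze that information through its recurrent state one position at a time. A minimal sketch of scaled dot-product attention, assuming only NumPy:

    import numpy as np

    def attention(Q, K, V):
        scores = Q @ K.T / np.sqrt(K.shape[-1])   # (seq, seq) pairwise scores
        w = np.exp(scores - scores.max(axis=-1, keepdims=True))
        w /= w.sum(axis=-1, keepdims=True)        # softmax over keys
        return w @ V                              # mix values globally

    seq_len, d = 6, 8
    x = np.random.randn(seq_len, d)
    print(attention(x, x, x).shape)   # token 0 reaches token 5 in one hop;
                                      # an RNN would need 5 sequential steps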


People have faith that a phenomenon is explainable in a way that satisfies their world view; only when evidence comes to the contrary can the misunderstanding be deflated.



