
> It isn't ruminating over, say, a five word sequence and then outputting five words together at once when that is settled.

True, and it's a good intuition that some words are much harder to generate than others and should require more computation. For example, if the user asks a yes/no question, ideally the answer should start with "Yes" or with "No", followed by some justification. To produce that first token, the model gets only a single forward pass and must commit to a path right there.
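To make the "one forward pass per token" point concrete, here's a toy sketch of greedy autoregressive decoding. The `toy_forward_pass` scoring function is entirely hypothetical (a real LLM would run the full network there); the point is only that the first "Yes"/"No" is picked from a single pass, with no lookahead:

```python
def toy_forward_pass(tokens):
    # Hypothetical stand-in for a real model's forward pass: returns
    # scores over a tiny next-token vocabulary given the context so far.
    if tokens and tokens[-1] in ("Yes", "No"):
        return {"because": 1.0, ".": 0.2}
    if tokens and tokens[-1] == "because":
        return {".": 1.0}
    return {"Yes": 1.0, "No": 0.9, "because": 0.5, ".": 0.3}

def greedy_decode(prompt_tokens, max_new=4):
    tokens = list(prompt_tokens)
    out = []
    for _ in range(max_new):
        scores = toy_forward_pass(tokens)       # exactly one pass per token
        next_tok = max(scores, key=scores.get)  # greedy pick, no backtracking
        tokens.append(next_tok)
        out.append(next_tok)
        if next_tok == ".":
            break
    return out

print(greedy_decode(["Is", "the", "sky", "blue", "?"]))
```

Note that the very first emitted token already commits to "Yes" versus "No"; every later token only elaborates on that choice.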

But this is precisely why chain-of-thought was invented, and later "reasoning" models. These take it "step by step" and generate a sort of stream-of-consciousness monologue where each word follows more smoothly from the previous ones, rather than abruptly pinning down a Yes or a No in the very first token.
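Since each generated token costs one forward pass, chain-of-thought is a way of buying more compute before the final answer. A rough illustration (the answer strings and the whitespace token count are my own simplification, not how a real tokenizer works):

```python
# Each generated token = one forward pass, so token count is a rough
# proxy for how much compute is spent before the answer is committed.
direct_answer = "Yes."
cot_answer = "17 * 3 = 51 . 51 > 50 , so the answer is Yes ."

direct_passes = len(direct_answer.split())  # commits to "Yes" immediately
cot_passes = len(cot_answer.split())        # many passes of "reasoning" first

print(direct_passes, cot_passes)
```

The step-by-step version spends an order of magnitude more forward passes before the "Yes", which is exactly the extra rumination the parent comment was asking for.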

But if you want explicit backtracking, people did that years ago too (https://news.ycombinator.com/item?id=36425375).

LLMs are an extremely well-researched space where armies of researchers, engineers, grad and undergrad students, enthusiasts, and everyone in between have been coming up with all manner of ideas. It is highly unlikely that you can easily point to some obvious thing they all missed.
