
> But most of the training corpus was not linguistically sound gibberish; it was actual text. There was knowledge encoded in the words. LLMs are large language models - large enough to have encoded some of the knowledge with the words.

> Some of. And they only encoded it. They didn't learn it, and they don't know it. It's just encoded in the words. It comes out sometimes in response to a prompt. Not always, not often enough to be relied on, but often enough to give users hope.

And some of the "knowledge" in the training corpora was itself wrong: errors, lies, or outright bullshit.

Garbage in, garbage out.