Xerox is cool, but I'd have proposed another analogy. Suppose you need to transfer your valuable knowledge to the next generation, but you have no durable medium, nor widespread literacy, to rely on. On the other hand, you have respect and the attention of the youth. So you encode the most important parts into an epic poem and try to get your students to memorize it. You can't know for sure that it won't mutate after you're gone – and indeed, it will; odds are, you're only passing on what you heard yourself, as well as you can, already with some embellishment and updates.

For the greater part of our history, we haven't had access to lossless transmission of substantial information. We still don't in many of the cases that matter most – any verbalized opinion can be recorded for all eternity, but is that really what you know, and are you sure it's the best way to pass it on? Experts die and not infrequently take their know-how and unique knacks with them, even after sharing millions of imperishable words with the rest of us – yet sometimes their students make progress in their own ways. In fact, greats like Socrates believed that writing is bad precisely because it offers an easy hack: substituting lossless recall for understanding. [1]

Lossy learning is just the normal mode of human learning; lossy recall is our normal way of recall. It's not a gimmick, nor a way to show off originality.

> Perhaps arithmetic is a special case, one for which large-language models are poorly suited. Is it possible that, in areas outside addition and subtraction, statistical regularities in text actually do correspond to genuine knowledge of the real world?

> I think there’s a simpler explanation.

The original explanation – that arithmetic is a special case – is the simpler one. Consider any run-of-the-mill arithmetic error by ChatGPT, e.g. this one from [2]:

> Shaquille O'Neal is taller than Yao Ming. Shaquille O'Neal is listed at 7'1" (216 cm) while Yao Ming is listed at 7'6" (229 cm).

Madness, of course. But if we consult the OpenAI tokenizer [3], we'll see that this is yet another issue of BPE encoding. '216' is a single token, [20666], and '229' is the token [23539] – these are not ordinal values but IDs on the nominal scale of the token alphabet. '2', '21' and '29' are [17], [1433] and [1959] respectively. While we're at it, 'tall' is the single token [35429], whereas 'Tall' is two tokens, [51, 439]. Good luck learning arithmetic robustly with this nonsense. But it may well be possible to learn how to make corny metaphors – that's just a more forgiving arena.
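You can see the fragmentation for yourself with a minimal sketch using OpenAI's tiktoken library. One caveat: the exact IDs depend on which vocabulary you load – the ones above came from the GPT-3-era encoding, which tiktoken calls 'r50k_base'; newer models use other alphabets, but the fragmentation pattern is the point, not the specific numbers.

    # Minimal sketch: how a BPE vocabulary fragments numbers and words.
    # Token IDs are nominal labels, not ordinal values. Assumes the
    # GPT-3-era "r50k_base" encoding; other encodings give other IDs.
    import tiktoken

    enc = tiktoken.get_encoding("r50k_base")

    for s in ["216", "229", "2", "21", "29", "tall", "Tall"]:
        ids = enc.encode(s)
        pieces = [enc.decode([i]) for i in ids]
        print(f"{s!r:>7} -> ids={ids} pieces={pieces}")

Nothing in those IDs tells the model that 216 < 229, or that '216' and '2' share a digit; any notion of magnitude has to be reconstructed statistically from context.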

> If the output of ChatGPT isn’t good enough for GPT-4, we might take that as an indicator that it’s not good enough for us, either.

Or we might think a bit about how RLHF works and realize that these models are already intentionally trained on their own output. This scene is moving fast.
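For the flavor of it, here's a toy self-training loop – everything in it (the three-word 'vocabulary', the hard-coded reward) is a made-up stand-in, not OpenAI's pipeline; the only point is that the policy's own samples are the data it gets updated on.

    # Toy sketch of the RLHF idea: the model samples its own output,
    # a reward signal scores it, and the policy shifts toward
    # higher-reward samples. All components are hypothetical stand-ins.
    import random

    VOCAB = ["good", "meh", "bad"]
    weights = [1.0, 1.0, 1.0]                        # the "policy"
    reward = {"good": 1.0, "meh": 0.0, "bad": -1.0}  # stand-in reward model

    for _ in range(1000):
        c = random.choices(VOCAB, weights=weights)[0]         # model's own output
        i = VOCAB.index(c)
        weights[i] = max(0.01, weights[i] + 0.1 * reward[c])  # update on it

    print(weights)  # mass piles up on "good": the policy learned from itself

In the real pipeline the reward model is itself trained on human rankings of model-generated completions, so the loop through self-produced text is even tighter.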

I think the lesson here, as pointed out by one of the top comments, is that the culture of literary excellence is indeed at risk; but mainly because it's so vastly insufficient to provide even shallow domain understanding. Writing well, mashing concepts together, is worth nothing when it can be mass-produced by language models. Actually investigating the domain, even when you feel it's beneath you, is the edge of human intelligence.

1: https://fs.blog/an-old-argument-against-writing/

2: https://www.searchenginejournal.com/chatgpt-update-improved-...

3: https://platform.openai.com/tokenizer



