> Let's build a pipeline I don't think that is the right approach for archiving....

crotchfire · on Dec 4, 2023

Yeah, if you get down into the weeds, these models are significantly corrupting the source data.

I opened the first example to a random chapter (1.4 Formal and natural languages); within the first three paragraphs it:

- Hallucinated spurious paragraph breaks

- Ignored all the boldfacing

- Hallucinated a blockquote into a new section

This is not a tool to produce something for humans to read.

It might be useful as part of some pipeline that needs to feed markdown into some other machine process. I would not waste my time reading the crud that came out of this thing.

It's a stunt.