Hacker News

> Let's build a pipeline

I don't think that is the right approach for archiving. The preferred pipeline would be

all the pdfs -> archive them all -> markdown them

This way you can always re-run the conversion as bugs are fixed and improvements are made. Archivists generally prefer to save material as close to the source as possible, because every transformation from there can only lose data.
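A minimal sketch of that ordering, in case it helps. Everything here is an assumption for illustration: the content-addressed archive layout, the function names, and the `convert` callable, which stands in for whatever PDF-to-markdown tool you end up using. The point is only that the originals are stored byte-for-byte and the markdown is a disposable, regenerable output.

```python
import hashlib
import shutil
from pathlib import Path
from typing import Callable, List


def archive_pdf(src: Path, archive_dir: Path) -> Path:
    """Copy src into archive_dir unchanged, named by its SHA-256 digest."""
    digest = hashlib.sha256(src.read_bytes()).hexdigest()
    archive_dir.mkdir(parents=True, exist_ok=True)
    dest = archive_dir / f"{digest}.pdf"
    if not dest.exists():  # archiving the same bytes twice is a no-op
        shutil.copyfile(src, dest)
    return dest


def convert_archive(
    archive_dir: Path,
    out_dir: Path,
    convert: Callable[[bytes], str],  # hypothetical converter: PDF bytes -> markdown
) -> List[Path]:
    """Regenerate every markdown file from the archived originals."""
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for pdf in sorted(archive_dir.glob("*.pdf")):
        md_path = out_dir / f"{pdf.stem}.md"
        # Overwrite on purpose: the markdown is derived data, the PDF is the record.
        md_path.write_text(convert(pdf.read_bytes()), encoding="utf-8")
        written.append(md_path)
    return written
```

When the converter improves, you call `convert_archive` again with the new `convert` and the archived PDFs are never touched.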



Yeah, if you get down into the weeds, these models are significantly corrupting the source data.

I opened the first example to a random chapter (1.4 Formal and natural languages); within the first three paragraphs it:

- Hallucinated spurious paragraph breaks

- Ignored all the boldfacing

- Hallucinated a blockquote into a new section

This is not a tool to produce something for humans to read.

It might be useful as part of some pipeline that needs to feed markdown into some other machine process. I would not waste my time reading the crud that came out of this thing.

It's a stunt.



