
Sounds like a potentially useful improvement then.

I've had more success exporting text from some PDFs (not scanned pages, but just text typeset using some extremely cursed process that breaks accessibility) that way than via "normal" PDF-to-text methods.





no, it is not. simple ocr is slow and much more expensive than an api call to the process in question. on the positive side, it is also error-prone and cannot follow the focus in real time. no, adding ai does not make it better. AI is useful when everything else fails and it is worth waiting 10 seconds for an incomplete and partially hallucinated screen description.

> simple ocr is slow

Huh? Running a powerful LLM over a screenshot can take longer, but for example macOS's/iOS's default "extract text" feature has been pretty much instant for me.


is "pretty much instant" true when jumping between buttons, with the reader only partially announcing what you land on while you are looking for something else? can it represent a gui in enough detail to navigate it, open combo boxes, multi-selects and whatever? can it tell the difference between an image of a button and the button itself? can it move fast enough that you can edit text while moving back and forth? ocr with possible prefetch is not the same as object recognition and manipulation. screen readers are not text readers; they build a model of the screen that can be navigated and interacted with. modern screen readers have ocr capabilities. they have ai addons as well. still, having the information ready to serve in a form that allows follow-up action is much better.
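To make the "model of the screen" point concrete, here is a toy sketch in Python. It assumes a simplified, hypothetical tree: `Node`, `focusable_nodes`, `announce` and `ocr_view` are invented for illustration and are not the API of any real screen reader or platform accessibility layer, which carry far more roles, states and actions. It only shows why a structured model can distinguish a picture of a button from a focusable button and report a combo box's state, while OCR of the same screen returns a flat string.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Optional

@dataclass
class Node:
    """Hypothetical accessibility-tree node: role, accessible name, state, children."""
    role: str                        # e.g. "window", "button", "combobox", "image"
    name: str                        # label a screen reader would announce
    focusable: bool = False          # can keyboard navigation land here?
    expanded: Optional[bool] = None  # only meaningful for combobox-like roles
    children: List["Node"] = field(default_factory=list)

def walk(node: Node) -> Iterator[Node]:
    """Depth-first traversal in document order."""
    yield node
    for child in node.children:
        yield from walk(child)

def focusable_nodes(root: Node) -> List[Node]:
    """Nodes that Tab-style navigation would step through (interactive only)."""
    return [n for n in walk(root) if n.focusable]

def announce(node: Node) -> str:
    """Screen-reader-style announcement built from name, role and state."""
    text = f"{node.name}, {node.role}"
    if node.expanded is not None:
        text += ", expanded" if node.expanded else ", collapsed"
    return text

def ocr_view(root: Node) -> str:
    """Roughly what OCR of the same screen yields: visible text only, no roles or state."""
    return " ".join(n.name for n in walk(root) if n.name)

# Hypothetical screen: an image that merely looks like a "Save" button,
# the real "Save" button, and a collapsed combo box.
screen = Node("window", "Settings", children=[
    Node("image", "Save"),
    Node("button", "Save", focusable=True),
    Node("combobox", "Theme", focusable=True, expanded=False),
])

for n in focusable_nodes(screen):
    print(announce(n))   # "Save, button" then "Theme, combobox, collapsed"
print(ocr_view(screen))  # "Settings Save Save Theme"
```

The OCR view sees "Save" twice and has no way to tell which one can be activated, nor whether "Theme" is open; the tree answers both without any image processing.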

Oh, I don't doubt at all that it's a measure of last resort, and I am indeed not familiar with the screen reader context.

I was mostly wondering how well my experience with human-but-not-machine-readable PDFs transferred to that domain, and surprised that OCR performance is still an issue.



