While we're talking about the Google Cloud Vision API, I'll take the opportunity to plug the Chrome extension I wrote that adds a right-click menu item to detect text, labels, and faces in images in your browser:
At work I replaced a [Tesseract](https://github.com/tesseract-ocr) pipeline with some scripts around the Cloud Vision API. I've been pleased with the speed and accuracy so far considering the low cost and light setup.
Btw, here is a Ruby script that will take an API key and image URL and return the text:
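The script itself isn't shown above, but here's a minimal sketch of what such a script could look like, using the public `images:annotate` endpoint with `TEXT_DETECTION`. The response parsing assumes the documented shape (`responses[0].textAnnotations[0].description` holds the full detected text):

```ruby
#!/usr/bin/env ruby
# Minimal sketch: pass an API key and an image URL, print detected text.
require "net/http"
require "json"
require "uri"

# Build the JSON request body for a TEXT_DETECTION call on a remote image.
def vision_request_body(image_url)
  {
    requests: [{
      image: { source: { imageUri: image_url } },
      features: [{ type: "TEXT_DETECTION" }]
    }]
  }.to_json
end

# POST to the images:annotate endpoint and return the full detected text.
def detect_text(api_key, image_url)
  uri = URI("https://vision.googleapis.com/v1/images:annotate?key=#{api_key}")
  res = Net::HTTP.post(uri, vision_request_body(image_url),
                       "Content-Type" => "application/json")
  annotations = JSON.parse(res.body).dig("responses", 0, "textAnnotations")
  # The first annotation is the whole text block; later ones are fragments.
  annotations ? annotations.first["description"] : ""
end

puts detect_text(ARGV[0], ARGV[1]) if ARGV.length == 2
```

Usage: `ruby detect_text.rb YOUR_API_KEY https://example.com/image.jpg`.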
The accuracy is about the same. We process store circular images, which are actually pretty easy to OCR. It helps that we start with large images, which are converted to grayscale and then edge-sharpened in ImageMagick before being sent to the OCR process.
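The preprocessing step described above could be driven from Ruby like this; the specific `convert` flags here are assumptions for illustration, not the poster's actual settings:

```ruby
# Build an ImageMagick command for the preprocessing described:
# grayscale conversion followed by light edge sharpening.
# Flag values are illustrative assumptions.
def preprocess_cmd(input, output)
  ["convert", input,
   "-colorspace", "Gray",  # drop color; OCR only needs luminance
   "-sharpen", "0x1.0",    # mild edge sharpening
   output]
end

# Run it before handing the file to OCR, e.g.:
# system(*preprocess_cmd("circular.jpg", "circular-ocr.png"))
```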
Submitter: if you're also the author, thank you for sharing your efforts. I needed exactly this kind of information to improve protection against cp spammers who had switched to posting images containing the URLs on one of my websites. I hadn't been able to find out how to start using OCR APIs, so this is a godsend.
This was useful information. Testing this had been on my to-do list for weeks:
I read about these limitations in the Cloud Vision OCR API docs, but could not believe that they would indeed not provide data at the word or region level. Does anyone have any idea why?
I mean, they must have this data internally and it is key for useful OCR.
I was recently testing out Google's OCR on some PDF docs. I thought it worked really well (and is pretty reasonably priced). I didn't care so much about the structure of the response/document.
@danso, if there are any delimiters in the output (the Tesseract case) and you're looking for automatic table extraction, check out http://github.com/ahirner/Tabularazr-os
It's been used with different kinds of financial docs such as municipal bonds. Implemented in pure Python, it has a web interface and a simple API, and does nifty type inference (dates, interest rates, dollar amounts...).
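The kind of type inference mentioned (dates, interest rates, dollar amounts) can be sketched with a few regexes; Tabularazr's real heuristics are surely richer than this illustrative version:

```ruby
# Toy cell-level type inference via pattern matching.
# These patterns are illustrative assumptions, not Tabularazr's rules.
def infer_type(cell)
  case cell.strip
  when %r{\A\d{1,2}/\d{1,2}/\d{2,4}\z} then :date           # 12/31/2016
  when /\A\d+(\.\d+)?%\z/              then :interest_rate  # 3.5%
  when /\A\$[\d,]+(\.\d{2})?\z/        then :dollar_amount  # $1,250.00
  else :text
  end
end
```

A column's type can then be taken as the majority vote of its cells' inferred types.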
Very cool, thanks for sharing. I'm guessing it doesn't do OCR yet? FWIW, you may be interested in these similar projects, which are popular in the journalism community though they don't provide the same high-level interface or data-inference, just the PDF-to-delimited text processing:
OCR is left as a possible future extension, which is why I got interested in this comparison. Thanks, I didn't know about pdfplumber! The use of additional markup like vertical lines from pdfminer is very interesting. Razr uses Poppler's text-only conversion, from which it automatically extracts column names and types.
Similar to plumber, and as opposed to Tabula, the goal was to extract tables from a swath of documents without user intervention. Additionally, no knowledge of the location of tables in the document is required. A fully automated workflow would curl -X POST localhost/analyze/... and filter the JSON down to the type or types of tables needed (via context lines, data types, column headers).
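The filtering half of that workflow could look something like this; the response shape (a top-level `"tables"` array with per-table `"headers"`) is an assumption for illustration, not Tabularazr's actual schema:

```ruby
require "json"

# Filter an analyzer response down to tables whose headers include
# a wanted column name. The "tables"/"headers" keys are assumed.
def tables_with_header(response_json, wanted_header)
  JSON.parse(response_json)
      .fetch("tables", [])
      .select { |t| t.fetch("headers", []).include?(wanted_header) }
end
```

After POSTing the document, pipe the body through a selector like this to keep only, say, tables with an "Interest Rate" column.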
While we're talking about the Google Cloud Vision API, I'll take the opportunity to present a simple web interface for detecting labels, text, landmarks, faces, logos, etc., using the Vision API:
Basically, it took about 2 seconds for the road-signs photo and 6+ seconds for the spreadsheet image (with occasional timeouts). So it's probably not optimized for, or ideal for, reading large amounts of text.
We recently did some testing of Google's OCR vs. ABBYY's. Google is much better than ABBYY and is cheaper: ABBYY fails on more complex fonts like script, while Google still performs well.
https://chrome.google.com/webstore/detail/cloud-vision/nblmo...
Try it out, let me know what you think. File issues at github.com/GoogleCloudPlatform/cloud-vision/