Very cool, thanks for sharing. I'm guessing it doesn't do OCR yet? FWIW, you may...

alex_hirner · on March 25, 2016

OCR is left out as a possible future extension, which is why I got interested in this comparison. Thanks, I didn't know about pdfplumber! Its use of additional markup from pdfminer, such as vertical lines, is very interesting. Razr uses the poppler tools with text-only conversion, and automatically extracts column names and types from that output.

Similar to pdfplumber and unlike Tabula, the goal was to extract tables from a swath of documents without user intervention. Additionally, no knowledge of where the tables are located in the document is required. A fully automated workflow would curl -X POST localhost/analyze/... and filter the JSON down to the type or types of tables needed (via context lines, data types, column headers).
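
To make the filtering step concrete, here is a rough sketch in Python. The response shape (a list of tables with "context", "columns" and "rows" keys) and the helper name filter_tables are assumptions made up for illustration, not the actual output format, so adapt the keys to whatever the service really returns.

```python
import json

# Hypothetical response body: the real JSON schema is not shown in this thread,
# so these keys ("context", "columns", "rows") are assumptions for illustration.
SAMPLE_RESPONSE = json.loads("""
[
  {"context": ["Quarterly revenue by region"],
   "columns": [{"name": "Region", "type": "string"},
               {"name": "Revenue", "type": "float"}],
   "rows": [["EMEA", 1.5], ["APAC", 2.25]]},
  {"context": ["Staff directory"],
   "columns": [{"name": "Name", "type": "string"},
               {"name": "Phone", "type": "string"}],
   "rows": [["A. Smith", "555-0100"]]}
]
""")


def filter_tables(tables, required_headers=(), required_types=(), context_keyword=None):
    """Keep only tables that match every given criterion (hypothetical helper)."""
    wanted_headers = {h.lower() for h in required_headers}
    wanted_types = set(required_types)
    matches = []
    for table in tables:
        names = {col["name"].lower() for col in table["columns"]}
        types = {col["type"] for col in table["columns"]}
        context = " ".join(table.get("context", [])).lower()
        if not wanted_headers <= names:
            continue  # a required column header is missing
        if not wanted_types <= types:
            continue  # a required data type is missing
        if context_keyword and context_keyword.lower() not in context:
            continue  # the surrounding context lines do not mention the keyword
        matches.append(table)
    return matches


revenue_tables = filter_tables(
    SAMPLE_RESPONSE,
    required_headers=["Revenue"],
    required_types=["float"],
    context_keyword="revenue",
)
print([t["context"][0] for t in revenue_tables])
```

Each criterion is optional, so the same helper can narrow by header alone, by type alone, or by any combination of the three.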