Show HN: DocStrange: An LLM-Ready Data Platform That Performs Better Than Gemini

souvik3333 · 2025-10-15T12:13:18

We have developed DocStrange to turn images and PDFs into LLM-ready data. We have also open-sourced a fine-tuned 3B model. You can try both the open-source and private models in the demo.

HF: https://huggingface.co/nanonets/Nanonets-OCR2-3B
Demo: https://docstrange.nanonets.com/
Blog: https://nanonets.com/research/nanonets-ocr-2/

This model improves on our previous open-source model. We have fixed some of the issues the community ran into and added some of the requested features (handwriting and multilingual support).

The models were trained on 3 million documents, including handwritten documents, financial reports, complex tables, and documents with watermarks and stamps. Feel free to try it and share feedback.

AdityaNahata · 2025-10-15T12:14:47

Do you also provide API support? I'm processing documents for a project.

souvik3333 · 2025-10-15T12:15:59

Yes, we have API support. Currently, you can process 10k documents per month for free. Let me know if you run into any issues.

ashish2091 · 2025-10-15T12:20:21

Is it better than Gemini Pro or Flash? Do you have any benchmark data? I want to use it to extract Markdown from scanned PDFs.

souvik3333 · 2025-10-15T12:23:14

We have evaluated it against Gemini-2.5-flash. You can find the benchmarks here: https://nanonets.com/research/nanonets-ocr-2/#markdown-evalu...