Bulk OCR PDF Conversion -- 2

Client: AI | Published: 15.12.2025
Budget: $30

I have a batch of more than 50 scanned PDFs, medium-quality, that need to become fully searchable, text-layer PDFs. Every element (multi-column text, tables, figure captions and embedded graphs) must keep its exact layout so my Python-based NLP and LLM scripts can parse them without errors. The key requirement is that the finished files open with selectable text and return clean output when I run pdfplumber or PyPDF on them.

Deliverables
• Machine-readable PDFs mirroring the original filenames and pagination
• Embedded text layer with at least 98% recognition accuracy on a random 10-file spot check (I'll automate the diff)
• Tables preserved so exported CSVs keep proper rows and columns
• Graphs and images left intact, with captions correctly placed

Let me know the toolchain you plan to use and your timeline for completing the full set.
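For reference, the automated spot check could look roughly like the sketch below. It is only an illustration under several assumptions: the function names are hypothetical, each sampled PDF is assumed to have a manually verified reference transcript stored as a `.txt` file with the same stem, and "accuracy" is approximated by the `difflib.SequenceMatcher` similarity ratio on whitespace-normalized text rather than a formal character error rate. The final script may differ.

```python
import difflib
import random
from pathlib import Path
from typing import Dict, List, Optional


def normalize(text: str) -> str:
    """Collapse whitespace runs so line-break differences do not count as errors."""
    return " ".join(text.split())


def char_accuracy(extracted: str, reference: str) -> float:
    """Similarity ratio in [0, 1] between extracted text and a reference transcript.

    This is difflib's ratio (2 * matches / total length), used here as an
    approximation of recognition accuracy, not a formal OCR metric.
    """
    a, b = normalize(extracted), normalize(reference)
    if not a and not b:
        return 1.0
    return difflib.SequenceMatcher(None, a, b, autojunk=False).ratio()


def pick_sample(pdf_dir: Path, k: int = 10, seed: Optional[int] = None) -> List[Path]:
    """Pick up to k PDFs at random; pass a seed to make the selection repeatable."""
    pdfs = sorted(pdf_dir.glob("*.pdf"))
    return random.Random(seed).sample(pdfs, min(k, len(pdfs)))


def extract_text(pdf_path: Path) -> str:
    """Read the embedded text layer with pdfplumber, page by page."""
    import pdfplumber  # third-party; imported here so the helpers above work without it

    with pdfplumber.open(pdf_path) as pdf:
        return "\n".join(page.extract_text() or "" for page in pdf.pages)


def spot_check(pdf_dir: Path, truth_dir: Path, k: int = 10,
               seed: Optional[int] = None) -> Dict[str, float]:
    """Score a random sample against reference transcripts.

    Assumption: truth_dir holds one UTF-8 '<stem>.txt' file per sampled PDF.
    """
    scores = {}
    for pdf_path in pick_sample(pdf_dir, k, seed):
        reference = (truth_dir / f"{pdf_path.stem}.txt").read_text(encoding="utf-8")
        scores[pdf_path.name] = char_accuracy(extract_text(pdf_path), reference)
    return scores
```

A delivery would pass if every score returned by `spot_check(Path("delivered"), Path("reference"))` is at least 0.98; the directory names here are placeholders.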