PDF Text Extraction to Documents - 18/10/2025 11:47 EDT

Budget: $5000

I need the text from a large batch of PDF files exported into clean, separate documents. Only the raw text is required (no need to preserve fonts, layout, or other styling), so each .docx should read like a plain transcript that follows the original reading order without omissions.

Scope of work
• Run a reliable OCR or direct text parser on every PDF (some are image-based, others have selectable text).
• Produce one .docx for each source file, using the same filename.
• Automate the workflow (Python, Tesseract, PyPDF2, ABBYY FineReader, or a tool of your choice) so I can repeat the process later; include a concise setup/usage guide.
• Perform spot-check QA and share an accuracy report to demonstrate that all text was captured.

My priority is accuracy and a repeatable solution that can handle hundreds of PDFs in one pass. Familiarity with batch processing, OCR tuning, and Word automation will be highly valued.
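To make the scope concrete, here is a minimal sketch of what such a workflow could look like in Python. It is only one possible shape, not a required design. It assumes PyPDF2 for selectable text, pdf2image (which needs Poppler installed) plus pytesseract for image-based pages, and python-docx for output. The 20-character threshold and all function names are illustrative assumptions, and the similarity score is just one simple way to compare a spot-checked page against a hand-typed reference.

```python
from difflib import SequenceMatcher
from pathlib import Path

MIN_CHARS_PER_PAGE = 20  # assumed threshold: below this, treat the page as image-based


def needs_ocr(extracted_text: str, min_chars: int = MIN_CHARS_PER_PAGE) -> bool:
    """Return True when direct extraction yielded too little text to trust."""
    return len(extracted_text.strip()) < min_chars


def docx_path_for(pdf_path: Path, out_dir: Path) -> Path:
    """Map some/dir/name.pdf to out_dir/name.docx, keeping the source filename."""
    return out_dir / (pdf_path.stem + ".docx")


def clean_line(line: str) -> str:
    """Drop control characters that cannot be stored in a .docx (tabs are kept)."""
    return "".join(ch for ch in line if ch.isprintable() or ch == "\t")


def similarity(extracted: str, reference: str) -> float:
    """Character-level similarity in [0, 1] after collapsing whitespace."""
    a = " ".join(extracted.split())
    b = " ".join(reference.split())
    return SequenceMatcher(None, a, b).ratio()


def extract_pages(pdf_path: Path) -> list:
    """Extract text page by page, falling back to OCR for image-based pages."""
    # Third-party imports are local so the helpers above work without them.
    from PyPDF2 import PdfReader
    from pdf2image import convert_from_path
    import pytesseract

    pages = []
    reader = PdfReader(str(pdf_path))
    for index, page in enumerate(reader.pages, start=1):
        text = page.extract_text() or ""
        if needs_ocr(text):
            # Render only this page to an image, then run Tesseract on it.
            image = convert_from_path(
                str(pdf_path), dpi=300, first_page=index, last_page=index
            )[0]
            text = pytesseract.image_to_string(image)
        pages.append(text)
    return pages


def write_docx(pages: list, out_path: Path) -> None:
    """Write the extracted text as plain paragraphs, one per non-empty line."""
    from docx import Document

    doc = Document()
    for text in pages:
        for line in text.splitlines():
            cleaned = clean_line(line)
            if cleaned.strip():
                doc.add_paragraph(cleaned)
    doc.save(str(out_path))


def convert_folder(in_dir: Path, out_dir: Path) -> list:
    """Convert every PDF in in_dir to a same-named .docx in out_dir."""
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for pdf_path in sorted(in_dir.glob("*.pdf")):
        out_path = docx_path_for(pdf_path, out_dir)
        write_docx(extract_pages(pdf_path), out_path)
        written.append(out_path)
    return written
```

A run would then be something like `convert_folder(Path("pdfs"), Path("docx"))`, with both folder names being placeholders. For the accuracy report, `similarity` could be applied to a sample of pages that have been transcribed by hand, and the per-page scores listed alongside the filenames.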

Python
