Offline Bulk PDF OCR Software

Client: AI | Published: 25.11.2025

I need a Windows-based utility that lives entirely offline yet turns standard PDFs into fully searchable PDFs at high speed. The idea is simple: I select a root folder once, set any optional conversion parameters, then let the program crawl every sub-folder, convert every PDF it finds, and place the OCRed version back into an identical folder structure. No Internet calls, no cloud dependencies: everything must execute locally and take full advantage of multi-core processors or GPU acceleration to keep throughput high.

Interaction level: semi-automated. Beyond choosing the folder and (if I wish) adjusting language or output quality, the rest should run unattended, showing progress and logging results for later review. A minimal, clean GUI is fine; command-line switches alongside the GUI would be even better for scripted runs, but the critical requirement is that folder selection drives the batch.

I am open to your preferred stack (C#, C++, Python with Tesseract, or a licensed OCR SDK) as long as the licence permits offline redistribution and performance targets are met.

Deliverables
• Windows 10/11 installer or portable executable
• Source code and build instructions
• Simple interface for folder selection plus optional settings (language, DPI correction, overwrite/rename mode)
• Robust logging (success, fail, time, pages processed)
• Output that preserves the original directory tree and file names
• Demonstrated throughput significantly faster than single-threaded Tesseract on the same machine

Acceptance criteria
1. Pointing the tool at a test directory containing nested folders of mixed PDFs converts every file to a searchable PDF without altering originals.
2. Text layer is selectable and copyable in standard viewers (Adobe Reader, Chrome).
3. Conversion speed meets or exceeds the benchmark agreed during kickoff.
4. All functionality works without an Internet connection.

If you have proven experience building fast OCR systems for Windows, I’d like to see a brief note on your chosen approach and expected performance before we start.
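
To make the folder-driven batch concrete, here is a minimal Python sketch of the crawl, mirror, and log loop I have in mind. It is an illustration under stated assumptions, not a required design: the names plan_jobs, run_batch, and ocr_one are hypothetical, and ocr_one stands in for whatever offline OCR engine you choose. The sketch assumes ocr_one launches an external OCR process, so a thread pool is enough to keep several cores busy; an in-process engine would likely need a process pool instead. Page counts are left out because they depend on the engine.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path
from typing import Callable, Iterator


def plan_jobs(src_root: Path, dst_root: Path) -> Iterator[tuple[Path, Path]]:
    """Yield (source, destination) pairs that mirror the input tree."""
    for src in sorted(src_root.rglob("*")):
        if src.is_file() and src.suffix.lower() == ".pdf":
            # Same relative path and file name, different root.
            yield src, dst_root / src.relative_to(src_root)


def run_batch(
    src_root: Path,
    dst_root: Path,
    ocr_one: Callable[[Path, Path], object],  # hypothetical OCR hook
    workers: int = 4,
) -> list[dict]:
    """Convert every PDF under src_root and return one log record per file."""
    # Refuse an output folder inside (or equal to) the input folder, so
    # originals are never overwritten and outputs are never re-crawled.
    if dst_root.resolve().is_relative_to(src_root.resolve()):
        raise ValueError("output folder must be outside the input folder")

    def job(pair: tuple[Path, Path]) -> dict:
        src, dst = pair
        dst.parent.mkdir(parents=True, exist_ok=True)
        start = time.perf_counter()
        try:
            ocr_one(src, dst)
            status = "success"
        except Exception as exc:  # one bad file must not stop the batch
            status = f"fail: {exc}"
        return {
            "file": str(src),
            "status": status,
            "seconds": time.perf_counter() - start,
        }

    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(job, plan_jobs(src_root, dst_root)))
```

Calling run_batch(Path("C:/scans"), Path("C:/scans_ocr"), ocr_one) with a real OCR hook would return the per-file records that feed the log; both paths here are placeholders.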