PDF to JSON Data Extraction - 21/12/2025 23:41 EST

Budget: $250

(US freelancers only.)

I have a growing archive of PDF files generated from our internal system, each following a consistent, form-like layout packed with tables and keyed fields. I need those PDFs parsed automatically, the TEXT and TABLE content lifted with full fidelity, converted into clean JSON, and then pushed into an endpoint on our custom web application. The PDFs never include scanned images; everything is digitally generated and therefore “structured.”

I’m looking for a straightforward pipeline: open the PDF, detect the form fields and tabular regions, map every column/field to a JSON key, validate the output against a simple schema, and hit our upload API. Python with pdfplumber, Tabula, Camelot, or a comparable library makes sense here, but I’m open to other reliable tools if you can prove accuracy and speed.

Deliverables:
• A reproducible script or micro-service that ingests individual PDFs or a folder, extracts structured data (text + tables), and produces well-formed JSON.
• A configuration or mapping file so future layout tweaks are handled quickly.
• A one-click or command-line uploader that posts the JSON payloads to our web app’s REST endpoint, handling auth and basic error logging.
• A brief README and sample output from three test PDFs so I can verify the mapping.

I’ll provide sample documents, field definitions, and API credentials once we start. If you’ve built PDF-to-JSON pipelines before, tell me which libraries you prefer and your typical turnaround time.
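To make the expected shape concrete, here is a rough sketch of the pipeline described above (extract, map, validate, upload), assuming pdfplumber for extraction and the standard library for everything else. The column names in COLUMN_MAP, the REQUIRED_KEYS set, the payload shape, and the bearer-token auth are all hypothetical placeholders; the real field definitions and API details will come with the sample documents.

```python
import json
import urllib.request

# Hypothetical mapping from PDF column headers to JSON keys; in practice this
# would live in a separate config file so layout tweaks need no code changes.
COLUMN_MAP = {"Item No.": "item_no", "Description": "description", "Qty": "quantity"}

# Hypothetical "simple schema": keys that must be present and non-empty.
REQUIRED_KEYS = {"item_no", "quantity"}


def extract_tables(pdf_path):
    """Return every table in the PDF as a list of rows (lists of cell strings)."""
    import pdfplumber  # third-party; imported lazily so the helpers below work without it

    with pdfplumber.open(pdf_path) as pdf:
        return [table for page in pdf.pages for table in page.extract_tables()]


def table_to_records(table, column_map):
    """Treat the first row as the header and map each later row to a dict."""
    header, *rows = table
    keys = [column_map.get((cell or "").strip()) for cell in header]
    records = []
    for row in rows:
        # Columns with no entry in the mapping are skipped; empty cells become "".
        record = {k: (cell or "").strip() for k, cell in zip(keys, row) if k is not None}
        records.append(record)
    return records


def missing_keys(record, required):
    """Return the required keys that are absent or empty in the record."""
    return sorted(k for k in required if not record.get(k))


def build_upload_request(records, url, token):
    """Build (but do not send) a JSON POST request; bearer auth is an assumption."""
    body = json.dumps({"records": records}).encode("utf-8")
    headers = {"Content-Type": "application/json", "Authorization": f"Bearer {token}"}
    return urllib.request.Request(url, data=body, method="POST", headers=headers)
```

A driver script would call extract_tables on each file, run table_to_records and missing_keys on every table, log any record that fails validation, and send the rest with urllib.request.urlopen. Keyed form fields outside tables would need a separate pass over page.extract_text(), which this sketch leaves out.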

Python
