Automated PDF Résumé to JSON Conversion

Customer: AI | Published: 11.11.2025

My Gmail-based recruiting workflow receives a steady stream of updated résumés, and every one of them must land in our database as clean, structured JSON. Right now that crucial step is manual and error-prone. I need a fully automated, repeatable backend process, from the moment a PDF arrives to the moment its data is available as JSON for the rest of the pipeline.

What I expect from you
• Select or build a reliable PDF-parsing library or service (open-source preferred, commercial OK if licensing is clear) that can extract text, headings, tables and embedded contact details with high accuracy.
• Wrap it in concise, well-documented code that I can call from the existing Gmail automation (currently a small Python micro-service triggered by webhooks).
• Provide an explanation of the end-to-end flow: how the parser runs, where temporary files live, how errors bubble up to the main logs, and how the final JSON is returned or stored.

Deliverables
• Source code (Git-ready) with clear function names, docstrings and examples.
• A configuration or environment file so I can switch libraries or tweak parsing rules without code changes.
• Dockerfile or step-by-step setup guide.
• Sample PDFs paired with the generated JSON for validation.
• One-page technical note that walks through integration points and expected response times.

Acceptance criteria
• Works on at least 95% of the sample résumés I supply, regardless of template.
• No personally identifiable data lost or truncated.
• All failures raise descriptive exceptions and produce a log entry; no silent drops.
• End-to-end conversion (PDF fetched, parsed, JSON returned) completes in under 8 seconds on a standard small VM.

Show me in your proposal which library, framework or external API you plan to use, why you trust it for varied résumé layouts, and how the solution can scale as our volume grows. I’m ready to test as soon as you deliver the first working branch.
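
As a rough illustration of the acceptance criteria above (descriptive exceptions, a log entry for every failure, a backend that can be switched through the environment, and an 8-second budget), a wrapper might look something like the sketch below. It is a minimal sketch, not a required design. All names in it (`convert_resume`, `ResumeParseError`, `RESUME_PARSER_BACKEND`, `RESUME_TIME_BUDGET_S`) are hypothetical, and the `stub` backend simply decodes the bytes as UTF-8 text so the example runs without any PDF library; a real proposal would register an actual PDF extractor in its place.

```python
import json
import logging
import os
import re
import time

logger = logging.getLogger("resume_parser")

# Hypothetical environment settings; the variable names are illustrative only.
BACKEND = os.environ.get("RESUME_PARSER_BACKEND", "stub")
TIME_BUDGET_S = float(os.environ.get("RESUME_TIME_BUDGET_S", "8"))

# Deliberately simple patterns; real contact extraction would need more care.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
PHONE_RE = re.compile(r"\+?\d[\d ().-]{7,}\d")


class ResumeParseError(Exception):
    """Raised for every conversion failure so nothing is dropped silently."""


def stub_extract_text(pdf_bytes: bytes) -> str:
    """Placeholder backend (assumption): treats the input as UTF-8 text."""
    return pdf_bytes.decode("utf-8", errors="replace")


# Registry of text extractors; a real PDF library would be added here.
EXTRACTORS = {"stub": stub_extract_text}


def convert_resume(pdf_bytes: bytes, source_id: str, backend: str = BACKEND) -> str:
    """Convert one attachment to a JSON string, logging and re-raising failures."""
    started = time.monotonic()
    try:
        if not pdf_bytes:
            raise ResumeParseError(f"{source_id}: empty attachment")
        if backend not in EXTRACTORS:
            raise ResumeParseError(f"{source_id}: unknown backend {backend!r}")
        try:
            text = EXTRACTORS[backend](pdf_bytes)
        except Exception as exc:
            raise ResumeParseError(f"{source_id}: extraction failed: {exc}") from exc
        if not text.strip():
            raise ResumeParseError(f"{source_id}: no text extracted")
    except ResumeParseError:
        # One log entry per failure, then propagate to the caller.
        logger.exception("resume conversion failed")
        raise

    elapsed = time.monotonic() - started
    if elapsed > TIME_BUDGET_S:
        logger.warning("%s: took %.2fs, over the %.0fs budget",
                       source_id, elapsed, TIME_BUDGET_S)

    record = {
        "source_id": source_id,
        "backend": backend,
        "emails": EMAIL_RE.findall(text),
        "phones": PHONE_RE.findall(text),
        "raw_text": text,  # kept whole so no personal data is truncated
        "elapsed_s": round(elapsed, 3),
    }
    # ensure_ascii=False keeps accented names readable in the JSON output.
    return json.dumps(record, ensure_ascii=False)
```

In this sketch the webhook handler would call `convert_resume(attachment_bytes, message_id)` and either store the returned JSON or let `ResumeParseError` propagate to the service's main log. Keeping the extractor behind a registry keyed by an environment variable is one way to switch libraries without code changes.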