Budget: $3k–$5k. The architecture is already designed; your job is to build it and consult with me and my team.

I have a full-length company financial report that arrives only as a PDF file, and I want to turn it into something my local LLM (running through Ollama) can understand and answer questions on with reliable accuracy. The goal is a hands-off pipeline: I drop a fresh PDF into a folder, run a command, and then query the model for any figure, whether it sits in the balance sheet, income statement, or cash-flow section, and get a clean, correct response every time.

What I need built
• A script (Python preferred) that parses the PDF, captures every table and key figure, and outputs a structured data store (CSV, JSON, or SQLite; whatever best supports downstream use).
• Validation logic that cross-checks totals so obvious extraction errors are caught automatically.
• An indexing or embedding step that wires the cleaned numbers and text into my on-prem Ollama instance, allowing natural-language questions such as "What was EBITDA for 2023?" or "How did operating cash change quarter-over-quarter?"
• Clear, offline-friendly documentation plus a brief demo confirming the system answers a supplied test set accurately.

Environment details
The server runs Linux with Python 3.11 and the latest Ollama build. Libraries such as pdfplumber, Camelot, pandas, LangChain, LlamaIndex, or similar are all acceptable as long as they install via requirements.txt and run fully offline.

Deliverables
1. Complete, well-commented source code and requirements.txt
2. Setup guide and usage examples (README.md)
3. A recorded or live demo session showing ≥95% extraction accuracy and correct answers on 20 validation questions drawn from the report

If you have prior experience marrying PDF parsing with local LLMs or similar RAG workflows, I would love to see it. I am ready to start as soon as we agree on an approach.
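To make the extraction requirement concrete, here is a minimal sketch of the kind of approach I have in mind (function names and the CSV layout are illustrative placeholders, not a spec). pdfplumber can pull tables page by page; the fiddly part in financial PDFs is normalizing cells like "1,234.5" or "(123)", where parentheses mean a negative figure:

```python
import csv
import re


def clean_figure(raw):
    """Normalize one report cell: strip commas and currency symbols,
    treat parenthesized values as negative, and return None for
    non-numeric cells (row labels, dashes, blanks)."""
    if raw is None:
        return None
    s = raw.strip().replace(",", "").replace("$", "")
    negative = s.startswith("(") and s.endswith(")")
    s = s.strip("()")
    if not re.fullmatch(r"-?\d+(\.\d+)?", s):
        return None
    value = float(s)
    return -value if negative else value


def extract_tables(pdf_path, out_csv):
    """Dump every table pdfplumber finds into one CSV, tagged with the
    page number so each figure stays traceable to its source page."""
    import pdfplumber  # third-party; installed via requirements.txt

    with pdfplumber.open(pdf_path) as pdf, \
            open(out_csv, "w", newline="") as f:
        writer = csv.writer(f)
        for page in pdf.pages:
            for table in page.extract_tables():
                for row in table:
                    cleaned = [clean_figure(cell) for cell in row]
                    # Keep the original text where a cell is not a number.
                    writer.writerow([page.page_number] + [
                        v if v is not None else cell
                        for v, cell in zip(cleaned, row)
                    ])
```

Camelot's lattice mode would be a reasonable swap for `page.extract_tables()` on reports with ruled tables; I am open to whichever extractor hits the accuracy target.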
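The validation requirement above can be sketched as a reconciliation check (the line-item names below are made up for illustration): sum the extracted line items and compare against the total the report itself states, within a small rounding tolerance:

```python
def check_total(line_items, stated_total, tolerance=0.5):
    """Flag a likely extraction error when extracted line items do not
    sum to the total stated in the report; the tolerance absorbs
    rounding in figures reported in thousands."""
    return abs(sum(line_items.values()) - stated_total) <= tolerance


# Example: current assets should reconcile against the stated subtotal.
assets = {"cash": 1200.0, "receivables": 830.0, "inventory": 410.0}
check_total(assets, 2440.0)  # reconciles
check_total(assets, 2600.0)  # mismatch: a missed or garbled row
```

The same check applies to any subtotal/total pair the report prints (total liabilities, net change in cash, and so on), which is how obvious parsing failures get caught before indexing.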
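For the indexing step above, one plausible shape (a sketch only; the `nomic-embed-text` model name and the chunk/index layout are my assumptions, not requirements) is to embed each cleaned chunk through the local Ollama server and answer questions by cosine similarity over the stored vectors:

```python
import math


def embed(text):
    """Embed one text chunk via the local Ollama server. Assumes an
    embedding model such as `nomic-embed-text` has been pulled."""
    import ollama  # client for the on-prem Ollama instance; fully offline
    return ollama.embeddings(model="nomic-embed-text", prompt=text)["embedding"]


def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b))
    return dot / norm


def top_chunks(question_vec, index, k=3):
    """Rank stored (chunk, vector) pairs against an embedded question
    and return the k best chunks to hand a chat model as context."""
    ranked = sorted(index, key=lambda item: cosine(question_vec, item[1]),
                    reverse=True)
    return [chunk for chunk, _ in ranked[:k]]
```

At query time the pipeline would embed the question, select the top chunks, and pass them alongside the question to a local chat model (e.g. via `ollama.chat`). Keeping table rows verbatim inside the chunks is what lets the model quote figures like EBITDA exactly rather than paraphrasing them.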