PDF Invoice Data Extraction Tool

Client: AI | Published: 05.11.2025
Budget: $750

I have several hundred invoice PDFs sitting in one folder and a companion Excel workbook that already lists these invoices (invoice number, supplier name, amounts, dates, etc.). What I now need is a small program that will open every PDF, read the key details, and append them as new columns in the same spreadsheet.

The data to be captured is:
• PDF name
• Invoice number
• Selling date
• Issuance date
• Net amount
• Gross amount
• Supplier name
• Full item description

Once extracted, amounts and any other figures must be written into Excel as true numeric cells so later calculations work without manual re-formatting. Dates should land in native date format, while text such as supplier and description can remain as strings.

Core behaviour I would like:
• Scan the target folder in one click (or from the command line), process every PDF, and match entries to the existing rows by invoice number.
• Add the new columns without disturbing anything already in the sheet.
• Skip or flag unreadable files so I can review them manually later.
• Simple configuration of folder paths and workbook name; no elaborate GUI is necessary.

Python with pdfplumber/PyPDF2 plus openpyxl or pandas is perfectly fine, but I am open to C#, Java, or any other stack you find efficient and maintainable.

Deliverables:
1. Well-commented source code.
2. A short README explaining setup and usage.
3. A demonstration run on a small sample set showing the numeric columns correctly populated and dates recognised by Excel.

If you have built something similar or can turn this around quickly, let me know how you plan to extract the text reliably and handle edge cases such as scanned PDFs.
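To make the numeric-cell, date-cell, and match-by-invoice-number requirements concrete, here is a rough Python sketch of one possible approach. Everything specific in it is an assumption made for illustration: the label wording in `PATTERNS` ("Invoice No", "Issue date", "Net amount", ...), the day-first date formats, the "Invoice number" header name in the workbook, and the helper names themselves. Supplier name and item description are left out because they depend heavily on the invoice layout.

```python
import re
from datetime import date, datetime
from typing import Optional

# Hypothetical label patterns: real invoices will need their own wording.
PATTERNS = {
    "invoice_number": re.compile(r"Invoice\s+(?:Number|No\.?)\s*[:#]?\s*(\S+)", re.I),
    "issuance_date": re.compile(r"Issue\s+date\s*:?\s*([\d./-]+)", re.I),
    "selling_date": re.compile(r"Sale\s+date\s*:?\s*([\d./-]+)", re.I),
    "net_amount": re.compile(r"Net\s+amount\s*:?\s*([\d .,]*\d)", re.I),
    "gross_amount": re.compile(r"Gross\s+amount\s*:?\s*([\d .,]*\d)", re.I),
}
DATE_FORMATS = ("%d.%m.%Y", "%Y-%m-%d", "%d/%m/%Y")  # assumed: day-first dates
NEW_COLUMNS = ["pdf_name", "selling_date", "issuance_date", "net_amount", "gross_amount"]


def parse_amount(raw: str) -> Optional[float]:
    """Turn '1 234,56' or '1,234.56' into a float; None if it cannot be parsed."""
    cleaned = re.sub(r"[^\d.,-]", "", raw)
    if not cleaned:
        return None
    # Whichever separator appears last is treated as the decimal mark.
    # Caveat: a lone '1,234' is therefore read as 1.234, not 1234.
    if cleaned.rfind(",") > cleaned.rfind("."):
        cleaned = cleaned.replace(".", "").replace(",", ".")
    else:
        cleaned = cleaned.replace(",", "")
    try:
        return float(cleaned)
    except ValueError:
        return None


def parse_date(raw: str) -> Optional[date]:
    """Try each assumed format in turn; None if none of them fits."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(raw.strip(), fmt).date()
        except ValueError:
            continue
    return None


def extract_fields(text: Optional[str]) -> Optional[dict]:
    """Return typed fields, or None when there is no text (likely a scanned PDF)."""
    if not text or not text.strip():
        return None
    found = {}
    for name, pattern in PATTERNS.items():
        match = pattern.search(text)
        found[name] = match.group(1) if match else None
    for name in ("issuance_date", "selling_date"):
        found[name] = parse_date(found[name]) if found[name] else None
    for name in ("net_amount", "gross_amount"):
        found[name] = parse_amount(found[name]) if found[name] else None
    return found


def append_to_workbook(workbook_path: str, records: list, key_header: str = "Invoice number") -> None:
    """Add new columns to the right of existing data, matched by invoice number."""
    from openpyxl import load_workbook  # third-party; imported here so the parsers work without it

    wb = load_workbook(workbook_path)
    ws = wb.active
    headers = [cell.value for cell in ws[1]]
    key_col = headers.index(key_header) + 1  # ValueError if the assumed header is absent
    first_new_col = ws.max_column + 1
    for offset, name in enumerate(NEW_COLUMNS):
        ws.cell(row=1, column=first_new_col + offset, value=name)
    by_number = {r["invoice_number"]: r for r in records if r.get("invoice_number")}
    for row in range(2, ws.max_row + 1):
        key = ws.cell(row=row, column=key_col).value
        record = by_number.get(str(key).strip()) if key is not None else None
        if record is None:
            continue  # no matching PDF; existing cells stay untouched
        for offset, name in enumerate(NEW_COLUMNS):
            # float and date values are stored by openpyxl as numeric and date cells
            ws.cell(row=row, column=first_new_col + offset, value=record.get(name))
    wb.save(workbook_path)
```

In a full tool, the text passed to `extract_fields` would come from a PDF library, for example pdfplumber's `page.extract_text()`, and each record would also carry the PDF file name under `pdf_name`. A `None` result is one simple way to flag files for manual review; scanned PDFs would additionally need an OCR step, which this sketch does not attempt. Because openpyxl does not preserve every workbook feature when loading and saving, it is worth running against a copy of the spreadsheet first.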