AI Invoice Data Extractor Script

Customer: AI | Published: 20.11.2025
Budget: $750

I need a browser-based script that lets me drop a PDF, JPG or PNG invoice into a simple web form and instantly returns a clean JSON payload with all key details. Accuracy is critical: the extraction must be reliable enough for downstream accounting automation, so please combine smart template-free parsing with solid OCR to reach a very high recognition rate.

Scope of the data to pull
• Seller information – company name, address, phone, VAT/Tax ID, plus any email or website you can confidently detect.
• Line-item table – product or service description, quantity, unit price (HT), VAT rate, and each row’s totals.
• Summary – subtotal, discounts, total VAT, grand total (TTC), currency.

Key constraints
• Language: invoices are primarily in French, so numeric formats (comma decimals, € symbol, “TTC”, etc.) must be handled gracefully.
• Input channel: files arrive only through an upload field on the page or through the API.
• Layouts: invoices come in different layouts, not just one fixed template.
• Output: a single well-structured JSON object I can consume directly in my codebase.

Deliverables
1. Web-based script (PHP, Python (Flask/Django/FastAPI), or Node.js; I’m open to suggestions).
2. Minimal HTML upload form and a results screen showing the raw JSON.
3. README that explains any third-party libraries, model training steps (if used), and how to extend to additional languages later.
4. Test run on a small set of sample French invoices I’ll provide; success means at least 95% of fields populated correctly across that set.

Feel free to leverage tools like Tesseract, pdfplumber, PyMuPDF, PaddleOCR, spaCy, or even lightweight deep-learning models: whatever gets us the accuracy we need while keeping deployment straightforward. I’m happy to answer technical questions quickly so you can focus on clean code and solid extraction logic.
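
As a rough illustration of the French number handling and row totals described above, a minimal Python sketch might look like the following. The function names (parse_french_amount, build_line_item), the JSON field names, and the half-up rounding rule are all assumptions made for the example, not a required schema or method, and the sample values are invented.

```python
import json
import re
from decimal import Decimal, ROUND_HALF_UP

# Regular, non-breaking and narrow non-breaking spaces are all used as
# thousands separators in French invoices.
_SPACES = re.compile(r"[\s\u00a0\u202f]")
_LABELS = re.compile(r"(TTC|HT|EUR|€|%)", flags=re.IGNORECASE)
_CENT = Decimal("0.01")


def parse_french_amount(text: str) -> Decimal:
    """Hypothetical helper: turn '1 234,56 € TTC' into Decimal('1234.56')."""
    cleaned = _SPACES.sub("", text)
    cleaned = _LABELS.sub("", cleaned)
    if "," in cleaned:
        # Assumption: a comma is the decimal mark, so any dots are
        # thousands separators and can be dropped.
        cleaned = cleaned.replace(".", "").replace(",", ".")
    return Decimal(cleaned)


def build_line_item(description: str, quantity: str,
                    unit_price_ht: str, vat_rate: str) -> dict:
    """Hypothetical helper: build one JSON-ready row from raw OCR strings."""
    qty = parse_french_amount(quantity)
    unit = parse_french_amount(unit_price_ht)
    rate = parse_french_amount(vat_rate)
    # Assumption: amounts are rounded half-up to the cent.
    total_ht = (qty * unit).quantize(_CENT, rounding=ROUND_HALF_UP)
    total_vat = (total_ht * rate / 100).quantize(_CENT, rounding=ROUND_HALF_UP)
    # Decimals are emitted as strings so no precision is lost in JSON.
    return {
        "description": description,
        "quantity": str(qty),
        "unit_price_ht": str(unit),
        "vat_rate": str(rate),
        "total_ht": str(total_ht),
        "total_vat": str(total_vat),
        "total_ttc": str(total_ht + total_vat),
    }


if __name__ == "__main__":
    # Invented sample row, only to show the output shape.
    row = build_line_item("Prestation de conseil", "2", "1 234,50 €", "20 %")
    print(json.dumps({"line_items": [row], "currency": "EUR"},
                     ensure_ascii=False, indent=2))
```

Using Decimal rather than float keeps cent-level amounts exact, which matters if the JSON feeds accounting automation; the OCR and layout-detection steps are deliberately left out of this sketch.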