AI Tool for Duplicate Invoice Detection

Client: AI | Published: 08.02.2026
Budget: $250

I’m sitting on a few hundred PDF invoices and need an automated way to spot any that were issued twice. Because our invoice numbering uses custom prefixes and special characters, I’d like the detection to rely solely on customer-specific data: the vehicle card number, the number of litres dispensed, and the exact date and time of refuelling.

Here’s what I’m after: an AI-powered script or lightweight app that scans every PDF, extracts those three fields with high accuracy, and then flags, groups, and reports any duplicates it finds. A clear CSV or Excel file listing each suspected duplicate (with a confidence score and page reference) will be enough for me to review and act on.

Acceptance criteria
• All PDFs processed automatically, with no manual renaming or sorting
• ≥ 95% accuracy on the three target fields
• Duplicates grouped logically and exported in tabular form
• Re-run capability for future batches with minimal setup
• Well-commented code and a short README explaining dependencies and usage

Python feels natural here: pdfplumber or PyPDF2 for parsing, Tesseract or similar OCR when needed, and pandas for the comparison logic. That said, I’m open to whatever stack delivers the results. The key is reliability and ease of rerunning the process whenever fresh invoices land in my folder.
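A minimal sketch of the normalise-and-group step, assuming the text of each page has already been pulled out of the PDFs (for example with pdfplumber's page.extract_text(), or Tesseract for scanned pages). The label patterns ("Card:", "Litres:", "Date:"), the dd.mm.yyyy hh:mm timestamp format and the helper names extract_fields, find_duplicates and write_report are all hypothetical; the real invoice layout will need its own patterns. The idea is to normalise the three fields so that cosmetic differences (spaces or dashes in the card number, a decimal comma in the litres) cannot hide a duplicate, then group pages on the exact normalised key. It uses only the standard library and does not compute a confidence score.

```python
import csv
import io
import re
from collections import defaultdict
from datetime import datetime
from decimal import Decimal

# HYPOTHETICAL label patterns: the real invoice layout is unknown, so these
# assume lines like "Card: 7825 0012 3456", "Litres: 45,20", "Date: 08.02.2026 14:35".
CARD_RE = re.compile(r"Card\s*(?:No\.?)?\s*:\s*([\d \-]+)", re.IGNORECASE)
LITRES_RE = re.compile(r"Litres\s*:\s*(\d+(?:[.,]\d+)?)", re.IGNORECASE)
DATE_RE = re.compile(
    r"Date\s*:\s*(\d{2}\.\d{2}\.\d{4}\s+\d{2}:\d{2}(?::\d{2})?)", re.IGNORECASE
)


def parse_timestamp(raw):
    """Parse an (assumed) dd.mm.yyyy hh:mm[:ss] timestamp."""
    cleaned = " ".join(raw.split())  # collapse repeated whitespace
    for fmt in ("%d.%m.%Y %H:%M:%S", "%d.%m.%Y %H:%M"):
        try:
            return datetime.strptime(cleaned, fmt)
        except ValueError:
            continue
    raise ValueError(f"unrecognised timestamp: {raw!r}")


def extract_fields(text):
    """Return a normalised (card, litres, timestamp) key, or None if incomplete."""
    card_m = CARD_RE.search(text)
    litres_m = LITRES_RE.search(text)
    date_m = DATE_RE.search(text)
    if not (card_m and litres_m and date_m):
        return None  # missing field: send the page to OCR or manual review
    card = re.sub(r"\D", "", card_m.group(1))  # keep digits only
    if not card:
        return None
    # Decimal comma -> dot, fixed to two places so 45,2 and 45.20 compare equal.
    litres = Decimal(litres_m.group(1).replace(",", ".")).quantize(Decimal("0.01"))
    return card, litres, parse_timestamp(date_m.group(1))


def find_duplicates(pages):
    """Group (file_name, page_number, page_text) items on the normalised key.

    Returns (duplicates, unparsed): duplicates maps each key seen on two or
    more pages to its (file_name, page_number) references.
    """
    groups = defaultdict(list)
    unparsed = []
    for file_name, page_no, text in pages:
        key = extract_fields(text)
        if key is None:
            unparsed.append((file_name, page_no))
        else:
            groups[key].append((file_name, page_no))
    duplicates = {key: refs for key, refs in groups.items() if len(refs) > 1}
    return duplicates, unparsed


def write_report(duplicates, out):
    """Write one CSV row per page that belongs to a duplicate group."""
    writer = csv.writer(out)
    writer.writerow(["group_id", "card", "litres", "timestamp", "file", "page"])
    for group_id, (key, refs) in enumerate(sorted(duplicates.items()), start=1):
        card, litres, ts = key
        for file_name, page_no in refs:
            writer.writerow(
                [group_id, card, str(litres), ts.isoformat(sep=" "), file_name, page_no]
            )


if __name__ == "__main__":
    sample_pages = [
        ("inv_A-001.pdf", 1, "Card: 7825 0012 3456\nLitres: 45,20\nDate: 08.02.2026 14:35"),
        ("inv_B#17.pdf", 2, "Card: 7825-0012-3456\nLitres: 45.2\nDate: 08.02.2026  14:35"),
        ("inv_C.pdf", 1, "Card: 1111 2222 3333\nLitres: 30,00\nDate: 07.02.2026 09:10"),
        ("scan_D.pdf", 1, "illegible scan, needs OCR"),
    ]
    dups, unparsed = find_duplicates(sample_pages)
    buffer = io.StringIO()
    write_report(dups, buffer)
    print(buffer.getvalue())
    print("Needs manual review:", unparsed)
```

In the sample, the first two pages land in one group despite different card formatting, litre notation and spacing, while the unreadable scan is listed separately for review. Exact matching is the strictest choice; a time tolerance window or fuzzy card matching could feed a confidence score, but that is left out of this sketch.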