AI-driven PDF and TIFF Document Analysis to Excel - 17/11/2025 09:53 EST

Client: AI | Published: 17.11.2025

I have a collection of about 2,000 PDF and TIFF files that need to be processed automatically. Please discuss with me which AI you are planning to use and whether there are any costs associated with it.

The goal is to run each document through an AI-driven pipeline and capture the following in separate columns in Excel:
• Number of pages in the document
• Processed or skipped (if the invoice is over 20 pages, skip it; I want to be able to change this number in the program later)
• Location of the document (either the root folder or a nested folder)
• Vendor Name
• Invoice Number
• Invoice Date
• Invoice Amount
• Total Taxes (the total of all the taxes below)
• Total GST Tax amount
• Total HST Tax amount
• Total QST Tax amount
• Total PST Tax amount (in Canada, PST may vary by province, e.g. Manitoba, Saskatchewan, British Columbia. If an invoice has multiple PSTs, it is better to keep them in separate columns)

I have manually tried a few documents with ChatGPT and Grok. The results are acceptable.

About the PDF and TIFF documents:
• The invoices are all in different formats. They come from different suppliers and do not follow a template
• Some invoices have already been OCR'd and some have not; some are handwritten

Here is what I'm after:
• A stand-alone program that I can run when needed
• VERY IMPORTANT: The invoices cannot leave Canada, meaning any AI or servers that you use MUST BE in Canada
• A repeatable script or small app (Python preferred) that I can rerun on new batches with minimal setup
• A program that I run on my own computer to process all the documents

Acceptance criteria:
• The program must be able to process documents regardless of whether OCR has already been applied (i.e., it must handle both OCR and non-OCR documents)
• All sample documents I provide are parsed with 95%+ accuracy on legible text
• The resulting .xlsx contains one row per document and clearly labeled headers
• Clear, commented source code plus a brief README so I can install dependencies and execute the workflow on my own machine

If you've built similar OCR or document-processing solutions, I'd love to see a quick demo or snippet so I know your approach will scale.

IF YOU HAVE ANY QUESTIONS, PLEASE CLARIFY BEFORE ACCEPTING THE JOB.
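
To make the non-AI parts of the requirements above concrete (the adjustable page limit, root vs. nested location, the tax total, and one row per document), here is a minimal Python sketch using only the standard library. It is an illustration under assumptions, not a prescribed design: the names `Config`, `location_of`, `total_taxes`, and `build_rows` are hypothetical, the per-province PST headers are just the provinces mentioned as examples, and page counting is passed in as a function because no PDF/TIFF library has been chosen. The AI-extracted fields are left blank.

```python
from dataclasses import dataclass
from decimal import Decimal
from pathlib import Path
from typing import Callable

# Column headers based on the posting; the PST columns are illustrative only.
HEADERS = [
    "File", "Pages", "Status", "Location",
    "Vendor Name", "Invoice Number", "Invoice Date", "Invoice Amount",
    "Total Taxes", "GST", "HST", "QST",
    "PST (Manitoba)", "PST (Saskatchewan)", "PST (British Columbia)",
]

SUPPORTED = {".pdf", ".tif", ".tiff"}


@dataclass
class Config:
    # Invoices with more pages than this are skipped; change it per run.
    max_pages: int = 20


def location_of(path: Path, root: Path) -> str:
    """Describe whether a file sits in the root folder or a nested one."""
    rel_parent = path.parent.relative_to(root)
    if rel_parent == Path("."):
        return "root folder"
    return f"nested folder: {rel_parent.as_posix()}"


def total_taxes(gst: Decimal, hst: Decimal, qst: Decimal,
                pst_by_province: dict) -> Decimal:
    """Total Taxes = GST + HST + QST + every provincial PST."""
    return gst + hst + qst + sum(pst_by_province.values(), Decimal("0"))


def build_rows(root: Path, count_pages: Callable[[Path], int],
               config: Config) -> list:
    """Return one row (a dict keyed by HEADERS) per PDF/TIFF under root."""
    rows = []
    for path in sorted(root.rglob("*")):
        if not path.is_file() or path.suffix.lower() not in SUPPORTED:
            continue
        pages = count_pages(path)
        status = "skipped" if pages > config.max_pages else "processed"
        row = dict.fromkeys(HEADERS, "")  # AI-extracted fields stay blank here
        row.update({
            "File": path.name,
            "Pages": pages,
            "Status": status,
            "Location": location_of(path, root),
        })
        rows.append(row)
    return rows
```

In a real submission, `count_pages` would be backed by whatever PDF/TIFF library the developer proposes, the blank fields would be filled by the Canada-hosted AI step for documents marked "processed", and the rows would be written to .xlsx with a spreadsheet library.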