Extract Highlighted Bank Transactions

Client: AI | Published: 19.10.2025

I have a set of 12 text-based PDF bank statements where I’ve manually applied color highlights to the transactions of interest. Your job is to programmatically detect each highlighted line and pull three fields: date (with time where present), merchant name, and transaction amount, exactly as they appear in the statement.

What you’ll receive
• A batch of searchable PDFs (no OCR required).
• Each relevant transaction already marked with a color highlight.

What I need back
• A clean CSV or XLSX listing one row per highlighted transaction with columns: Date / Time, Merchant, Amount.
• The rows must remain in the same order they appear in the PDF so reconciliation is straightforward.
• No unhighlighted items should appear, and every highlight must be captured; totals should match a manual count.

Acceptance criteria
• 100% of highlighted lines extracted with zero transcription errors.
• Currency and date formats preserved exactly as shown.
• File delivered ready for pivoting or further analysis.

You’re free to use Python (PyPDF2, pdfminer.six, or similar), Java, or any tooling you prefer as long as the accuracy requirement above is met. Please include a brief note on your approach and an estimated turnaround time when responding.
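
For illustration only, here is a minimal Python sketch of one way the matching step could work. It is not a required approach. It assumes the PDF library of your choice has already produced two lists per page: word boxes with their text, and highlight rectangles (for example from Highlight annotations or from filled rectangles in the page content, depending on how the highlights were stored). All names below (Box, Word, extract_rows, merchant_x, amount_x, the sample data) are hypothetical, the coordinates are assumed to use a top-left origin, and the column boundaries would have to be measured on the real statements. It also assumes one highlight rectangle per transaction line.

```python
import csv
import io
from typing import NamedTuple


class Box(NamedTuple):
    x0: float
    y0: float
    x1: float
    y1: float


class Word(NamedTuple):
    text: str
    box: Box


def overlap_ratio(word: Box, mark: Box) -> float:
    """Fraction of the word's area that lies inside the highlight rectangle."""
    w = min(word.x1, mark.x1) - max(word.x0, mark.x0)
    h = min(word.y1, mark.y1) - max(word.y0, mark.y0)
    area = (word.x1 - word.x0) * (word.y1 - word.y0)
    if w <= 0 or h <= 0 or area <= 0:
        return 0.0
    return (w * h) / area


def to_row(hits, merchant_x, amount_x):
    """Assign words to Date / Time, Merchant, Amount by their left edge.

    merchant_x and amount_x are assumed column boundaries, not values
    taken from any real statement. Text is copied verbatim, never parsed.
    """
    cols = ([], [], [])
    for w in sorted(hits, key=lambda w: w.box.x0):
        idx = 0 if w.box.x0 < merchant_x else 1 if w.box.x0 < amount_x else 2
        cols[idx].append(w.text)
    return [" ".join(c) for c in cols]


def extract_rows(words, highlights, merchant_x, amount_x, min_overlap=0.5):
    """Return one row per highlight rectangle, in top-to-bottom page order."""
    rows = []
    # Top-left origin assumed: smaller y0 means higher on the page.
    for mark in sorted(highlights, key=lambda b: (b.y0, b.x0)):
        hits = [w for w in words if overlap_ratio(w.box, mark) >= min_overlap]
        if hits:
            rows.append(to_row(hits, merchant_x, amount_x))
    return rows


def rows_to_csv(rows):
    """Serialize rows with the column headers requested in the brief."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["Date / Time", "Merchant", "Amount"])
    writer.writerows(rows)
    return buf.getvalue()


# Made-up sample page: three statement lines, the first and third highlighted.
SAMPLE_WORDS = [
    Word("01/10/2025", Box(50, 100, 110, 112)),
    Word("09:14", Box(115, 100, 145, 112)),
    Word("ACME", Box(200, 100, 235, 112)),
    Word("STORE", Box(240, 100, 280, 112)),
    Word("-12.50", Box(450, 100, 490, 112)),
    Word("02/10/2025", Box(50, 120, 110, 132)),
    Word("CAFE", Box(200, 120, 230, 132)),
    Word("-3.00", Box(450, 120, 485, 132)),
    Word("03/10/2025", Box(50, 140, 110, 152)),
    Word("BOOKSHOP", Box(200, 140, 270, 152)),
    Word("-20.00", Box(450, 140, 490, 152)),
]
# Deliberately listed out of page order to show that rows are re-sorted.
SAMPLE_HIGHLIGHTS = [Box(45, 139, 500, 153), Box(45, 99, 500, 113)]

if __name__ == "__main__":
    rows = extract_rows(SAMPLE_WORDS, SAMPLE_HIGHLIGHTS, merchant_x=180, amount_x=400)
    print(rows_to_csv(rows), end="")
```

Because the sketch only joins the original word strings, date and currency formats pass through unchanged. Comparing the number of output rows with the number of highlight rectangles per file gives a quick check against the manual count.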