Marathi Voter PDF Data Extraction

Customer: AI | Published: 12.12.2025

I have several thousand voter-list PDFs written entirely in Marathi. I need a reliable workflow (script, desktop utility, or small web app) that will scan every page, recognise the Marathi text correctly, and export each voter's details to an Excel spreadsheet.

The fields I must see in separate columns are:
• Name and address
• Voter ID / EPIC number with any contact details that appear
• Date of birth and gender
• Age

Accuracy is critical; diacritics and vowel signs in Marathi must not be lost or misread. I will provide sample PDFs so you can tune your OCR or pattern-matching logic.

The final deliverable is:
1. An Excel file containing every voter record, perfectly aligned to the columns above.
2. The complete, well-commented source code or repeatable process so I can rerun it on future PDF batches.

Please ensure the solution works end-to-end on bulk data (thousands of pages) without manual intervention once started, and that it can be executed on a standard Windows machine or through a simple Python environment I can set up myself.
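
To illustrate the kind of pattern-matching logic I have in mind, here is a minimal Python sketch of the step that follows OCR. It assumes the OCR stage has already produced Unicode text for a single voter entry. The field labels (नाव, लिंग, वय), the EPIC pattern of three Latin letters followed by seven digits, and the names `normalise` and `parse_record` are all assumptions for illustration; they would need adjusting against the sample PDFs.

```python
import re
import unicodedata

# Devanagari digits mapped to ASCII so ages and IDs can be matched uniformly.
DEVANAGARI_DIGITS = str.maketrans("०१२३४५६७८९", "0123456789")

# Assumed EPIC layout: three Latin capitals followed by seven digits.
EPIC_RE = re.compile(r"\b[A-Z]{3}[0-9]{7}\b")
# Hypothetical field labels; the real PDFs may use different wording.
NAME_RE = re.compile(r"नाव\s*:?\s*(.+)")
GENDER_RE = re.compile(r"लिंग\s*:?\s*(पुरुष|स्त्री|महिला)")
AGE_RE = re.compile(r"वय\s*:?\s*([0-9]{1,3})")


def normalise(text: str) -> str:
    """Apply NFC normalisation and convert Devanagari digits to ASCII.

    NFC keeps vowel signs and other combining marks attached to their
    base consonants in a canonical order, so none are dropped.
    """
    return unicodedata.normalize("NFC", text).translate(DEVANAGARI_DIGITS)


def parse_record(raw: str) -> dict:
    """Extract name, EPIC number, gender and age from one OCR'd entry.

    Fields that cannot be found are returned as None so that bad rows
    can be flagged for review instead of being silently dropped.
    """
    text = normalise(raw)
    name = NAME_RE.search(text)
    epic = EPIC_RE.search(text)
    gender = GENDER_RE.search(text)
    age = AGE_RE.search(text)
    return {
        "name": name.group(1).strip() if name else None,
        "epic": epic.group(0) if epic else None,
        "gender": gender.group(1) if gender else None,
        "age": int(age.group(1)) if age else None,
    }


# Made-up sample entry, not taken from a real voter list.
sample = "नाव: सुनिता पाटील\nलिंग: स्त्री वय: ४५\nABC1234567"
record = parse_record(sample)
```

Each dictionary returned this way corresponds to one spreadsheet row. Returning `None` for a missing field, rather than raising an error, lets a bulk run continue unattended while still making incomplete records easy to filter afterwards.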