Duplicate Voter Analysis Report

I have a set of voter lists supplied to me in CSV format and need a clean, reliable way to spot and document every duplicate voter record. The key identifier is the EPIC ID, but I also want you to cross-check any supporting fields you feel strengthen the match logic (name spellings, date of birth, address fragments, etc.).

Your task is two-fold:
1) Run a duplicate-detection routine and generate a comparative analysis that lays out every matched record in full detail.
2) Merge these findings into my existing booth-wise reports, updating both the spreadsheet versions and the accompanying plain-text summaries so they read seamlessly with the new data.

I already have a reporting structure, so please follow the same column order, file naming, and booth codes. Where you add commentary, keep it concise and clearly marked.

Deliverables
• Cleaned master CSV with duplicates flagged or removed
• Booth-wise Excel sheets and their matching TXT summaries reflecting the new counts
• A separate “Duplicate_Details” file containing the side-by-side record comparison for each match

Accuracy is critical; I will spot-check random booths before releasing final approval. If you work in Python, R, or any tool you prefer, note the scripts and steps so I can rerun the process for future updates.
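To make the matching logic concrete, here is a minimal Python sketch of the kind of routine described above: group records by EPIC ID, then check whether the supporting fields agree within each group. The column names (epic_id, name, dob, booth_code), the normalization rule, and the sample rows are all assumptions made for illustration; the real headers and booth codes would come from the existing reporting structure.

```python
import csv
import io
from collections import defaultdict

# Hypothetical column names; replace with the headers used in the real voter lists.
EPIC, NAME, DOB, BOOTH = "epic_id", "name", "dob", "booth_code"


def normalize(value):
    """Uppercase and collapse whitespace so trivial formatting differences do not hide a match."""
    return " ".join((value or "").upper().split())


def find_duplicates(rows):
    """Group rows by normalized EPIC ID and keep only groups with more than one record."""
    groups = defaultdict(list)
    for row in rows:
        key = normalize(row[EPIC])
        if key:  # a blank EPIC ID cannot be matched on, so skip it here
            groups[key].append(row)
    return {key: recs for key, recs in groups.items() if len(recs) > 1}


def compare_group(records):
    """For one duplicate group, report whether each supporting field agrees across all records."""
    return {
        field: len({normalize(r[field]) for r in records}) == 1
        for field in (NAME, DOB)
    }


# Made-up sample rows standing in for a real CSV file.
sample = io.StringIO(
    "epic_id,name,dob,booth_code\n"
    "ABC1234567,Ravi Kumar,1980-01-15,B001\n"
    "abc1234567 ,RAVI  KUMAR,1980-01-15,B002\n"
    "XYZ7654321,Sita Devi,1975-06-02,B001\n"
)
rows = list(csv.DictReader(sample))
duplicates = find_duplicates(rows)

for epic, records in duplicates.items():
    booths = [r[BOOTH] for r in records]
    print(epic, booths, compare_group(records))
```

Matching on the exact EPIC ID first and treating name and date of birth only as confirmation keeps false positives low. Fuzzy name matching or address comparison could be layered on top, but that would be a separate design decision and is not shown here.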

Python
