Bulk PDF Data to Excel

I have 226 individual PDF files, each holding roughly 1,000 personal records. Every record contains a Serial Number, Name, Father's Name, Husband's Name, Mother's Name, House Number, Age, Gender and a Unique Identification Number.

Your task is to move every line of information into a single Excel workbook, creating one separate sheet for every PDF. Please keep the structure straightforward: each sheet must display those seven details as distinct columns. The PDFs are mostly, but not perfectly, consistent, so whenever a field appears in an unexpected position or format, place it in its own column on that same sheet rather than trying to force a global standard. I want the raw data faithfully reproduced, not reshaped.

You may use any reliable combination of tools, such as Python (Tabula, Camelot, PyPDF), Power Query, Adobe, ABBYY or meticulous manual entry, so long as the final spreadsheet is accurate and characters are preserved exactly as they appear. I have uploaded a sample PDF and a matching sample sheet to illustrate the layout I expect.

Deliverables
• A single .xlsx workbook containing 226 clearly named sheets, one per source PDF
• All seven fields captured in column form on each sheet, with any layout variations retained in additional columns where necessary
• The only blank columns or exceptions will be Father's Name, Mother's Name or Husband's Name (only one of them will be filled for each individual)
• No missing rows, no merged cells, no forced re-ordering of data

Let me know your method of approach and the estimated turnaround once you have reviewed the samples.
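For anyone proposing the Python route, the workflow above could be sketched roughly as follows. This is only an illustration under several assumptions: the PDFs contain selectable text rather than scans, pdfplumber and openpyxl are used (neither is named in the posting; Camelot or Tabula could take pdfplumber's place), and the header names are taken from the field list rather than from the sample sheet. The helper names `sheet_name_for`, `pad_rows` and `build_workbook` are hypothetical.

```python
import re
from pathlib import Path

# Assumed column headers, taken from the field list in the posting.
HEADERS = [
    "Serial Number", "Name", "Father's Name", "Husband's Name", "Mother's Name",
    "House Number", "Age", "Gender", "Unique Identification Number",
]


def sheet_name_for(pdf_path, used):
    """Derive a unique sheet name from a PDF file name.

    Excel sheet names are limited to 31 characters, may not contain
    : \\ / ? * [ ] and are compared case-insensitively.
    """
    name = re.sub(r"[:\\/?*\[\]]", "_", Path(pdf_path).stem).strip("'") or "Sheet"
    name = name[:31]
    candidate, n = name, 1
    while candidate.lower() in used:
        suffix = f"_{n}"
        candidate = name[:31 - len(suffix)] + suffix
        n += 1
    used.add(candidate.lower())
    return candidate


def pad_rows(rows):
    """Keep every cell as text and give overflow fields their own columns."""
    width = max([len(HEADERS)] + [len(r) for r in rows])
    header = HEADERS + [f"Extra {i}" for i in range(1, width - len(HEADERS) + 1)]
    body = [
        ["" if cell is None else str(cell) for cell in r] + [""] * (width - len(r))
        for r in rows
    ]
    return [header] + body


def build_workbook(pdf_dir, out_path):
    """Write one sheet per PDF found in pdf_dir (assumes at least one PDF)."""
    # Third-party imports are deferred so the helpers above work without them.
    import pdfplumber
    from openpyxl import Workbook

    wb = Workbook()
    wb.remove(wb.active)  # drop the default empty sheet
    used = set()
    for pdf_path in sorted(Path(pdf_dir).glob("*.pdf")):
        rows = []
        with pdfplumber.open(pdf_path) as pdf:
            for page in pdf.pages:
                for table in page.extract_tables():
                    rows.extend(table)  # rows kept in source order
        ws = wb.create_sheet(title=sheet_name_for(pdf_path, used))
        for row in pad_rows(rows):
            ws.append(row)
    wb.save(out_path)
```

A call such as `build_workbook("pdfs", "records.xlsx")` (both paths are placeholders) would produce the workbook. Values are written as strings so leading zeros and identifiers are not reinterpreted as numbers, and rows are neither merged nor re-ordered. Header rows repeated on each PDF page would be copied through as-is, so the output still needs checking against the sample sheet.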

Python
