PDF & Web Event Data Extraction

Client: AI | Published: 16.10.2025
Budget: $8

I have four digitally readable PDFs (about 120 pages each) and one website that list professional events. From these sources I need roughly 10,000 rows of data in total: around 2,000 rows per PDF, plus the web content. Each row holds 5–8 key details such as event name, sessions, participants, and dates.

What I need you to do:
• Programmatically scrape every record from the PDFs and the website.
• De-duplicate, fix obvious typos, and keep date formats consistent across the file (any clear, single format is fine).
• Compile everything into a single .xlsx file using standard, no-frills formatting: plain headers, tidy columns, ready for filtering and analysis.

You may use Python, R, VBA, or any other tool that gets the job done accurately. I will supply the PDFs and the web link as soon as we start; you return one well-organised Excel workbook that I can open and work with immediately. Accuracy is paramount, so please check your output before delivery.
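For illustration only, the de-duplication and date-consistency step might look roughly like the Python sketch below. It assumes the records have already been extracted into dictionaries; the column names (`event_name`, `date`), the list of input date layouts, and the choice of YYYY-MM-DD as the single output format are all assumptions, not part of the brief. Typo fixing and the .xlsx export are not shown; the export would need a spreadsheet library such as openpyxl.

```python
from datetime import datetime

# Assumed input date layouts; the real sources may use others.
# Note that "%d/%m/%Y" treats 03/04/2026 as 3 April, which is also an assumption.
DATE_FORMATS = ("%Y-%m-%d", "%d.%m.%Y", "%d/%m/%Y", "%B %d, %Y", "%d %B %Y")


def normalize_date(text):
    """Return the date as YYYY-MM-DD, or the stripped input if no layout matches."""
    cleaned = text.strip()
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(cleaned, fmt).strftime("%Y-%m-%d")
        except ValueError:
            continue
    return cleaned  # left unchanged so it can be reviewed by hand


def clean_rows(rows):
    """Collapse whitespace, normalize dates, and drop rows repeating an earlier one."""
    seen = set()
    result = []
    for row in rows:
        record = {key: " ".join(str(value).split()) for key, value in row.items()}
        record["date"] = normalize_date(record.get("date", ""))
        # Case-insensitive key so "Data Summit" and "data summit" count as one row.
        fingerprint = tuple(record[key].casefold() for key in sorted(record))
        if fingerprint not in seen:
            seen.add(fingerprint)
            result.append(record)
    return result
```

Dates that match none of the assumed layouts are passed through unchanged rather than guessed, which keeps them visible for the manual accuracy check before delivery.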