I need someone to help me crawl U.S. government websites (domains ending in .gov) and collect all PDF file URLs, for example: https://www.irs.gov/pub/irs-pdf/f1099msc.pdf. You should have a solid understanding of how web scraping works. The plan is to break the U.S. into 50 states, then use a tool such as ChatGPT to generate a list of .gov domains for each state. Once those domains are collected, you will run a script that crawls each domain and extracts all PDF URLs.
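The core extraction step described above could be sketched roughly as follows. This is a minimal, standard-library-only illustration of pulling PDF links out of one fetched page; the full job would wrap it in a crawl loop that fetches pages, follows in-domain links, respects robots.txt, and rate-limits requests. The function and class names here are my own illustrative choices, not part of any agreed spec.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class PDFLinkExtractor(HTMLParser):
    """Collects absolute URLs of <a href> links whose path ends in .pdf."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url  # page URL, used to resolve relative links
        self.pdf_urls = set()

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if not href:
            return
        absolute = urljoin(self.base_url, href)
        # Check the URL path only, so query strings don't hide the extension
        if urlparse(absolute).path.lower().endswith(".pdf"):
            self.pdf_urls.add(absolute)


def extract_pdf_urls(html, base_url):
    """Return a sorted list of absolute PDF URLs found in one HTML page."""
    parser = PDFLinkExtractor(base_url)
    parser.feed(html)
    return sorted(parser.pdf_urls)
```

In a real run, the crawler would call `extract_pdf_urls` on each downloaded page, queue any in-domain non-PDF links it finds for further crawling, and append the PDF URLs to the output list for that .gov domain.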