Automated Website Text Data Extraction

Client: AI | Published: 11.12.2025
Budget: $30

I need the text content from a specific public-facing website captured and delivered in a clean, structured form. The site is mostly plain HTML with several paginated sections; no login or CAPTCHA barriers are present. My priority is accuracy and repeatability: I want a script (Python with requests/BeautifulSoup or equivalent) that I can rerun whenever the site’s pages change.

Here is what I expect:
• A working scraper that pulls every required text field from all relevant pages and writes it to a single output file. I’m flexible on format (CSV, JSON, or XML all work), so choose the one that makes coding simplest.
• Clear, commented source code plus a short README explaining how to install any dependencies, run the script, and change the output path.
• A first run of the script with the latest live data so I can spot-check results.

Please respect the website’s robots.txt rules and include polite throttling to avoid overloading their server. If you have questions about edge cases on the site, let me know early so we can keep the process smooth.
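A minimal sketch of the kind of scraper described above, written with only the Python standard library (`urllib.request`, `urllib.robotparser`, `html.parser`) as an "equivalent" to requests/BeautifulSoup so it runs without extra dependencies. It rests on hypothetical assumptions about the site: the wanted text sits in `<p>` elements, and pagination is exposed through an `<a rel="next">` link. The real selectors would depend on the target site's markup. The names `scrape`, `load_robots`, `write_json`, the `USER_AGENT` string, the 2-second default delay and the `max_pages` cap are all illustrative choices, not anything the site or the posting specifies. Every URL is checked against robots.txt before it is fetched, and the delay between requests is raised to the site's `Crawl-delay` when that is larger.

```python
import json
import time
import urllib.request
import urllib.robotparser
from html.parser import HTMLParser
from urllib.parse import urljoin

# Hypothetical User-Agent string; replace it with one that identifies you.
USER_AGENT = "text-scraper-sketch/0.1"


class TextAndNextLinkParser(HTMLParser):
    """Collects <p> text and the href of a rel="next" link (both assumed markup)."""

    def __init__(self):
        super().__init__()
        self.paragraphs = []
        self.next_href = None
        self._in_p = False
        self._buf = []

    def _flush(self):
        # Collapse runs of whitespace and keep only non-empty paragraphs.
        text = " ".join("".join(self._buf).split())
        if text:
            self.paragraphs.append(text)
        self._in_p, self._buf = False, []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "p":
            if self._in_p:  # an unclosed <p> is implicitly ended by the next one
                self._flush()
            self._in_p = True
        elif tag == "a" and "next" in (attrs.get("rel") or "").split():
            self.next_href = attrs.get("href")

    def handle_endtag(self, tag):
        if tag == "p" and self._in_p:
            self._flush()

    def handle_data(self, data):
        if self._in_p:
            self._buf.append(data)

    def close(self):
        super().close()
        if self._in_p:  # keep a trailing <p> that was never closed
            self._flush()


def fetch(url):
    """Downloads one page with an identifying User-Agent and a timeout."""
    req = urllib.request.Request(url, headers={"User-Agent": USER_AGENT})
    with urllib.request.urlopen(req, timeout=30) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")


def load_robots(start_url):
    """Reads the site's robots.txt once, before crawling starts."""
    robots = urllib.robotparser.RobotFileParser()
    robots.set_url(urljoin(start_url, "/robots.txt"))
    robots.read()
    return robots


def scrape(start_url, robots, fetch_page=fetch, delay=2.0, max_pages=500):
    """Follows rel="next" links from start_url; returns one record per paragraph."""
    # Use the site's Crawl-delay if it asks for more than our default.
    delay = max(delay, robots.crawl_delay(USER_AGENT) or 0)
    records, seen, url = [], set(), start_url
    while url and url not in seen and len(seen) < max_pages:
        if not robots.can_fetch(USER_AGENT, url):
            break  # stop rather than fetch a disallowed page
        if seen:
            time.sleep(delay)  # polite throttling between consecutive requests
        seen.add(url)
        parser = TextAndNextLinkParser()
        parser.feed(fetch_page(url))
        parser.close()
        for index, text in enumerate(parser.paragraphs):
            records.append({"url": url, "index": index, "text": text})
        url = urljoin(url, parser.next_href) if parser.next_href else None
    return records


def write_json(records, path):
    """Writes all records to a single UTF-8 JSON file."""
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(records, fh, ensure_ascii=False, indent=2)
```

Against the live site, a two-line driver such as `start = "https://example.com/list"` (a placeholder URL) followed by `write_json(scrape(start, load_robots(start)), "output.json")` would produce the single output file; the output path is simply the second argument to `write_json`. JSON is used here because each record is already a dict; `csv.DictWriter` with the keys `url`, `index`, `text` would give CSV instead. Passing `fetch_page` as a parameter lets the crawl logic be tested offline with canned HTML.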