Website Content Scrape to CSV

I need all visible text content pulled from a single website and delivered in a clean, well-structured CSV file. This is a one-time scrape, so the script does not need to run on a schedule; it just needs to collect every page's copy accurately and store each page URL, headline, sub-headline, paragraph body, and any inline text in separate columns.

Please make the scraper resilient to common roadblocks such as pagination, lazy-loaded sections, and basic anti-bot measures, and keep the code modular so I can rerun it myself if the site layout changes slightly. Python with BeautifulSoup, Scrapy, or Playwright is fine as long as the final CSV is UTF-8 encoded and free of HTML tags.

Quantities:
- We expect somewhere between 10,000 and 70,000 records.
- We want to pay in milestones of 5,000 records each.
- The first milestone covers the research work plus the first 5,000 records; the following milestones are paid at a different amount (in case you get blocked or other problems arise).

Deliverables:
• Scraper source code with brief usage notes
• The compiled CSV containing all text content
• A short read-me confirming the page count and listing any pages skipped

I will consider the job complete once the CSV opens without errors and spot-checks match the live site word-for-word.
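
To illustrate the column layout described above, here is a minimal sketch of the extraction and CSV step. It is not a finished scraper: it assumes the page HTML has already been fetched, uses Python's standard-library `html.parser` as a dependency-free stand-in for BeautifulSoup, and assumes that `<h1>` is the headline, `<h2>` to `<h6>` are sub-headlines, `<p>` is paragraph body, and everything else is inline text. The names `PageTextParser`, `extract_row`, `write_rows`, and the column names are hypothetical and would be adjusted to the real site.

```python
import csv
from html.parser import HTMLParser

# Assumed column names; the posting only lists what each column should hold.
COLUMNS = ["url", "headline", "sub_headline", "body", "inline_text"]
SKIP_TAGS = {"head", "script", "style", "noscript", "template"}  # never visible
VOID_TAGS = {"br", "img", "hr", "input", "meta", "link", "source", "wbr"}
BLOCK_TAGS = {"h1", "h2", "h3", "h4", "h5", "h6", "p", "li", "div", "br",
              "td", "th", "section", "article"}
SUB_HEADINGS = {"h2", "h3", "h4", "h5", "h6"}


class PageTextParser(HTMLParser):
    """Sorts visible text into headline / sub-headline / body / inline buckets."""

    def __init__(self):
        super().__init__(convert_charrefs=True)  # decodes &amp; and friends
        self.buckets = {name: [] for name in COLUMNS[1:]}
        self._stack = []  # currently open (non-void) tags

    def _break(self):
        # Block boundaries become a space so adjacent blocks do not fuse.
        for parts in self.buckets.values():
            parts.append(" ")

    def handle_starttag(self, tag, attrs):
        if tag in BLOCK_TAGS:
            self._break()
        if tag not in VOID_TAGS:
            self._stack.append(tag)

    def handle_endtag(self, tag):
        if tag in BLOCK_TAGS:
            self._break()
        # Pop back to the matching open tag; tolerates unclosed inner tags.
        if tag in self._stack:
            while self._stack.pop() != tag:
                pass

    def handle_data(self, data):
        if any(tag in SKIP_TAGS for tag in self._stack):
            return
        self.buckets[self._bucket()].append(data)

    def _bucket(self):
        # The innermost heading or paragraph ancestor decides the column.
        for tag in reversed(self._stack):
            if tag == "h1":
                return "headline"
            if tag in SUB_HEADINGS:
                return "sub_headline"
            if tag == "p":
                return "body"
        return "inline_text"


def extract_row(url, html):
    """Return one CSV row (a dict) of tag-free text for a single page."""
    parser = PageTextParser()
    parser.feed(html)
    parser.close()
    row = {"url": url}
    for name, parts in parser.buckets.items():
        # Join the raw pieces first, then collapse runs of whitespace.
        row[name] = " ".join("".join(parts).split())
    return row


def write_rows(rows, stream):
    """Write rows, header included, to an already-open text stream."""
    writer = csv.DictWriter(stream, fieldnames=COLUMNS)
    writer.writeheader()
    writer.writerows(rows)
```

For the final file, the stream would be opened with `open(path, "w", newline="", encoding="utf-8")` so the output is UTF-8 and the `csv` module controls line endings. If the CSV is meant to be opened in Excel, `encoding="utf-8-sig"` may be the safer choice, since Excel uses the byte-order mark to detect UTF-8. Fetching, pagination, and rendering of lazy-loaded sections are deliberately left out of this sketch.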

Python
