Automated Daily Web Text Scraper

I need a small, reliable program that pulls text content from a set of webpages every day and saves it in a structured file I can easily analyze later. The source pages are public sites (no login required), but the HTML layout can change slightly over time, so the scraper should locate the text by robust selectors rather than brittle absolute XPaths.

Python is my preferred stack. BeautifulSoup, Scrapy, or Selenium are all fine as long as the final script:
• accepts a simple list of URLs (CSV or TXT)
• runs on an automatic daily schedule (cron-friendly on Linux or Task Scheduler on Windows)
• outputs the extracted text to JSON or CSV with a timestamp
• logs any failed pages and retries intelligently
• is clearly documented so I can adjust selectors or add new URLs without touching core logic

Once I can run the script locally and see a clean daily feed of text with logs, the job is complete.
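To make the requirements above concrete, here is a rough sketch of the shape such a script might take. It is an illustration under stated assumptions, not a finished deliverable. All function names, the `urls.txt` and `scraped.jsonl` file names, and the User-Agent string are hypothetical. To stay dependency-free it uses the standard library's `html.parser` and collects all visible text on the page; a BeautifulSoup version would instead call something like `soup.select(...)` with CSS selectors kept in a separate config file.

```python
import json
import logging
import time
import urllib.request
from datetime import datetime, timezone
from html.parser import HTMLParser

logging.basicConfig(level=logging.INFO,
                    format="%(asctime)s %(levelname)s %(message)s")
log = logging.getLogger("scraper")  # hypothetical logger name


class TextExtractor(HTMLParser):
    """Collects visible text, skipping <script> and <style> contents."""
    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth and data.strip():
            self.parts.append(data.strip())


def extract_text(html):
    """Return the page's visible text as one space-joined string."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.parts)


def fetch(url, timeout=20):
    """Download one page and decode it using the declared charset."""
    req = urllib.request.Request(
        url, headers={"User-Agent": "daily-text-scraper/0.1"})  # assumed UA
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")


def fetch_with_retry(url, fetcher=fetch, attempts=3, base_delay=2.0,
                     sleep=time.sleep):
    """Try a URL up to `attempts` times, doubling the wait after each failure."""
    for attempt in range(1, attempts + 1):
        try:
            return fetcher(url)
        except Exception as exc:
            log.warning("attempt %d/%d failed for %s: %s",
                        attempt, attempts, url, exc)
            if attempt < attempts:
                sleep(base_delay * 2 ** (attempt - 1))
    log.error("giving up on %s", url)
    return None


def load_urls(path):
    """Read one URL per line; blank lines and '#' comments are ignored."""
    urls = []
    with open(path, encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if line and not line.startswith("#"):
                urls.append(line)
    return urls


def run(urls, out_path, fetcher=fetch, sleep=time.sleep):
    """Append one timestamped JSON record per page; return the failed URLs."""
    failed = []
    with open(out_path, "a", encoding="utf-8") as out:
        for url in urls:
            html = fetch_with_retry(url, fetcher=fetcher, sleep=sleep)
            if html is None:
                failed.append(url)
                continue
            record = {"url": url,
                      "scraped_at": datetime.now(timezone.utc).isoformat(),
                      "text": extract_text(html)}
            out.write(json.dumps(record, ensure_ascii=False) + "\n")
    return failed
```

A daily run could then be as simple as calling `run(load_urls("urls.txt"), "scraped.jsonl")` from a script scheduled with a crontab entry such as `0 6 * * * /usr/bin/python3 /path/to/scraper.py` (path and time are placeholders), or an equivalent Task Scheduler job on Windows. Appending JSON Lines keeps each day's records in one file that loads easily for later analysis; CSV output would work equally well.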

Python
