Automated News Site Data Scraper

Client: AI | Published: 09.04.2026
Budget: $250

I’m looking for an expert who can build a robust solution that pulls structured text from several online news outlets on a schedule I define. The job is to set up an end-to-end workflow: crawl the pages; extract the article body, headline, author, publication date, and canonical URL; normalise that content into JSON or CSV; and drop it straight into a folder or database I point you to. I already have the list of news domains and sample URLs.

Your code should:

• respect robots.txt and rate limits,
• rotate user-agents / proxies if a site blocks frequent requests,
• be easy to extend when a new site is added, and
• run headlessly from a cron job or similar scheduler.

Python with Scrapy, BeautifulSoup, or Playwright is preferred, but I’m open to alternatives if you can justify another stack. Clear inline comments plus a short README are essential so I can maintain the scraper myself after hand-off.

Please include a quick demonstration: scrape five sample articles and provide the resulting JSON so I can verify the field mapping. I’ll consider the project complete when the script runs unattended on my VPS, logs errors cleanly, and captures all required fields from each target site.
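To make the required field mapping concrete, here is a minimal extraction sketch in the preferred stack (BeautifulSoup). The sample HTML, CSS selectors, and function name are illustrative assumptions — real sites will need per-domain selector configs, which is where the "easy to extend" requirement comes in.

```python
"""Field-extraction sketch: map one article page to the required JSON fields.
SAMPLE_HTML and the selectors below are hypothetical; a real scraper would
fetch pages over HTTP and keep a selector config per news domain."""
import json

from bs4 import BeautifulSoup

SAMPLE_HTML = """
<html><head>
  <link rel="canonical" href="https://example-news.test/politics/story-123">
  <meta name="author" content="Jane Reporter">
  <meta property="article:published_time" content="2026-04-09T08:30:00Z">
</head><body>
  <h1 class="headline">Sample Headline</h1>
  <div class="article-body"><p>First paragraph.</p><p>Second paragraph.</p></div>
</body></html>
"""


def extract_article(html: str) -> dict:
    """Extract headline, author, publication date, canonical URL, and body."""
    soup = BeautifulSoup(html, "html.parser")
    return {
        "headline": soup.select_one("h1.headline").get_text(strip=True),
        "author": soup.find("meta", attrs={"name": "author"})["content"],
        "published": soup.find(
            "meta", attrs={"property": "article:published_time"}
        )["content"],
        "canonical_url": soup.find("link", rel="canonical")["href"],
        "body": "\n\n".join(
            p.get_text(strip=True) for p in soup.select("div.article-body p")
        ),
    }


if __name__ == "__main__":
    # One normalised record, ready to append to a JSON/CSV output file.
    print(json.dumps(extract_article(SAMPLE_HTML), indent=2))
```

The same `extract_article` output dict can be written row-by-row with `csv.DictWriter` when CSV is the chosen delivery format.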
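The robots.txt and rate-limit requirement can be met with the standard library alone. A sketch, using an inline robots.txt body so it runs offline — the bot name, rules, and fallback delay are illustrative assumptions; in production the parser would load each site's real `/robots.txt`:

```python
"""Sketch: honour robots.txt and space out requests per domain.
ROBOTS_TXT, the bot name, and the 1 s fallback delay are assumptions."""
import time
from urllib.robotparser import RobotFileParser

ROBOTS_TXT = """\
User-agent: *
Crawl-delay: 2
Disallow: /admin/
"""

rp = RobotFileParser()
rp.parse(ROBOTS_TXT.splitlines())  # production: rp.set_url(".../robots.txt"); rp.read()


def allowed(url: str, user_agent: str = "NewsScraperBot/0.1") -> bool:
    """True if robots.txt permits this user-agent to fetch the URL."""
    return rp.can_fetch(user_agent, url)


class RateLimiter:
    """Keep successive requests to one domain at least `delay` seconds apart."""

    def __init__(self, delay: float):
        self.delay = delay
        self._last = 0.0

    def wait(self) -> None:
        sleep_for = self._last + self.delay - time.monotonic()
        if sleep_for > 0:
            time.sleep(sleep_for)
        self._last = time.monotonic()


# Honour the site's declared Crawl-delay when present, else fall back to 1 s.
limiter = RateLimiter(rp.crawl_delay("NewsScraperBot/0.1") or 1.0)
```

In a crawl loop you would call `limiter.wait()` before each request and skip any URL for which `allowed(url)` is false. (Scrapy covers both concerns natively via `ROBOTSTXT_OBEY` and `DOWNLOAD_DELAY`/AutoThrottle settings.)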
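User-agent rotation can be as simple as cycling through a pool per request. The UA strings below are placeholders, not real browser strings; the same pattern extends to a proxy list:

```python
"""User-agent rotation sketch. The pool entries are placeholder strings —
substitute current, realistic browser user-agents (and proxies) in production."""
from itertools import cycle

USER_AGENT_POOL = cycle([
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) PlaceholderBrowser/1.0",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15) PlaceholderBrowser/1.0",
    "Mozilla/5.0 (X11; Linux x86_64) PlaceholderBrowser/1.0",
])


def next_headers() -> dict:
    """Request headers with a freshly rotated User-Agent."""
    return {"User-Agent": next(USER_AGENT_POOL)}
```

Each call to `next_headers()` advances the pool, so consecutive requests present different identities; in Scrapy the equivalent hook is a downloader middleware that sets `request.headers["User-Agent"]`.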
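For the unattended-run requirement, a single crontab entry is usually enough. The schedule, interpreter path, script path, and log path below are placeholders to adapt to the VPS:

```shell
# Hypothetical crontab entry: run the scraper headlessly every hour and
# append stdout/stderr to a log file (all paths are placeholders).
0 * * * * /usr/bin/python3 /opt/news-scraper/run.py >> /var/log/news-scraper.log 2>&1
```

Redirecting `2>&1` into the log file covers the "logs errors cleanly" acceptance criterion at the scheduler level; the script itself should still use Python's `logging` module for structured error records.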