Automated News Site Data Scraper

Client: AI | Published: 09.04.2026
Budget: $250

I’m looking for an expert who can build a robust solution that pulls structured text from several online news outlets on a schedule I define. The job is to set up an end-to-end workflow: crawl the pages; extract the article body, headline, author, publication date, and canonical URL; normalise that content into JSON or CSV; and drop it straight into a folder or database I point you to. I already have the list of news domains and sample URLs.

Your code should:

• respect robots.txt and rate limits,
• rotate user-agents / proxies if a site blocks frequent requests,
• be easy to extend when a new site is added, and
• run headlessly from a cron job or similar scheduler.

Python with Scrapy, BeautifulSoup, or Playwright is preferred, but I’m open to alternatives if you can justify another stack. Clear inline comments plus a short README are essential so I can maintain the scraper myself after hand-off.

Please include a quick demonstration: scrape five sample articles and provide the resulting JSON so I can verify the field mapping. I’ll consider the project complete when the script runs unattended on my VPS, logs errors cleanly, and captures all required fields from each target site.
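To make the required field mapping concrete, here is a minimal extraction sketch in the preferred stack (BeautifulSoup). The sample HTML, CSS selectors, and function name are illustrative assumptions — real sites will need per-domain selector configs, which is where the "easy to extend" requirement comes in.

```python
"""Field-extraction sketch: map one article page to the required JSON fields.
SAMPLE_HTML and the selectors below are hypothetical; a real scraper would
fetch pages over HTTP and keep a selector config per news domain."""
import json

from bs4 import BeautifulSoup

SAMPLE_HTML = """
<html><head>
  <link rel="canonical" href="https://example-news.test/politics/story-123">
  <meta name="author" content="Jane Reporter">
  <meta property="article:published_time" content="2026-04-09T08:30:00Z">
</head><body>
  <h1 class="headline">Sample Headline</h1>
  <div class="article-body"><p>First paragraph.</p><p>Second paragraph.</p></div>
</body></html>
"""


def extract_article(html: str) -> dict:
    """Extract headline, author, publication date, canonical URL, and body."""
    soup = BeautifulSoup(html, "html.parser")
    return {
        "headline": soup.select_one("h1.headline").get_text(strip=True),
        "author": soup.find("meta", attrs={"name": "author"})["content"],
        "published": soup.find(
            "meta", attrs={"property": "article:published_time"}
        )["content"],
        "canonical_url": soup.find("link", rel="canonical")["href"],
        "body": "\n\n".join(
            p.get_text(strip=True) for p in soup.select("div.article-body p")
        ),
    }


if __name__ == "__main__":
    # One normalised record, ready to append to a JSON/CSV output file.
    print(json.dumps(extract_article(SAMPLE_HTML), indent=2))
```

The same `extract_article` output dict can be written row-by-row with `csv.DictWriter` when CSV is the chosen delivery format.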
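The robots.txt and rate-limit requirement can be met with the standard library alone. A sketch, using an inline robots.txt body so it runs offline — the bot name, rules, and fallback delay are illustrative assumptions; in production the parser would load each site's real `/robots.txt`:

```python
"""Sketch: honour robots.txt and space out requests per domain.
ROBOTS_TXT, the bot name, and the 1 s fallback delay are assumptions."""
import time
from urllib.robotparser import RobotFileParser

ROBOTS_TXT = """\
User-agent: *
Crawl-delay: 2
Disallow: /admin/
"""

rp = RobotFileParser()
rp.parse(ROBOTS_TXT.splitlines())  # production: rp.set_url(".../robots.txt"); rp.read()


def allowed(url: str, user_agent: str = "NewsScraperBot/0.1") -> bool:
    """True if robots.txt permits this user-agent to fetch the URL."""
    return rp.can_fetch(user_agent, url)


class RateLimiter:
    """Keep successive requests to one domain at least `delay` seconds apart."""

    def __init__(self, delay: float):
        self.delay = delay
        self._last = 0.0

    def wait(self) -> None:
        sleep_for = self._last + self.delay - time.monotonic()
        if sleep_for > 0:
            time.sleep(sleep_for)
        self._last = time.monotonic()


# Honour the site's declared Crawl-delay when present, else fall back to 1 s.
limiter = RateLimiter(rp.crawl_delay("NewsScraperBot/0.1") or 1.0)
```

In a crawl loop you would call `limiter.wait()` before each request and skip any URL for which `allowed(url)` is false. (Scrapy covers both concerns natively via `ROBOTSTXT_OBEY` and `DOWNLOAD_DELAY`/AutoThrottle settings.)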
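User-agent rotation can be as simple as cycling through a pool per request. The UA strings below are placeholders, not real browser strings; the same pattern extends to a proxy list:

```python
"""User-agent rotation sketch. The pool entries are placeholder strings —
substitute current, realistic browser user-agents (and proxies) in production."""
from itertools import cycle

USER_AGENT_POOL = cycle([
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) PlaceholderBrowser/1.0",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15) PlaceholderBrowser/1.0",
    "Mozilla/5.0 (X11; Linux x86_64) PlaceholderBrowser/1.0",
])


def next_headers() -> dict:
    """Request headers with a freshly rotated User-Agent."""
    return {"User-Agent": next(USER_AGENT_POOL)}
```

Each call to `next_headers()` advances the pool, so consecutive requests present different identities; in Scrapy the equivalent hook is a downloader middleware that sets `request.headers["User-Agent"]`.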
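For the unattended-run requirement, a single crontab entry is usually enough. The schedule, interpreter path, script path, and log path below are placeholders to adapt to the VPS:

```shell
# Hypothetical crontab entry: run the scraper headlessly every hour and
# append stdout/stderr to a log file (all paths are placeholders).
0 * * * * /usr/bin/python3 /opt/news-scraper/run.py >> /var/log/news-scraper.log 2>&1
```

Redirecting `2>&1` into the log file covers the "logs errors cleanly" acceptance criterion at the scheduler level; the script itself should still use Python's `logging` module for structured error records.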