Bot-Resistant E-Commerce Scraper

I need a robust, self-hosted scraping tool that can reliably collect product details and customer reviews from Amazon, Walmart, Lowe's, Home Depot, Leroy Merlin, and Elektro Wandelt. The crawler must run once a month without triggering the anti-bot systems these sites use, so smart throttling, proxy rotation, user-agent randomisation, and solid CAPTCHA handling are essential.

I will trigger the run manually (a CLI or simple dashboard is fine) and receive structured output (CSV and JSON are both acceptable) with clear field names for SKU, title, price, availability, rating, review text, and timestamps.

Source code should be clean Python (Scrapy, Playwright, Selenium, or a comparable stack), containerised for easy deployment, and accompanied by a short README that explains setup, proxy configuration, and how to add new domains later.

Deliverables
• Full source code in a Git repo
• Dockerfile or environment file for repeatable builds
• README covering installation, usage, and extension
• One successful test run showing sample data from each target site

I will consider the project complete once the tool finishes a full monthly scrape of all listed sites without CAPTCHAs or blocks and the dataset matches the agreed-upon fields.
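To make the "agreed-upon fields" concrete, the output record could be pinned down along the lines below. This is only a minimal sketch, not a required design: the class name ProductRecord, the helper names to_json and to_csv, the extra site column, the exact field spellings, and the choice of ISO 8601 UTC timestamps are all assumptions for illustration, and the sample values are placeholders rather than real scraped data.

```python
import csv
import io
import json
from dataclasses import asdict, dataclass, fields
from datetime import datetime, timezone
from typing import List, Optional


@dataclass
class ProductRecord:
    """One output row. Field names are illustrative, not agreed yet."""
    site: str                # assumed extra column: which target site the row came from
    sku: str
    title: str
    price: Optional[float]   # None when no price could be read
    availability: str
    rating: Optional[float]  # None when the product has no rating
    review_text: str
    scraped_at: str          # assumed format: ISO 8601 timestamp in UTC


def to_json(records: List[ProductRecord]) -> str:
    """Serialise records as a JSON array of objects."""
    return json.dumps([asdict(r) for r in records], ensure_ascii=False, indent=2)


def to_csv(records: List[ProductRecord]) -> str:
    """Serialise records as CSV with a header row taken from the dataclass."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=[f.name for f in fields(ProductRecord)])
    writer.writeheader()
    for record in records:
        writer.writerow(asdict(record))
    return buffer.getvalue()


# Placeholder record for demonstration only (not real data).
sample = ProductRecord(
    site="example-site",
    sku="EXAMPLE-SKU-001",
    title="Placeholder product",
    price=19.99,
    availability="in_stock",
    rating=4.5,
    review_text="Placeholder review text.",
    scraped_at=datetime.now(timezone.utc).isoformat(),
)

if __name__ == "__main__":
    print(to_csv([sample]))
    print(to_json([sample]))
```

Driving both formats from a single record definition keeps the CSV header and the JSON keys identical, so checking the dataset against the agreed fields becomes a matter of comparing one list of names.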
