High-Scale Amazon Scraper Build

Client: AI | Published: 14.11.2025

I need a production-ready, server-side scraper that can pull product details (name, price, description, and current stock) from Amazon and a handful of comparable e-commerce sites. The crawler must comfortably handle millions of SKUs and rerun itself on an automated schedule every 6–8 hours without manual intervention.

Here is what I'm aiming for:

• Scraping engine written in a language well suited to concurrency (Python + Scrapy, Playwright, or another proven framework)
• Smart rotation of proxies, user agents, and headless browsers so rate limits and bans are avoided
• Output in clean CSV files that the job then pipes directly into my PostgreSQL instance
• Robust logging and alerting so I can quickly spot failed requests, IP blocks, or schema mismatches
• The list of ASINs to scan arrives from my app via API each time a user requests a scrape

Acceptance criteria:

1. A fresh run against a test list of 50 000 ASINs completes in under two hours.
2. At least 98% of requested records return all required fields.
3. All CSVs load into PostgreSQL with zero type conflicts or duplicate keys.
4. The scheduler triggers automatically every 6–8 hours and writes success/failure to the log.
5. Clear documentation lets me redeploy or scale the scraper to additional servers on my own.

Once the initial build is live, I'll retain you for ongoing maintenance and incremental feature work, so I'm looking for someone who enjoys long-term collaboration as much as clean code. If this sounds like your kind of challenge, let's talk.
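To make the rotation requirement concrete, here is a minimal sketch of per-request user-agent rotation as a Scrapy downloader middleware. The class and pool names are my own illustration, not part of the spec; proxy rotation would hook in the same way, and the pool would be far larger in production.

```python
import random

# Illustrative pool of desktop user agents; a real deployment would
# maintain a much larger, regularly refreshed list.
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/124.0 Safari/537.36",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/17.4 Safari/605.1.15",
    "Mozilla/5.0 (X11; Linux x86_64; rv:125.0) Gecko/20100101 Firefox/125.0",
]

def pick_user_agent() -> str:
    """Pick a random user agent for the next outgoing request."""
    return random.choice(USER_AGENTS)

class RotateUserAgentMiddleware:
    """Scrapy downloader middleware: set a fresh User-Agent per request.

    Enabled via DOWNLOADER_MIDDLEWARES in settings.py; the class path
    used there is up to the project layout.
    """

    def process_request(self, request, spider):
        request.headers["User-Agent"] = pick_user_agent()
        return None  # returning None lets Scrapy continue downloading
```

The same `process_request` hook is where a rotating proxy would be assigned (`request.meta["proxy"] = ...`), so both rotations can live in one middleware.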
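Criterion 3 (zero type conflicts or duplicate keys in PostgreSQL) is easiest to satisfy by cleaning rows before the CSV is written. A stdlib-only sketch, with column names and types assumed for illustration:

```python
import csv
import io

# Hypothetical target schema: column name -> coercion function.
REQUIRED = {"asin": str, "name": str, "price": float, "stock": int}

def clean_rows(rows):
    """Deduplicate by ASIN and coerce types so the later PostgreSQL
    load never hits duplicate keys or type conflicts."""
    seen, out = set(), []
    for row in rows:
        asin = row.get("asin")
        if not asin or asin in seen:
            continue  # drop rows with missing or duplicate keys
        try:
            out.append({col: cast(row[col]) for col, cast in REQUIRED.items()})
        except (KeyError, ValueError):
            continue  # the real job would log and alert here
        seen.add(asin)
    return out

def to_csv(rows) -> str:
    """Render cleaned rows as CSV text ready for COPY."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=list(REQUIRED))
    writer.writeheader()
    writer.writerows(clean_rows(rows))
    return buf.getvalue()

# Load into PostgreSQL with e.g.:
#   COPY products FROM STDIN WITH (FORMAT csv, HEADER true)
```

Rows failing coercion count against the 98% completeness target, so the `continue` branches are exactly where the alerting requirement plugs in.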
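The "every 6–8 hours" wording leaves room for jitter, which also helps avoid hitting target sites at a perfectly predictable cadence. One way to read that requirement, with hypothetical helper names (a cron entry or systemd timer could serve equally well):

```python
import datetime
import logging
import random

logging.basicConfig(level=logging.INFO)

def next_run(last: datetime.datetime,
             lo_hours: float = 6.0,
             hi_hours: float = 8.0) -> datetime.datetime:
    """Pick the next trigger time inside the 6-8 hour window."""
    return last + datetime.timedelta(hours=random.uniform(lo_hours, hi_hours))

def run_once(job, log=logging.getLogger("scraper")) -> bool:
    """Run one scrape cycle and write success/failure to the log,
    as criterion 4 requires."""
    try:
        job()
        log.info("run succeeded")
        return True
    except Exception:
        log.exception("run failed")
        return False
```

A supervising loop would call `run_once`, then sleep until `next_run(...)`; the boolean result also gives alerting a clean hook.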