Cloud Python Service Directory Scraper

Customer: AI | Published: 18.03.2026
Budget: $750

I need a Python-based scraper that targets service-provider directory sites and lets me choose both the service category and any U.S. ZIP code before it runs. Once the crawl completes, it should hand me a clean CSV containing every business's name, descriptive text, logo or main image (saved as a link or base64, whichever is lighter), full website URL, and any email address it can reliably pull. I'll supply the cloud account, so build the script to run headlessly on whatever platform I point you to (AWS, GCP, or Azure; your choice, as long as you include a one-command deploy script and a clear README). The scraper has to cope with pagination, lazy-loaded images, and basic anti-bot measures without hammering the target site. If a directory blocks scraping via robots.txt, the tool should skip it gracefully and log the reason.

Deliverables
• Fully documented Python code (PEP 8 compliant)
• requirements.txt plus a Dockerfile or equivalent for cloud launch
• Deploy instructions that assume only CLI access to the cloud instance
• Sample CSV proving the fields export correctly, from at least one ZIP/service run

Acceptance criteria: run a test on a well-known service directory, pass in "plumbing" and a sample ZIP, and the resulting CSV must contain at least 25 unique records with valid emails or an explicit "email_not_found" placeholder.

Future expansions to JSON or XML output are possible, so structure the code with that in mind, but for now the mandatory export is CSV.

Data example needed, scraped from websites: https://docs.google.com/spreadsheets/d/1HUUV9rEDHlTXuuH_cdO770TVI7r71auIZsfvlKi2lco/edit?usp=sharing
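The "skip gracefully and log the reason" requirement for robots.txt can be handled with the standard library alone. A minimal sketch, using `urllib.robotparser` (the function name and user-agent string here are assumptions, not part of the brief):

```python
import logging
from urllib import robotparser

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("scraper")


def is_allowed(robots_txt: str, url: str, user_agent: str = "DirectoryScraper") -> bool:
    """Parse a site's robots.txt text and decide whether `url` may be fetched.

    Returns False (and logs the reason) when the directory disallows
    scraping, so the caller can skip the site gracefully instead of crawling it.
    """
    rp = robotparser.RobotFileParser()
    rp.parse(robots_txt.splitlines())
    if not rp.can_fetch(user_agent, url):
        log.info("Skipping %s: disallowed by robots.txt", url)
        return False
    return True
```

Taking the robots.txt body as a string (rather than fetching it inside the function) keeps the policy check testable without network access; the deployed scraper would download `/robots.txt` once per host and pass the text in.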
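Pagination "without hammering the target site" usually comes down to a jittered delay between page requests. One possible shape, where `fetch_page` is a hypothetical caller-supplied callable standing in for the real per-directory page parser:

```python
import random
import time


def paginate(fetch_page, max_pages: int = 50, min_delay: float = 1.5):
    """Iterate through directory result pages politely.

    `fetch_page(page_number)` returns a list of records, or an empty list
    once the listing is exhausted. A jittered pause between requests keeps
    the request rate well below anything that looks like hammering.
    """
    for page in range(1, max_pages + 1):
        records = fetch_page(page)
        if not records:
            break
        yield from records
        # Sleep between min_delay and 2 * min_delay seconds before the next page.
        time.sleep(min_delay + random.uniform(0, min_delay))
```

The `max_pages` cap is a safety valve against directories whose pagination never terminates; tune both parameters per target site.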
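One way to satisfy both the CSV acceptance criteria and the future JSON/XML requirement is to route every record through a single typed structure and keep the serializer separate. The column names below are assumptions derived from the field list above; the `"email_not_found"` default implements the required placeholder:

```python
import csv
import io
from dataclasses import asdict, dataclass

# Column order for the mandatory CSV export (names assumed from the brief).
FIELDNAMES = ["name", "description", "image", "website", "email"]


@dataclass
class BusinessRecord:
    name: str
    description: str = ""
    image: str = ""  # image URL, or a base64 payload if that is smaller
    website: str = ""
    email: str = "email_not_found"  # explicit placeholder per the brief


def records_to_csv(records) -> str:
    """Serialize an iterable of BusinessRecord into CSV text."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=FIELDNAMES)
    writer.writeheader()
    for rec in records:
        writer.writerow(asdict(rec))
    return buf.getvalue()
```

Because the records are plain dataclasses, a later `records_to_json` or `records_to_xml` can reuse `asdict` on the same objects without touching the scraping code.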