I will perform the scraping myself. I need an experienced Python developer to **create the scraping/enrichment scripts, set up proxies, automate the workflow, add monitoring, and train me to run and manage everything**. The full engagement must be completed in **15 calendar days**. I want a hands-on developer who will deliver production-ready, well-documented code and then teach me how to run, troubleshoot, and maintain the system.

---

### Scope of Work / Responsibilities

The developer will:

1. **Write modular scraping & enrichment scripts** (Python):
   * Provide scripts for parsing static pages (Scrapy or requests/BeautifulSoup) and for dynamic pages where needed (Playwright or Puppeteer).
   * Provide an **Impressum (imprint) extraction** script to fetch missing contact details from company sites.
   * Include URL queueing, rate limiting, retry logic, and per-domain politeness.
2. **Proxy setup & management**
   * Recommend Germany-targeted proxy providers (residential/ISP/mobile) and advise on plans.
   * Integrate proxy rotation into the scripts, with health checks that remove bad IPs (see the proxy-pool sketch after this section).
   * Implement per-domain concurrency limits and backoff logic.
3. **Automation, orchestration & storage**
   * Containerize the scripts (Docker) and provide simple run scripts / docker-compose.
   * Provide a minimal job queue or scheduler (Redis + Celery, or a cron-based solution).
   * Integrate with a Postgres (or MySQL) database for ingestion; provide the SQL for the tables.
   * Store raw HTML snapshots (S3/Hetzner Object Storage) and merged/cleaned records in the DB.
4. **Change detection & bandwidth savings**
   * Implement HEAD/conditional GET, ETag/Last-Modified checks, and content-hash logic so unchanged pages are skipped (see the change-detection sketch after this section).
5. **Monitoring & logging**
   * Provide basic logging and a simple monitoring dashboard, or written instructions for one (Prometheus + Grafana is welcome but optional; a simpler setup of plain logs plus alerting is acceptable).
   * Track GB usage (proxy provider), request success/fail stats, queue lengths, and last-run summaries.
6. **Documentation & training**
   * Provide clear, step-by-step documentation (README, runbook) on how to deploy and run the system.
   * Conduct **live training sessions (video calls)** totalling **at least 3 hours** to teach me:
     * How to run the scrapers,
     * How to manage proxies,
     * How to monitor and interpret logs/metrics,
     * How to troubleshoot common failures.
   * Provide short screencast recordings of the training (optional but preferred).
7. **Handover & support**
   * One week of post-delivery support via chat/email for bugfixes related to the deliverables.
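To make expectations concrete, here is a minimal sketch of the change-detection logic from item 4 (conditional GET plus a content-hash fallback). It is illustrative only, not a required implementation: the function name `fetch_if_changed` and the `state` dict are placeholders, not part of any agreed schema.

```python
"""Illustrative sketch only: conditional GET + content-hash change detection.
`fetch_if_changed` and the `state` dict are placeholders, not a spec."""
import hashlib

import requests


def fetch_if_changed(url: str, state: dict) -> bytes | None:
    """Fetch `url` only if it changed since the last run.

    `state` holds the ETag, Last-Modified value, and body hash seen on the
    previous run (e.g. loaded from a Postgres row). Returns the new body,
    or None if the page is unchanged and can be skipped.
    """
    headers = {}
    if state.get("etag"):
        headers["If-None-Match"] = state["etag"]    # conditional GET via ETag
    if state.get("last_modified"):
        headers["If-Modified-Since"] = state["last_modified"]

    resp = requests.get(url, headers=headers, timeout=30)
    if resp.status_code == 304:                     # server says: unchanged
        return None
    resp.raise_for_status()

    # Fallback for servers that ignore validators: compare a content hash.
    body_hash = hashlib.sha256(resp.content).hexdigest()
    if body_hash == state.get("body_hash"):
        return None                                 # same bytes, skip re-processing

    # Remember the new validators for the next run.
    state["etag"] = resp.headers.get("ETag")
    state["last_modified"] = resp.headers.get("Last-Modified")
    state["body_hash"] = body_hash
    return resp.content
```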
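Likewise, a minimal sketch of the proxy health-check idea from item 2: try healthy proxies in order and drop any IP after repeated failures. The class name, failure threshold, and proxy URLs are hypothetical; the real implementation should also add per-domain concurrency limits and backoff.

```python
"""Illustrative sketch only: proxy pool with basic health checks.
Names, thresholds, and proxy URLs below are placeholders."""
import requests


class ProxyPool:
    """Try healthy proxies in order; drop any proxy once it reaches
    `max_failures` consecutive failures, so bad IPs leave the rotation."""

    def __init__(self, proxies: list[str], max_failures: int = 3):
        self.failures = {p: 0 for p in proxies}
        self.max_failures = max_failures

    def healthy(self) -> list[str]:
        return [p for p, n in self.failures.items() if n < self.max_failures]

    def get(self, url: str, **kwargs) -> requests.Response:
        for proxy in self.healthy():
            try:
                resp = requests.get(
                    url,
                    proxies={"http": proxy, "https": proxy},
                    timeout=20,
                    **kwargs,
                )
                resp.raise_for_status()
                self.failures[proxy] = 0   # a success resets the failure counter
                return resp
            except requests.RequestException:
                self.failures[proxy] += 1  # health check: count consecutive failures
        raise RuntimeError("no healthy proxies left in the pool")


# Usage (hypothetical Germany-targeted endpoints):
# pool = ProxyPool(["http://user:pass@de1.example:8000",
#                   "http://user:pass@de2.example:8000"])
# html = pool.get("https://example.de/impressum").text
```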
---

### Deliverables

* Production-ready Python scripts (clean, commented).
* Dockerfile(s) and docker-compose (or deployment instructions).
* DB schema (Postgres recommended) + sample migration script.
* Proxy integration code, a recommended provider, and a sample config.
* Monitoring/logging setup or instructions.
* README + runbook + troubleshooting guide.
* Video-call training (≥3 hours) + recordings.
* One week of post-delivery support.

---

### Technical preferences (ideal)

* Python (Scrapy + Playwright preferred)
* PostgreSQL
* Docker
* Redis + Celery (or an equivalent lightweight queue)
* Familiarity with proxy rotation, anti-bot measures, and German website patterns is a strong plus.

---

### Timeline (must meet)

* Total time: **15 calendar days from project start**
* Suggested milestone plan (flexible to discuss):
  * Days 1–5: Core scraping & enrichment scripts + DB schema.
  * Days 6–9: Proxy integration, rotation & health checks.
  * Days 10–12: Automation (Docker), change detection, logging.
  * Days 13–14: Monitoring setup, documentation, test runs.
  * Day 15: Training sessions + handover and delivery of recordings.

---

### Application requirements

When you apply, include:

* A brief summary of relevant experience (3–5 lines).
* Examples/links to previous scraping or automation projects (GitHub or portfolio).
* Confirmation that you can meet the **15-day** deadline and provide the training.
* Your estimated cost and proposed milestone payments.
* Your preferred time zone and availability for the training calls within the 15-day window.