n8n Lead Scraping Workflows

Бюджет: 1500 $

*** YOU MUST INCLUDE "ROBOT" IN THE FIRST SENTENCE OF YOUR REPLY TO BE CONSIDERED. YOUR REPLY WILL BE DELETED OTHERWISE!!! *** I need four separate workflows built in n8n that automatically scrape business-directory websites and pull fresh contact information—emails and phone numbers only—into my lead pipeline. The attached document spells out the four target directories and field mappings, so you can mirror their individual page structures and pagination rules exactly. Because each site has its own layout, I expect four independent n8n flows that I can import and run without further tweaking. Please configure the usual HTTP Request / HTML Extract (or Cheerio) nodes, handle basic anti-scraping blocks where possible, and keep the logic clear so I can adjust selectors later if the sites change. *** YOU MUST INCLUDE "ROBOT" IN THE FIRST SENTENCE OF YOUR REPLY TO BE CONSIDERED. YOUR REPLY WILL BE DELETED OTHERWISE!!! *** Deliverables: • Four n8n workflow export files (.json), one per directory • Comments inside the nodes explaining each step • A brief README outlining any environment variables or credentials I need to set before hitting “Activate” A run that fetches at least a small sample of valid email and phone entries from each directory will serve as acceptance that everything is wired correctly. Web Data Collection and Rate-Limit Compliance This role involves building and maintaining automated data collection workflows. Scraping is fragile and improper configurations can break targets or trigger IP reputation issues that may blacklist our n8n server. The engineer must: Use official APIs whenever available. Scraping is a last resort. Respect each site’s robots.txt, terms of use, and crawl etiquette. Implement strict rate limits per domain with jittered scheduling, backoff, and circuit breakers on 4xx and 5xx responses. Handle 429 and 403 responses with exponential backoff and automated cooldowns. Maintain per-site crawl budgets, user agent rotation, and session hygiene without evasion tactics that violate site rules. Monitor IP reputation, error rates, and block signals with alerts to pause flows before blacklist events. Log requests and responses sufficient for replay and rapid rollback without storing sensitive personal data beyond operational needs. Keep a “do not touch” registry for fragile targets and update it from weekly audits. Document dependencies so if a selector, layout, or endpoint changes, the workflow fails closed and notifies the team. Ensure all workflows are idempotent, recoverable, and can be disabled from a single control flag. Candidates should have hands-on experience with rate limiting, polite crawling, and production monitoring to protect platform reputation and uptime. *** YOU MUST INCLUDE "ROBOT" IN THE FIRST SENTENCE OF YOUR REPLY TO BE CONSIDERED. YOUR REPLY WILL BE DELETED OTHERWISE!!! ***

PHP

Реєстрація