E-Commerce Website Text Scraping

I need a robust yet straightforward solution that pulls only the text content from several e-commerce websites. The target fields are product names, long and short descriptions, category labels, and any text-based specifications; I do not need images or pricing data.

The scraper should:
• Handle pagination, dynamic or lazy-loaded sections, and common anti-bot measures without overloading the servers.
• Output clean, well-structured data (CSV or JSON preferred) ready for import into my internal system.
• Be written in readable, well-commented code. Python with Scrapy, BeautifulSoup, or Selenium is ideal, but I’m open to equivalent approaches if they achieve the same reliability.
• Include simple configuration so I can add or swap target domains later.
• Respect robots.txt directives and apply configurable request delays.

Acceptance criteria
1. Running the script against a supplied test URL set returns all visible text fields with 0% missing data.
2. The output passes a quick integrity check for duplicate rows and encoding errors.
3. A brief README explains setup, dependencies, and how to extend the crawler to new sites.

Deliver the source code, requirements file, sample output, and the README.
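To make acceptance criteria 1 and 2 concrete, here is a minimal sketch of the kind of integrity check I have in mind. The field names in REQUIRED_FIELDS and the function name check_records are placeholders of my own, not a fixed schema, and the encoding test only looks for the Unicode replacement character (U+FFFD), which is one common sign of a decoding failure rather than a complete test.

```python
import unicodedata

# Hypothetical field names for illustration; the real schema is up to the implementer.
REQUIRED_FIELDS = (
    "name",
    "short_description",
    "long_description",
    "category",
    "specifications",
)


def check_records(records):
    """Report missing fields, duplicate rows, and suspected encoding errors.

    `records` is a list of dicts, one per scraped product.
    Returns a dict of lists; all three lists are empty for clean output.
    """
    report = {"missing": [], "duplicates": [], "encoding": []}
    first_seen = {}  # normalized row -> index of its first occurrence

    for index, record in enumerate(records):
        for field in REQUIRED_FIELDS:
            value = record.get(field)
            if value is None or not str(value).strip():
                # Criterion 1: every text field must be present and non-empty.
                report["missing"].append((index, field))
            elif "\ufffd" in str(value):
                # U+FFFD usually means bytes were decoded with the wrong charset.
                report["encoding"].append((index, field))

        # Criterion 2: compare rows after Unicode normalization and trimming,
        # so visually identical rows count as duplicates.
        key = tuple(
            unicodedata.normalize("NFC", str(record.get(field, ""))).strip()
            for field in REQUIRED_FIELDS
        )
        if key in first_seen:
            report["duplicates"].append((first_seen[key], index))
        else:
            first_seen[key] = index

    return report
```

A delivery would pass this check when all three lists in the returned report are empty for the sample output.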
