Multi-Page Text Data Scraping

Client: AI | Published: 27.10.2025
Budget: $250

I need all the text data spread across several hundred pages of a single database-style website gathered, cleaned, and delivered to me in one well-structured Excel workbook. The site uses a consistent layout: each page lists record-type entries with roughly a dozen text fields that include titles, short descriptions, and a few categorical tags.

Here’s what will make this project a success for me:

• A repeatable script (Python, BeautifulSoup/Scrapy or a similar tool) that automatically navigates pagination, respects robots.txt rules, and handles any lazy-loaded content.
• An Excel file where each row corresponds to one record and each column maps cleanly to the on-page text fields, with no HTML artifacts or extra whitespace.
• Basic documentation so I can rerun the scraper if the site content updates.

There are no images, and no need for JSON or CSV; Excel is the only format I need. If rate limits or CAPTCHAs appear, please build in polite delays or lightweight work-arounds to stay within the site’s terms of service.

Let me know how quickly you can turn around an initial sample file; once I confirm the structure is correct, you can scrape the full set.
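
For illustration only, the requirements above could be sketched roughly as follows in Python. Nothing here comes from the target site: the column names in `FIELDS`, the helper names, and the injected `fetch` and `parse_page` callables are all hypothetical placeholders, and the Excel step assumes the third-party `openpyxl` package is installed. It is a minimal sketch of the robots.txt check, polite delay, text cleaning, and one-row-per-record export, not a finished scraper.

```python
import html
import re
import time
from urllib import robotparser

TAG_RE = re.compile(r"<[^>]+>")   # leftover HTML tags
WS_RE = re.compile(r"\s+")        # any run of whitespace, including non-breaking spaces

# Hypothetical column names; the real ones depend on the site's fields.
FIELDS = ["title", "description", "tags"]


def make_robots_checker(robots_lines, user_agent="*"):
    """Build a url -> bool function from the lines of a robots.txt file."""
    parser = robotparser.RobotFileParser()
    parser.parse(robots_lines)
    return lambda url: parser.can_fetch(user_agent, url)


def clean_field(raw):
    """Drop tags, decode HTML entities, and collapse extra whitespace."""
    text = TAG_RE.sub(" ", raw or "")
    text = html.unescape(text)
    return WS_RE.sub(" ", text).strip()


def crawl(page_urls, fetch, parse_page, allowed, delay_seconds=2.0):
    """Visit each allowed page in order, pausing between requests.

    fetch(url) -> str and parse_page(html) -> list[dict] are supplied by
    the caller (assumptions; they depend on the site and chosen library).
    """
    records = []
    for url in page_urls:
        if not allowed(url):
            continue  # skip anything robots.txt disallows
        records.extend(parse_page(fetch(url)))
        time.sleep(delay_seconds)  # polite delay between pages
    return records


def records_to_rows(records, fields=FIELDS):
    """Header row plus one cleaned row per record, one column per field."""
    rows = [list(fields)]
    for rec in records:
        rows.append([clean_field(rec.get(name, "")) for name in fields])
    return rows


def write_workbook(rows, path):
    """Write the rows to an .xlsx file (requires openpyxl)."""
    from openpyxl import Workbook  # imported here so the helpers above work without it
    wb = Workbook()
    ws = wb.active
    for row in rows:
        ws.append(row)
    wb.save(path)
```

In practice `fetch` might wrap an HTTP client and `parse_page` might use BeautifulSoup selectors matched to the site's markup; lazy-loaded content may additionally need a browser-driven fetch. Those pieces are left out because they depend on details the posting does not give.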