Website & Social Media Scraper -- 2

Budget: $8

I’m assembling a clean, well-structured dataset drawn directly from public websites and major social-media platforms. The brief covers automated data scraping only (no manual copy-paste), capturing both text content and the accompanying images wherever they appear together.

Here’s how I picture the workflow: you set up an automated script (Python with BeautifulSoup, Scrapy, or Selenium fits nicely, but I’m open) that crawls the specified URLs and accounts, respects robots.txt, and retrieves the required fields. For text, I want the raw HTML stripped to readable copy; for images, save each file to an organised folder while logging its source URL in the same row as the text in a CSV or JSON output. Pagination, lazy-loaded media, and basic anti-bot measures are likely, so resilience matters.

Deliverables
• A runnable script with clear setup instructions
• The final dataset (text + image links) in CSV or JSON, plus the downloaded image files organised logically
• A brief read-me summarising any limits, error handling, and how to extend or schedule the scrape

I’ll supply the exact target list and any login details (if public access still applies). Clean code, accurate extraction, and respectful scraping etiquette will be the acceptance criteria.
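To make the expected output shape concrete, here is a rough sketch of the extraction step using only the Python standard library. It is an illustration under assumptions, not a required design: the names `PageExtractor`, `extract_row`, `allowed_by_robots`, and `rows_to_csv`, the `demo-bot` user agent, the `data-src` lazy-loading attribute, and the three column names are all hypothetical, and fetching pages and downloading image files is left out.

```python
import csv
import io
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.robotparser import RobotFileParser


class PageExtractor(HTMLParser):
    """Collects readable text and image URLs from one HTML page."""

    SKIP_TAGS = {"script", "style", "noscript"}

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.text_parts = []
        self.image_urls = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP_TAGS:
            self._skip_depth += 1
        elif tag == "img":
            attr_map = dict(attrs)
            # Assumption: lazy-loaded images keep the real URL in data-src.
            src = attr_map.get("data-src") or attr_map.get("src")
            if src:
                self.image_urls.append(urljoin(self.base_url, src))

    def handle_endtag(self, tag):
        if tag in self.SKIP_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only visible text, ignoring script/style bodies.
        if self._skip_depth == 0 and data.strip():
            self.text_parts.append(data.strip())


def extract_row(url, html):
    """Return one dataset row: page URL, stripped text, image URLs."""
    parser = PageExtractor(url)
    parser.feed(html)
    parser.close()
    return {
        "source_url": url,
        "text": " ".join(parser.text_parts),
        "image_urls": ";".join(parser.image_urls),
    }


def allowed_by_robots(robots_txt, user_agent, url):
    """Check a URL against the text of an already-downloaded robots.txt."""
    rules = RobotFileParser()
    rules.parse(robots_txt.splitlines())
    return rules.can_fetch(user_agent, url)


def rows_to_csv(rows):
    """Serialise rows to CSV text with text and image links side by side."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=["source_url", "text", "image_urls"]
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()
```

In a real run, `extract_row` would be called on each page only after `allowed_by_robots` returns True for that URL, and JavaScript-heavy pages would likely need Selenium or a similar browser driver before the HTML reaches the parser.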

Python
