Python Developer for UK Web Scraping

Customer: AI | Published: 29.11.2025
Budget: $30

Description: We are seeking an experienced Python developer to enhance and maintain a web scraping tool that collects Persons with Significant Control (PSC) data from the UK Companies House website. The project currently uses Python, requests, BeautifulSoup, openpyxl, and multithreading via concurrent.futures.

Project Scope:
- Scrape and parse PSC data for companies listed in Excel files.
- Handle large volumes of requests while avoiding blocks and rate limiting (rotating sessions, realistic headers, retry/backoff strategies).
- Populate structured Excel templates with scraped data.
- Implement error logging and recovery for failed requests.
- Optimize performance for multiple sheets and thousands of rows.
Optional: Add enhancements to improve reliability, speed, or data accuracy.

Requirements:
- Strong Python experience, including requests, BeautifulSoup, and multithreading.
- Experience working with Excel files via openpyxl or pandas.
- Familiarity with anti-bot techniques, rate limiting, and session management.
- Ability to write clean, maintainable, and well-documented code.
- Good understanding of web scraping best practices and handling dynamic content.
- Experience with logging, error handling, and robust automation.

Nice-to-Have:
- Knowledge of UK Companies House website structure.
- Experience with proxy management and rotating user agents.
- Familiarity with data cleaning and validation in Python.

Deliverables:
- Fully functional, optimized Python script that reads input Excel files and writes PSC data to the output template.
- Error handling and logging for failed requests.
- Documentation for setup, usage, and any customizations.
- Recommendations for further enhancements if applicable.

Project Duration:
- Short-term contract with potential for ongoing maintenance depending on performance.

How to Apply:
- Please provide examples of past web scraping projects.
- Briefly describe your approach to handling rate limiting and avoiding site blocks.
- Indicate your availability and expected timeline for delivery.

Compensation:
- Competitive, based on experience and project scope.
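
Illustration: the retry/backoff strategy named under Project Scope could look roughly like the sketch below. It is not taken from the existing tool. The names `backoff_delays` and `call_with_backoff` and all parameter defaults are hypothetical, chosen only to show the idea of exponential delays with random jitter around any callable.

```python
import random
import time


def backoff_delays(attempts, base=1.0, cap=30.0):
    """Return one delay per attempt: base, 2*base, 4*base, ... limited to `cap`."""
    return [min(cap, base * (2 ** i)) for i in range(attempts)]


def call_with_backoff(func, attempts=4, base=1.0, cap=30.0, sleep=time.sleep):
    """Call func(); if it raises, wait and try again, up to `attempts` times.

    The wait before each retry is a random value between 0 and the
    exponential delay for that attempt ("full jitter"), so that parallel
    workers do not all retry at the same moment. The last exception is
    re-raised when every attempt has failed.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    delays = backoff_delays(attempts, base, cap)
    for attempt, delay in enumerate(delays, start=1):
        try:
            return func()
        except Exception:
            if attempt == attempts:
                raise  # out of attempts: surface the error to the caller
            sleep(random.uniform(0, delay))
```

With requests, a worker might call it as `call_with_backoff(lambda: session.get(url, timeout=10))`, where `session` and `url` are placeholders. Note that requests does not raise on HTTP error statuses by default, so the callable would also need to call `raise_for_status()` on the response (or check the status code itself) for rate-limit responses to trigger a retry.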