Python Data Engineering for Brisbane Unlocked

Client: AI | Published: 22.03.2026

Python Data Engineer – Web Scraping & Automated Data Pipeline

About the Project

I'm building Brisbane Unlocked — a local discovery platform that surfaces events, experiences, bars, markets, and hidden gems across Brisbane, Australia. The platform pulls data from 80+ sources, including council open data, venue websites, ticketing platforms, and local community organisations.

The backend data engine (called BB) is already built and deployed on a Digital Ocean Droplet (Ubuntu 24.04, Sydney region). It runs on a 6-hour automated cron schedule and currently has 8 working scrapers with 400+ events in the database.

What I Need

1. Fix & improve existing scrapers
• Fix 2 broken scrapers (Eventbrite, South Bank)
• Apply data enrichment to 5 scrapers that return incomplete data

2. Build new scrapers — approximately 60 new sources across 7 categories
• Events (~20 sources) — what's-on guides, venue websites, arts institutions
• Restaurants & Cafes (~8 sources) — local food guides, new openings
• Bars & Nightlife (~6 sources) — bar guides, rooftop directories
• Experiences (~8 sources) — activity platforms, river cruises, tours
• Markets (~8 sources) — farmers markets, artisan markets, night markets
• Kids & Family (~4 sources) — family guides, school holiday programs
• Community & Grassroots (~5+ sources) — sports clubs, community noticeboards

3. Database migration
• Migrate from SQLite to PostgreSQL via Supabase (free tier)
• Create a community submissions table for user-submitted listings

4. Data quality
• Deduplication, suburb normalisation, category consistency, cancelled-event detection

Tech Stack
• Python 3.12, BeautifulSoup4, requests, lxml (already installed on the Droplet)
• Playwright for JavaScript-rendered sites (to be installed as needed)
• SQLite now → Supabase PostgreSQL (migration required)
• Digital Ocean Droplet — Ubuntu 24.04, Sydney region
• No APIs — all data is collected by scraping public websites

What I Provide
• Full SSH access to the Droplet and all existing code
• A complete source list with 80+ URLs, priorities, and technical notes
• A working HTML prototype showing exactly what the platform looks like — this acts as your build brief
• Full developer brief documentation
• Milestone-based payments — you are paid only as work is delivered and verified

Pricing

Please quote a fixed price for the full scope, or phase it as follows:
• Phase 1: Fix and enrich the existing scrapers
• Phase 2: New scrapers, by category
• Phase 3: Database migration and data quality

Do not quote per source.

To Apply — Please Answer These 4 Questions

1. Describe a similar scraping pipeline you've built — how many sources, and how did you handle data quality?
2. Have you used Playwright for JavaScript-rendered sites? Give a brief example.
3. Have you worked with Supabase or PostgreSQL?
4. A venue website doesn't use JSON-LD structured data. Walk me through how you'd extract the event title, date, image, and price from its HTML.
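To give applicants a feel for the data-quality work, here is a minimal sketch of one possible approach to deduplication and suburb normalisation using only the standard library. The alias table, the fuzzy-match threshold, and the "same date + same suburb + near-identical title" rule are all illustrative assumptions, not the project's actual matching logic.

```python
from difflib import SequenceMatcher

# Illustrative alias map; the real project would maintain a fuller table
# covering all Brisbane suburbs and their common spellings.
SUBURB_ALIASES = {
    "southbank": "South Bank",
    "south bank": "South Bank",
    "the valley": "Fortitude Valley",
    "fortitude valley": "Fortitude Valley",
}

def normalise_suburb(raw: str) -> str:
    """Collapse whitespace and case, then map known aliases to a canonical name."""
    key = " ".join(raw.lower().split())
    return SUBURB_ALIASES.get(key, raw.strip().title())

def is_duplicate(a: dict, b: dict, threshold: float = 0.85) -> bool:
    """Treat two events as duplicates if they share a date and (normalised)
    suburb and their titles are nearly identical by fuzzy ratio."""
    if a["date"] != b["date"]:
        return False
    if normalise_suburb(a["suburb"]) != normalise_suburb(b["suburb"]):
        return False
    ratio = SequenceMatcher(None, a["title"].lower(), b["title"].lower()).ratio()
    return ratio >= threshold

def dedupe(events: list[dict]) -> list[dict]:
    """Keep the first occurrence of each event, dropping later duplicates."""
    kept: list[dict] = []
    for ev in events:
        if not any(is_duplicate(ev, k) for k in kept):
            kept.append(ev)
    return kept
```

In practice the scrapers would feed rows into `dedupe` before writing to the database, so that the same market listed by two different sources is stored once.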
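As a concrete illustration of the kind of answer question 4 is after, here is a minimal BeautifulSoup sketch that pulls a title, date, image, and price out of plain HTML with no JSON-LD. The sample markup, CSS selectors, and fallback order are all hypothetical; a real scraper would be tuned to each source after inspecting its pages.

```python
import re
from bs4 import BeautifulSoup

# Hypothetical venue page; real sites will differ and need per-source inspection.
HTML = """
<html><head>
  <meta property="og:image" content="https://example.com/poster.jpg">
</head><body>
  <h1 class="event-title">Riverfire Fireworks</h1>
  <time datetime="2026-09-26">Sat 26 September 2026</time>
  <p class="ticket-info">Tickets from $25.00</p>
</body></html>
"""

def extract_event(html: str) -> dict:
    soup = BeautifulSoup(html, "html.parser")

    # Title: prefer a specific class, fall back to the first <h1>.
    title_tag = soup.select_one("h1.event-title") or soup.find("h1")
    title = title_tag.get_text(strip=True) if title_tag else None

    # Date: a machine-readable datetime attribute beats parsing visible text.
    time_tag = soup.find("time")
    date = time_tag.get("datetime") if time_tag else None

    # Image: the og:image meta tag is a common fallback when there is
    # no obvious hero <img> in the body.
    og = soup.find("meta", property="og:image")
    image = og["content"] if og else None

    # Price: regex over the visible text for a dollar amount.
    m = re.search(r"\$\s?(\d+(?:\.\d{2})?)", soup.get_text())
    price = float(m.group(1)) if m else None

    return {"title": title, "date": date, "image": image, "price": price}
```

A strong answer would go beyond this sketch: handle missing fields gracefully, parse the visible date text as a fallback when no `datetime` attribute exists, and distinguish "free" listings from missing prices.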