Nationwide Property Auction Web Scraping & Intelligent Alert System (Ongoing)

About Us
We're a commercial real estate investment firm that acquires distressed properties nationwide. We have the capital to close on any deal in the U.S. — our bottleneck is finding opportunities before competitors do. We're building an automated system that monitors every property auction source in the country, filters listings against our criteria, and alerts us only on qualified deals.

This is not a data-dump project. We don't want spreadsheets with thousands of rows. We want a smart radar system that scans everything, filters ruthlessly, and pings us only when something matches. This is a long-term, ongoing engagement — we build incrementally and need one reliable developer who grows with us.

The Problem
U.S. distressed property auctions are fragmented across 3,143 counties, 11+ federal agencies, and 15+ online platforms. County tax sales live on individual county websites or SaaS providers like Realauction.com (500+ counties) and Grant Street Group. Sheriff sales are on county sheriff sites. Federal seized properties are on treasury.gov, irsauctions.gov, and others. No single database captures everything.

What You'll Build
Two components: a Data Ingestion Engine (scraping) and an Alert & Filter System (intelligence).

PART 1: Data Ingestion (Phased)

Phase 1 — Major Platform Scrapers (Months 1–2):
- Realauction.com (500+ county subdomains, 15+ states — #1 priority)
- Grant Street Group / LienHub / DeedAuction (FL, AZ, MD, CA)
- Bid4Assets.com (county tax sales, sheriff sales, federal forfeiture)
- Auction.com (a pre-built Apify scraper exists)
- Hubzu.com, Xome.com, RealtyBid.com, GovDeals.com
- SRI Services / ZeusAuction (Indiana, 92 counties)
- CivicSource (Louisiana)

Phase 2 — Federal Agency Monitoring (Months 2–3):
- HUD HomeStore, Fannie Mae HomePath, Freddie Mac HomeSteps
- IRS Auctions, U.S. Treasury/TEOAF (cwsmarketing.com)
- GSA (realestatesales.gov — has a REST API)
- FDIC asset sales, VA REO (listings.vrmco.com), USDA

Phase 3 — Individual County Websites (Month 3+, Ongoing):
- Custom crawlers for county tax collector, sheriff, and clerk of court sites
- Start with the top 200 counties by population, expanding over time
- Many counties post PDF lists, so OCR/PDF parsing is required

PART 2: Alert & Filter System (Critical)
Raw scraped data is worthless to us. Our team is small and cannot review thousands of listings. You must build a filtering and scoring pipeline.

Our Investment Criteria:
- Commercial properties: 8%+ cap rate, 12%+ cash-on-cash return, 70%+ occupancy, $2M+ deal size.
- Target types: retail, office, industrial, multifamily, medical office, government-leased, NNN, mixed-use.
- Residential/smaller deals: estimated value must be 30%+ above the opening bid. Flag anything under $100K as lower priority.

What the Filter System Must Do:
- For every listing, auto-look up the estimated market value via the ATTOM Data API (we provide access), county assessor data, or comparable sources.
- Calculate the spread between the opening bid and the estimated value.
- Score each property 1–100 based on: discount to value (highest weight), property type match, deal size, location, auction competition level, and days until auction.
- Categorize alerts into tiers:
  - RED ALERT (80+): Matches all criteria, big discount, auction soon. Send immediately via email + webhook.
  - HIGH PRIORITY (60–79): Matches most criteria. Include in the daily summary.
  - WATCHLIST (40–59): Partial match. Weekly report only.
  - Below 40: Store in the database; don't alert.
- Daily summary email by 7 AM ET: new listings scraped, number passing filters, and the top 10–20 opportunities ranked — each with address, auction date, opening bid, estimated value, discount %, property type, score, and a direct link to the listing. Must be scannable in 30 seconds.
- Weekly report: total volume by state, best opportunities, scraper health status, and coverage stats.
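To make the tiering concrete for applicants, here is a minimal sketch of how a score could map to the alert tiers above. The factor names, weights, and normalization are placeholder assumptions for illustration only — the real criteria and scoring weights are provided on engagement.

```python
# Illustrative sketch only: weights and factor names are placeholder
# assumptions, not our final scoring model.

def alert_tier(score: int) -> str:
    """Map a 1-100 priority score to the alert tiers described above."""
    if score >= 80:
        return "RED ALERT"      # send immediately via email + webhook
    if score >= 60:
        return "HIGH PRIORITY"  # include in the daily summary
    if score >= 40:
        return "WATCHLIST"      # weekly report only
    return "STORE ONLY"         # persist in the database, no alert

def priority_score(factors: dict, weights: dict) -> int:
    """Weighted sum of normalized 0-1 factor scores, scaled to 1-100."""
    total = sum(weights[k] * factors[k] for k in weights)
    return max(1, min(100, round(100 * total / sum(weights.values()))))

# Discount to value carries the highest weight, per the criteria above.
weights = {"discount": 0.40, "type_match": 0.15, "deal_size": 0.15,
           "location": 0.10, "competition": 0.10, "days_to_auction": 0.10}
factors = {"discount": 0.9, "type_match": 1.0, "deal_size": 0.8,
           "location": 0.7, "competition": 0.6, "days_to_auction": 0.9}
print(alert_tier(priority_score(factors, weights)))  # → RED ALERT
```

In your proposal, feel free to propose a different scoring design — this sketch just shows the expected shape: normalized factors in, one 1–100 score and one tier out.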
- Deduplication: if the same property appears on multiple platforms, merge it into one record that notes all sources.

Data Fields Per Listing
Property address, parcel ID/APN, auction type, auction date/time, opening bid, assessed value, estimated market value, discount %, property type, square footage, lot size, occupancy (if available), county/state, source URL, status (upcoming/active/sold/postponed/canceled), priority score (1–100), and alert tier.

Output
- Structured JSON via webhook to our Supabase REST API (we provide the schema and credentials).
- Alert emails as formatted HTML via SendGrid/SES.
- No duplicates — dedupe by address + parcel ID.

Technical Requirements
- Python (Scrapy, BeautifulSoup, Selenium) and/or JavaScript (Puppeteer, Playwright)
- API integration (Supabase REST API, ATTOM Data API)
- Anti-bot handling: CAPTCHA solving, IP rotation, proxy management
- Experience scraping government websites (fragile, inconsistent sites)
- PDF parsing and OCR
- Email automation (formatted HTML alerts)
- Scheduling/orchestration (cron, n8n, Airflow, or similar)
- Error handling — scrapers must alert YOU when they break, not fail silently

What We Provide
- ATTOM Data API access for property valuations
- Supabase database schema and API credentials
- Detailed platform documentation and URL structures
- Prioritized county/platform target lists for each phase
- Investment criteria and scoring weights
- Fast, responsive communication

Ideal Candidate
- Has scraped real estate or government auction sites (show examples)
- Has built alert/scoring systems on top of scraped data
- Writes maintainable code — sites change constantly, so updates must be easy
- Proactive communicator — tells us immediately when something breaks
- Thinks like an architect, not just a scripter — this system will eventually monitor 3,000+ sources
- Available 20–40 hrs/week, scaling as we expand

To Apply
Include:
- Examples of scraping projects, especially real estate, auction, or government data
- Your approach to building the scoring/alert layer — how do you process thousands of listings daily and surface only the top 20?
- The tools/frameworks you prefer and why
- How you handle anti-bot protections at scale
- Your availability and rate
- How you'd approach scraping Realauction.com's 500+ county subdomains, structured so that adding new counties is trivial

Generic copy-paste proposals will be skipped. If your application doesn't address the filter/alert system, we'll pass.
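To show the level of rigor we expect on deduplication (merge the same property across platforms, keyed on address + parcel ID), here is a rough sketch. The field names and normalization rules are illustrative assumptions, not our schema:

```python
# Minimal dedup sketch: assumes each listing is a dict with "address",
# "parcel_id", and "source_url" fields (illustrative, not our schema).

def dedup_key(listing: dict) -> tuple:
    """Normalize address + parcel ID so one property matches across platforms."""
    addr = " ".join(listing["address"].lower().split())      # collapse spacing/case
    apn = listing.get("parcel_id", "").replace("-", "").upper()  # strip APN dashes
    return (addr, apn)

def merge_listings(listings: list[dict]) -> list[dict]:
    """Merge duplicate listings into one record noting all source URLs."""
    merged: dict[tuple, dict] = {}
    for item in listings:
        key = dedup_key(item)
        if key in merged:
            merged[key]["sources"].append(item["source_url"])
        else:
            record = dict(item)
            record["sources"] = [item["source_url"]]
            merged[key] = record
    return list(merged.values())
```

Real county data is messier than this (unit numbers, abbreviations, missing APNs), so tell us how you'd harden the matching.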
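On the Realauction question: the direction we'd consider on the right track is a config-driven design where counties are data, not code. A rough sketch, with a hypothetical subdomain pattern (we provide the actual URL structures — do not assume this one is correct):

```python
# Hypothetical sketch of a config-driven multi-county crawler: counties
# live in a data table, so adding one is a one-line config change, not
# new scraper code. The URL pattern below is a placeholder assumption.

COUNTIES = [
    {"state": "FL", "county": "broward"},
    {"state": "FL", "county": "miamidade"},
    # ...adding a county is one new entry, no new scraper code
]

def county_url(county: dict) -> str:
    """Build a per-county URL from a shared pattern (placeholder, not verified)."""
    return f"https://{county['county']}.realauction.com/"

def crawl_all(counties=COUNTIES):
    """Yield (state, url) work items; fetching/parsing plugs in behind this."""
    for county in counties:
        # fetch(url), parse listings, push into the shared scoring pipeline...
        yield county["state"], county_url(county)
```

In your proposal, explain where this simple picture breaks down (counties with custom page layouts, auth walls, calendar vs. listing pages) and how your architecture absorbs those differences.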