Historical Power Grid Data Extraction: South Africa

Client: AI | Published: 15.03.2026
Budget: $250

Project Overview:
I am looking for an experienced data scraper/engineer to extract DAILY historical power grid data for South Africa covering 1 January 2014 to 31 December 2024 (11 calendar years). Specifically, I need the Unplanned Capability Loss Factor (UCLF) and Planned Capability Loss Factor (PCLF) data from Eskom (the national utility).

CRITICAL NOTE: Do not use the EskomSePush (ESP) API. ESP only provides consumer-facing loadshedding schedules. I need the actual macro-generation MW (megawatt) breakdown of power plant failures.

Data Sources to use:
- The official Eskom Data Portal (https://www.eskom.co.za/dataportal/, under "Outages" and "Supply Side").
- Alternative public archives: because the official portal sometimes restricts downloads to a 5-year rolling window, you may need to pull from open-source community SQLite databases (e.g., unofficialeskom.com or GitHub repositories tracking Eskom data) or CSIR energy publications to get the full 2014-2024 timeline.

Specific Data Points Required (Hourly extraction, aggregated to Daily):
- Unplanned Outages (MW)
- Planned Outages (MW)
- Total Installed Capacity (MW) / RSA Contracted Demand

Calculations & Output Formatting:
The raw data is usually reported in hourly megawatts (MW). I need you to calculate the percentages and provide a clean, daily CSV file with the following columns:
- Date (Format: YYYY-MM-DD)
- Daily_Avg_UCLF_Percentage: (Average Hourly Unplanned MW / Total Installed MW) * 100
- Daily_Max_UCLF_Percentage: The highest hourly UCLF percentage recorded that day.
- UCLF_at_1700_SAST: The UCLF percentage at exactly 17:00 South African Standard Time (Market Close).
- Daily_Avg_PCLF_Percentage: (Average Hourly Planned MW / Total Installed MW) * 100

Deliverables:
- A single, clean CSV or Excel file containing the 2014-2024 DAILY data.
- A brief text file or README explaining exactly where the data was sourced from and how missing values (if any) were handled.
- The Python/scraper script used to generate the data (for my own reproducibility records).
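
Illustrative sketch of the calculations:
The column definitions above could be computed roughly as follows with pandas. This is only a sketch under stated assumptions, not a required implementation: the input column names (timestamp, unplanned_mw, planned_mw, installed_mw) and the function name are hypothetical, timestamps are assumed to be already in SAST, and the daily averages are assumed to mean the day's average MW divided by the day's average installed MW.

```python
import pandas as pd


def daily_loss_factors(hourly: pd.DataFrame) -> pd.DataFrame:
    """Aggregate hourly outage MW into the daily percentage columns.

    Assumed (hypothetical) input columns:
      timestamp     hourly timestamps, already in SAST
      unplanned_mw  unplanned outages in MW
      planned_mw    planned outages in MW
      installed_mw  total installed capacity in MW
    """
    df = hourly.copy()
    df["timestamp"] = pd.to_datetime(df["timestamp"])
    df["Date"] = df["timestamp"].dt.strftime("%Y-%m-%d")

    # Hourly UCLF percentage, needed for the daily max and the 17:00 value.
    df["uclf_pct"] = df["unplanned_mw"] / df["installed_mw"] * 100

    grouped = df.groupby("Date")
    avg = grouped[["unplanned_mw", "planned_mw", "installed_mw"]].mean()

    # UCLF at the 17:00 reading; days without a 17:00 row end up as NaN.
    at_1700 = (
        df.loc[df["timestamp"].dt.hour == 17]
        .groupby("Date")["uclf_pct"]
        .first()
    )

    daily = pd.DataFrame(
        {
            # Assumption: average MW for the day over average installed MW.
            "Daily_Avg_UCLF_Percentage": avg["unplanned_mw"] / avg["installed_mw"] * 100,
            "Daily_Max_UCLF_Percentage": grouped["uclf_pct"].max(),
            "UCLF_at_1700_SAST": at_1700,
            "Daily_Avg_PCLF_Percentage": avg["planned_mw"] / avg["installed_mw"] * 100,
        }
    )
    return daily.rename_axis("Date").reset_index()
```

The result could then be written out with something like daily_loss_factors(hourly).to_csv("eskom_daily_2014_2024.csv", index=False), where the file name is only a placeholder. Missing hours are left as gaps here rather than filled, so whatever handling is chosen would still need to be described in the README.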