Comprehensive Cricinfo Test Data Scraper

Client: AI | Posted: 11.12.2025

I want a repeatable way to pull full-match data from ESPN Cricinfo for every Test played to date and for any new Tests going forward. The end goal is a clean, well-structured dataset (CSV or JSON) that holds one row per player per innings, enriched with the exact figures I specify below, and nothing more.

Batting numbers I must see: Runs scored, Balls faced, and Strike rate. Bowling details required: Wickets taken, Overs bowled, and Economy rate. Fielding figures needed: Catches taken and Stumpings.

The script should accept either a list of match IDs or a date range, crawl the corresponding scorecards, parse the data, and append or refresh the dataset without creating duplicates. I’m comfortable running Python, so please feel free to lean on requests/BeautifulSoup or Selenium if necessary, but structure the code cleanly enough that I can schedule it via cron on a Linux server later.

Deliverables
• Python source code with clear setup notes (virtual-env, requirements.txt).
• A sample output file covering at least five recent Tests so I can validate field mapping.
• Brief README explaining how to rerun the scraper for fresh matches.

The job is complete when I can run one command, point it at new match IDs, and see rows appear in my master file with the statistics listed above pulled accurately from Cricinfo. If anything is unclear, let me know early so we can keep the scope tight and deliverable.
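To make the "append or refresh without creating duplicates" requirement concrete, here is a minimal sketch of how the master CSV could be updated. It is only an illustration under assumptions: the column names, the choice of (match_id, innings, player) as the row key, and the function names `row_key` and `upsert_rows` are all hypothetical, and the scraping and parsing steps are left out entirely.

```python
import csv
from pathlib import Path

# Hypothetical column names; the posting only fixes which statistics are needed.
KEY_FIELDS = ("match_id", "innings", "player")
STAT_FIELDS = (
    "runs", "balls_faced", "strike_rate",   # batting
    "wickets", "overs", "economy",          # bowling
    "catches", "stumpings",                 # fielding
)
FIELDNAMES = KEY_FIELDS + STAT_FIELDS


def row_key(row):
    """Identify a row by match, innings and player (values compared as strings)."""
    return tuple(str(row[f]) for f in KEY_FIELDS)


def upsert_rows(master_path, new_rows):
    """Merge new_rows into the CSV at master_path and return the total row count.

    A row whose key already exists replaces the stored one (refresh);
    any other row is appended. Re-running on the same match adds no duplicates.
    """
    path = Path(master_path)
    merged = {}

    # Load whatever is already in the master file, indexed by key.
    if path.exists():
        with path.open(newline="", encoding="utf-8") as fh:
            for row in csv.DictReader(fh):
                merged[row_key(row)] = row

    # New rows overwrite existing keys or extend the mapping.
    for row in new_rows:
        merged[row_key(row)] = {f: row.get(f, "") for f in FIELDNAMES}

    # Rewrite the whole file with a fixed header.
    with path.open("w", newline="", encoding="utf-8") as fh:
        writer = csv.DictWriter(fh, fieldnames=FIELDNAMES, extrasaction="ignore")
        writer.writeheader()
        writer.writerows(merged.values())
    return len(merged)
```

Calling `upsert_rows("master.csv", parsed_rows)` after each scrape would keep one row per player per innings, whether the match is new or being refreshed. Rewriting the whole file is the simplest option for a dataset of this size; if a player name alone turns out not to be unique within a match, a player ID would be a safer key.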