County-Level China Agriculture Panel

Customer: AI | Published: 18.11.2025
Budget: 30 $

I have city-level statistical yearbooks from every prefecture in China, each of which embeds tables that already break figures down to the county tier. The task is to pull those county rows out of every annual volume, harmonise the variable names, and stitch everything together into a long-form panel.

Scope of data
• Years: every edition currently publicly released for each city (typically 2000-present, but some municipalities go back further).
• Geography: all counties, county-level cities and banners that appear in the yearbooks.
• Indicators: agricultural production only, with an emphasis on crop production volumes/areas and any line items that record investment in agricultural production materials (fertiliser, machinery, irrigation spend, etc.).

What makes the job challenging is the variation in layout and wording and, in some books, scanned PDF tables that require reliable OCR. Consistency across provinces and time is critical, so manual checks or rule-based validation will almost certainly be needed after the initial scrape or data entry.

Deliverables
1. A single Excel workbook.
   – One sheet: tidy panel (columns: year, province, city, county, indicator, value, unit, source page).
   – Additional sheets (if helpful): codebook and any concordance files you build for county name changes or mergers.
2. Scripts or documented procedures used to extract and clean the data (Python, R, or another reproducible workflow).
3. A brief data-quality note that flags missing values and explains how ambiguous county names or split counties were handled.

Acceptance criteria
• Every county that appears in a yearbook is present for each year extracted, or marked clearly as missing with a reason.
• Numeric fields validate against totals reported at the city level (±1%).
• Workbook opens error-free in Excel 2016+ and passes a quick pivot-table check.
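To illustrate the ±1% acceptance criterion, here is a minimal Python sketch of a rule-based check that sums county values and compares them with the city total from the same yearbook. The function name `check_city_totals`, the dictionary keys, the indicator label `grain_output` and all sample figures are hypothetical and only for illustration. The sketch assumes units are already harmonised and that the county rows cover the whole city; where a yearbook reports urban districts separately, those rows would need to be included before the comparison.

```python
from collections import defaultdict


def check_city_totals(county_rows, city_totals, tolerance=0.01):
    """Return the groups whose county sum differs from the city total.

    county_rows: iterable of dicts shaped like rows of the tidy panel
                 (year, province, city, county, indicator, value).
    city_totals: dict mapping (year, province, city, indicator) to the
                 total reported at city level.
    tolerance:   allowed relative difference (0.01 means ±1%).
    """
    sums = defaultdict(float)
    for row in county_rows:
        if row["value"] is None:
            # Missing values are skipped rather than treated as zero.
            continue
        key = (row["year"], row["province"], row["city"], row["indicator"])
        sums[key] += row["value"]

    failures = []
    for key, reported in city_totals.items():
        if key not in sums:
            # No usable county rows at all for this city/year/indicator.
            failures.append({"key": key, "county_sum": None,
                             "reported": reported, "rel_diff": None})
            continue
        county_sum = sums[key]
        if reported == 0:
            rel_diff = 0.0 if county_sum == 0 else float("inf")
        else:
            rel_diff = abs(county_sum - reported) / abs(reported)
        if rel_diff > tolerance:
            failures.append({"key": key, "county_sum": county_sum,
                             "reported": reported, "rel_diff": rel_diff})
    return failures


# Made-up sample data, not taken from any yearbook.
county_rows = [
    {"year": 2010, "province": "Province X", "city": "City A",
     "county": "County 1", "indicator": "grain_output", "value": 100.0},
    {"year": 2010, "province": "Province X", "city": "City A",
     "county": "County 2", "indicator": "grain_output", "value": 205.0},
    {"year": 2010, "province": "Province X", "city": "City B",
     "county": "County 3", "indicator": "grain_output", "value": 50.0},
    {"year": 2010, "province": "Province X", "city": "City B",
     "county": "County 4", "indicator": "grain_output", "value": 40.0},
]
city_totals = {
    (2010, "Province X", "City A", "grain_output"): 306.0,
    (2010, "Province X", "City B", "grain_output"): 100.0,
}

failures = check_city_totals(county_rows, city_totals)
for failure in failures:
    print(failure["key"], failure["county_sum"], failure["reported"])
```

In this sample, City A passes (305 against a reported 306) and City B is flagged (90 against a reported 100). Flagged groups could then go to a manual check of the source page, with the outcome recorded in the data-quality note.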
If you have prior experience mining Chinese statistical yearbooks, handling OCR for mixed Chinese/English tables, or building historical county concordances, that background will be a strong fit for this assignment.