Smartphone Repairability Index Web Scraper

Task List: Add the Repairability Index to an existing web scraping program

1. Scrape smartphone repairability index
• Scrape the repairability index for all available smartphone models from the official/public source
• Extract:
   • brand
   • model
   • repairability index
   • source URL
• Recommended source: ADEME official dataset (best option: structured and easier to scrape)
  https://data.ademe.fr/datasets/indice-de-reparabilite

2. Normalize repairability dataset
• Clean and standardize device names
• Apply the same normalization rules already used for the scraped repair price database
• Standardize:
   • lowercase
   • brand names
   • separators
   • spaces
   • special characters

3. Match repairability data with existing device database
• Match repairability records to the existing master device list
• Use:
   • exact match first
   • alias mapping second
   • fuzzy matching as fallback
• Log unmatched models for manual review

4. Link repairability data to existing pricing database
• Connect the repairability index to the existing database containing:
   • scraped repair prices from WeFix
   • scraped repair prices from Save
   • scraped equipment / spare parts prices from Utopya
• Use the internal device_id as the common key

5. Keep repair price granularity by repair type
• Do not store only one average price per device
• Keep prices separated by repair type, such as:
   • screen
   • battery
   • connector
   • back glass
   • camera
   • other repair categories already available in the scraped dataset

6. Build weighted repair price aggregation
• Create a weighted price logic across sources
• Example:
   • WeFix = premium market reference
   • Save = standard market reference
   • Utopya = spare parts / cost reference
• Compute:
   • minimum price
   • average price
   • weighted market price

7. Add timestamp and freshness tracking
• Store the last update date for each scraped record
• Make sure each repair price and repairability record has a timestamp
• Prepare the database for future refreshes

8. Add confidence score
• Create a confidence score for each device and repair type based on:
   • number of available sources
   • quality of the match
   • completeness of the data
• Example:
   • low confidence if only one source exists
   • higher confidence if several sources match correctly

9. Define fallback logic
• Define what happens when:
   • repairability index is missing
   • a device cannot be matched
   • some repair types are missing
• Possible fallback:
   • null value
   • similar model mapping
   • manual review queue

10. Create final unified dataset
For each device, consolidate:
• device_id
• brand
• model
• repairability_score
• repair_type
• repair_price_by_source
• weighted_market_price
• spare_part_cost
• confidence_score
• last_update

11. Export final output
• Export the unified result as:
   • CSV and/or JSON
• Ensure the output is ready to be used later for:
   • pricing engine
   • decision engine
   • analytics

12. Deliverables
• repairability scraping script
• normalization logic
• matching logic
• unmatched devices log
• final unified dataset
• export file

Final goal
Build a clean and unified database that connects:
• repair prices
• spare parts prices
• repairability index
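
To make steps 2 and 3 concrete, here is a minimal sketch of the normalization and the exact / alias / fuzzy matching cascade, using only the Python standard library. The function names, the sample master list, the alias table, and the 0.9 fuzzy cutoff are all assumptions for illustration. The real normalization rules should be the ones already used for the repair price database.

```python
import difflib
import re
import unicodedata


def normalize_name(raw: str) -> str:
    """Lowercase, strip accents, unify separators, collapse spaces."""
    text = unicodedata.normalize("NFKD", raw)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    text = text.lower()
    text = re.sub(r"[-_/.,()]+", " ", text)   # separators become spaces
    text = re.sub(r"[^a-z0-9+ ]", "", text)   # drop remaining special characters
    return re.sub(r"\s+", " ", text).strip()


def match_device(name, master, aliases, cutoff=0.9):
    """Return (device_id, method, score); device_id is None when unmatched.

    master  maps normalized device name -> device_id
    aliases maps normalized alias       -> normalized master name
    """
    key = normalize_name(name)
    if key in master:                                   # 1. exact match
        return master[key], "exact", 1.0
    target = aliases.get(key)
    if target in master:                                # 2. alias mapping
        return master[target], "alias", 1.0
    close = difflib.get_close_matches(key, list(master), n=1, cutoff=cutoff)
    if close:                                           # 3. fuzzy fallback
        score = difflib.SequenceMatcher(None, key, close[0]).ratio()
        return master[close[0]], "fuzzy", score
    return None, "unmatched", 0.0


# Hypothetical sample data (not taken from any real dataset).
master = {
    "apple iphone 13": "dev_001",
    "samsung galaxy s21 ultra 5g": "dev_002",
}
aliases = {"iphone 13": "apple iphone 13"}

unmatched_log = []  # models to send to manual review
for scraped in ["Apple iPhone-13", "iPhone 13",
                "Samsung Galaxy S21 Ultra 5 G", "Nokia 3310"]:
    device_id, method, score = match_device(scraped, master, aliases)
    if device_id is None:
        unmatched_log.append(scraped)
    print(scraped, "->", device_id, method)
```

Fuzzy matches are worth reviewing before they are trusted: names that differ only by a model number (for example "apple iphone 13" and "apple iphone 14") are very similar as strings and can score above the cutoff. Keeping the match method and score alongside each record also gives the confidence score in step 8 something to work with.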

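Steps 6 and 8 can be sketched in the same spirit. The example below computes the minimum, average, and weighted market price for one device and repair type, plus a simple confidence score. The source weights, the equal weighting of the three confidence factors, and all function names are placeholders, since the task list does not specify them.

```python
from statistics import mean

# Placeholder weights (assumption): the task list names the sources
# but does not give numeric weights.
SOURCE_WEIGHTS = {"wefix": 0.5, "save": 0.3, "utopya": 0.2}


def aggregate_prices(prices, weights=SOURCE_WEIGHTS):
    """Aggregate prices for one (device, repair type).

    prices maps source name -> price, with None for a missing source.
    Weights are renormalized over the sources that actually have a price.
    """
    available = {s: p for s, p in prices.items()
                 if p is not None and s in weights}
    if not available:
        return {"min_price": None, "avg_price": None,
                "weighted_market_price": None}
    total_weight = sum(weights[s] for s in available)
    weighted = sum(weights[s] * p for s, p in available.items()) / total_weight
    return {
        "min_price": min(available.values()),
        "avg_price": round(mean(available.values()), 2),
        "weighted_market_price": round(weighted, 2),
    }


def confidence_score(n_sources, match_score, completeness, max_sources=3):
    """Average source coverage, match quality, and completeness into [0, 1].

    Equal weighting of the three factors is an assumption.
    """
    coverage = min(n_sources, max_sources) / max_sources
    return round((coverage + match_score + completeness) / 3, 2)


# Hypothetical prices for a screen repair on one device.
full = aggregate_prices({"wefix": 120.0, "save": 100.0, "utopya": 40.0})
partial = aggregate_prices({"wefix": 120.0, "save": None, "utopya": 40.0})
print(full)
print(partial)
print(confidence_score(1, 1.0, 1.0), confidence_score(3, 1.0, 1.0))
```

Whether Utopya part costs should enter the market price at all, or only feed spare_part_cost, is left open by the task list. Passing a weights table without "utopya" excludes it from all three figures.
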
Python
