Scalable Beauty Shade Data Pipeline

Customer: AI | Published: 19.10.2025

I already have a working Python prototype that downloads official brand swatch images, extracts RGB values, converts them to CIELAB, classifies depth and undertone, then cross-matches equivalent shades across Fenty Beauty, Estée Lauder, Dior, NARS and more. What I now need is a production-ready data pipeline, designed, built, and continuously optimized, to collect, store, and process this mixed data (images, structured colour metrics, and reference tables) on Google Cloud Platform.

Key objectives
• Automate every current step: ingest images, run colour conversion, apply classification rules, and publish updated mapping tables.
• Scale seamlessly as new brands, shades, and swatch batches are added.
• Maintain reliability with monitoring, logging, and alerting so the pipeline runs hands-off.

Preferred stack & suggested components
• GCP services such as Cloud Storage (raw images), Cloud Functions / Cloud Run (processing), BigQuery (analytical tables), and Composer or Dataflow for orchestration.
• Python remains the execution language; modular, well-documented code is essential.
• CI/CD via Cloud Build or GitHub Actions.

Deliverables
1. Architecture diagram and setup scripts (Terraform or Deployment Manager).
2. Refactored, containerised Python modules ready for Cloud Run or Dataflow.
3. Orchestration workflow that schedules, retries, and logs each stage.
4. Data quality checks to flag colour anomalies or broken URLs.
5. README and onboarding guide so future engineers can extend the system.

I'm happy to iterate quickly: we'll start with a minimal viable pipeline, validate output parity with my prototype, then expand coverage and robustness. If you have practical experience designing image-heavy, colour-science, or cosmetics data workflows on GCP, I'd love to work with you.
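For reference, the RGB→CIELAB conversion step mentioned above could look roughly like this minimal, dependency-free sketch (sRGB to Lab via CIE XYZ under a D65 white point). This is an illustration of the standard formula, not necessarily how the existing prototype implements it:

```python
def srgb_to_lab(r, g, b):
    """Convert an 8-bit sRGB swatch colour to CIELAB (D65 reference white)."""
    # 1. Undo the sRGB gamma curve to get linear RGB in [0, 1].
    def linearise(c):
        c /= 255.0
        return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4

    rl, gl, bl = linearise(r), linearise(g), linearise(b)

    # 2. Linear RGB -> CIE XYZ using the sRGB/D65 matrix.
    x = 0.4124 * rl + 0.3576 * gl + 0.1805 * bl
    y = 0.2126 * rl + 0.7152 * gl + 0.0722 * bl
    z = 0.0193 * rl + 0.1192 * gl + 0.9505 * bl

    # 3. XYZ -> Lab, normalised by the D65 white point (Xn, Yn, Zn).
    def f(t):
        return t ** (1 / 3) if t > (6 / 29) ** 3 else t / (3 * (6 / 29) ** 2) + 4 / 29

    fx, fy, fz = f(x / 0.95047), f(y / 1.0), f(z / 1.08883)
    return 116 * fy - 16, 500 * (fx - fy), 200 * (fy - fz)
```

Pure white (255, 255, 255) should map to roughly L* = 100, a* ≈ 0, b* ≈ 0, which makes a convenient parity check against the prototype's output.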
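Deliverable 4 (data quality checks) could start as a simple per-record validator run before mapping tables are published. Field names such as `swatch_url` are placeholders here, not the prototype's actual schema:

```python
def check_shade(record):
    """Flag colour anomalies or suspicious URLs in one shade record.

    Returns a list of issue strings; an empty list means the record passes.
    """
    issues = []

    # CIELAB sanity bounds: L* in [0, 100], a*/b* within a generous gamut range.
    L = record.get("L")
    if L is None or not 0 <= L <= 100:
        issues.append(f"L out of range: {L}")
    for name in ("a", "b"):
        v = record.get(name)
        if v is None or not -128 <= v <= 127:
            issues.append(f"{name} out of range: {v}")

    # Cheap structural URL check; a HEAD request would catch dead links at runtime.
    url = record.get("swatch_url", "")
    if not url.startswith(("http://", "https://")):
        issues.append(f"suspicious swatch URL: {url!r}")

    return issues
```

In a BigQuery-backed pipeline these checks might instead be expressed as SQL assertions, but a Python validator keeps parity with the prototype during the minimal-viable-pipeline phase.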
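For the orchestration requirement (schedule, retry, and log each stage), the containerised stages could share a small retry-with-logging wrapper regardless of whether Composer or Dataflow drives them. A hedged sketch, with purely illustrative names:

```python
import logging
import time
from functools import wraps

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("pipeline")


def stage(retries=3, backoff=2.0):
    """Decorator: run a pipeline stage with logging and exponential-backoff retries."""
    def deco(fn):
        @wraps(fn)
        def wrapper(*args, **kwargs):
            for attempt in range(1, retries + 1):
                try:
                    log.info("stage %s: attempt %d/%d", fn.__name__, attempt, retries)
                    return fn(*args, **kwargs)
                except Exception:
                    log.exception("stage %s failed on attempt %d", fn.__name__, attempt)
                    if attempt == retries:
                        raise  # exhausted retries: surface the error to the orchestrator
                    time.sleep(backoff ** attempt)
        return wrapper
    return deco
```

An orchestrator-native retry policy (Composer task retries, Cloud Run job retries) would normally own this behaviour in production; the decorator mainly documents the intended semantics and keeps local runs consistent with the cloud setup.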