Clinical Trial Data Architecture Build

I am putting together an end-to-end data architecture that can reliably ingest, store, and serve a broad range of clinical-trial assets: patient demographics, clinical trial results, genomic data, COA (Clinical Outcome Assessment) records, and a growing rater database.

What I need from you

Design the target architecture and implement the core pipelines, ideally using a modern cloud stack (Snowflake, Databricks, BigQuery, Redshift, or a similar platform; feel free to propose the best fit). Your work should cover raw-to-curated layers, automated metadata capture, and role-based access controls that satisfy typical GxP and HIPAA expectations.

Key deliverables

• Reference architecture diagram with component rationale
• Re-usable ingestion and transformation code (Python, SQL, or Spark) for each data domain listed above
• A unified analytical schema / data model ready for downstream BI, ML, and statistical analysis
• Brief runbook plus inline documentation so an internal team can extend or troubleshoot the solution

Acceptance criteria

The pipelines must load a small sample (I will supply CSV/JSON/VCF files) end-to-end, land the data in the curated layer with provenance preserved, and let me query it in under five minutes. All code should be version-controlled and container-ready.

If you have direct experience designing data platforms for clinical research, or have handled similarly sensitive data sets, this should be a quick but impactful engagement. Looking forward to seeing how you’d approach it.
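To make "provenance preserved" a little more concrete, here is a minimal sketch of the kind of thing I have in mind for a CSV load. It is only an illustration, not a required design: the function name `load_with_provenance`, the underscore-prefixed column names, and the sample file contents are all hypothetical, and the real pipeline would write to the chosen platform rather than return a Python list.

```python
import csv
import hashlib
import io
from datetime import datetime, timezone


def load_with_provenance(raw_bytes: bytes, source_name: str) -> list[dict]:
    """Parse a CSV payload and attach provenance fields to every row.

    The provenance column names below are assumptions for illustration only.
    """
    # Hash the untouched payload so each curated row can be traced to an exact file version.
    file_hash = hashlib.sha256(raw_bytes).hexdigest()
    # One timestamp per load, in UTC, shared by all rows of that load.
    ingested_at = datetime.now(timezone.utc).isoformat()

    reader = csv.DictReader(io.StringIO(raw_bytes.decode("utf-8")))
    curated = []
    for row_number, row in enumerate(reader, start=1):
        curated.append({
            **row,
            "_source_file": source_name,
            "_source_sha256": file_hash,
            "_source_row": row_number,  # 1-based data row, header excluded
            "_ingested_at": ingested_at,
        })
    return curated


# Made-up sample standing in for a demographics file.
sample = b"subject_id,age,sex\nS001,54,F\nS002,61,M\n"
rows = load_with_provenance(sample, "demographics_sample.csv")
for r in rows:
    print(r["subject_id"], r["_source_file"], r["_source_row"])
```

The same idea (file hash, source row, load timestamp carried into the curated layer) would need an equivalent for the JSON and VCF inputs; how that is done is up to you.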

Python
