TCGA RNA-seq Survival Analysis

Client: AI | Published: 12.02.2026

I have several large RNA-seq datasets from TCGA covering multiple cancer types, and I need a focused survival analysis that links gene-level and gene-set activity to overall and progression-free survival. All raw counts, FPKM tables and the matched clinical files are already in place; what I require is the analytical workflow and an interpretable report.

Here is the workflow I would like you to follow:
• Pre-process and normalise the RNA-seq count data, matching each sample unambiguously to its clinical record.
• Build Kaplan-Meier curves and multivariable Cox proportional-hazards models to identify genes whose expression is significantly associated with survival outcomes, adjusting for key covariates (age, stage, subtype, etc.).
• Run GSEA (Gene Set Enrichment Analysis) on ranked gene lists derived from the Cox statistics to uncover pathways whose collective behaviour predicts patient prognosis.

Deliverables
1. Reproducible scripts or notebooks (bash, R, Python, or the GSEA command-line tool), plus a brief README explaining all dependencies.
2. All intermediate and final result files: normalised matrices, survival statistics, GSEA enrichment scores, and publication-quality plots (SVG/PNG).
3. A short methods & results document (Word, PDF, or Markdown) summarising the approach, key findings, and any limitations.

Acceptance criteria
• Code executes end-to-end on a fresh Linux environment using only publicly available libraries or the standalone GSEA software.
• Reported hazard ratios and enrichment scores can be reproduced from the provided scripts within a 5% margin.

If you are comfortable handling high-volume omics data and can deliver clear, reproducible survival insights with GSEA integration, I am ready to share the cohort list and file structure so you can get started immediately.
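For the sample-matching step, TCGA barcodes make the link explicit. The first 12 characters (TCGA-XX-XXXX) identify the patient, and characters 14-15 give the sample type code (01 = primary solid tumour, 03 = primary blood-derived cancer, 11 = solid tissue normal). The sketch below is a minimal illustration that assumes the clinical table is keyed by the 12-character patient barcode and that exactly one primary-tumour sample per patient should be kept. The function names, the "." to "-" normalisation and the tie-break rule (keep the lexicographically smallest barcode) are hypothetical choices and are not part of the brief.

```python
from collections import defaultdict

# Sample-type codes (barcode characters 14-15) treated as primary tumour.
# Assumption: 01 = primary solid tumour, 03 = primary blood-derived cancer.
PRIMARY_TUMOUR_CODES = {"01", "03"}


def parse_barcode(barcode):
    """Split a TCGA sample barcode into (patient_id, sample_type_code)."""
    # Hypothetical normalisation: R's make.names() often turns '-' into '.'.
    bc = barcode.strip().upper().replace(".", "-")
    if not bc.startswith("TCGA-") or len(bc) < 15 or bc[12] != "-":
        raise ValueError(f"not a TCGA sample barcode: {barcode!r}")
    return bc[:12], bc[13:15]


def match_samples(expression_columns, clinical_patient_ids):
    """Map each clinical patient to exactly one primary-tumour expression column.

    Returns (matched, report): matched is {patient_id: original column name},
    report is a list of (column, reason) for every column that was dropped.
    """
    clinical = {p.strip().upper() for p in clinical_patient_ids}
    candidates = defaultdict(list)
    report = []
    for col in expression_columns:
        patient, code = parse_barcode(col)
        if code not in PRIMARY_TUMOUR_CODES:
            report.append((col, f"sample type {code} is not a primary tumour"))
        elif patient not in clinical:
            report.append((col, "no clinical record"))
        else:
            candidates[patient].append(col)
    matched = {}
    for patient, cols in candidates.items():
        cols.sort()  # hypothetical tie-break: keep the smallest barcode
        matched[patient] = cols[0]
        for extra in cols[1:]:
            report.append((extra, f"duplicate primary tumour for {patient}"))
    return matched, report
```

Keeping the report of dropped columns makes the final per-cohort sample counts auditable in the README.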
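The brief does not fix a normalisation method. One common choice for raw counts is DESeq2-style median-of-ratios size factors followed by a log2(x + 1) transform. The pure-Python sketch below shows that computation, assuming the counts arrive as a {gene: [count per sample]} mapping; in a real pipeline DESeq2 (or edgeR's TMM, or a variance-stabilising transform) would normally be called directly, and the function names here are illustrative only.

```python
import math
import statistics


def size_factors(counts):
    """Median-of-ratios size factors, one per sample.

    counts: {gene: [raw count per sample]}, all lists the same length.
    Genes with a zero in any sample are skipped, because their geometric
    mean is zero and the ratio to it is undefined.
    """
    n_samples = len(next(iter(counts.values())))
    log_ratios = [[] for _ in range(n_samples)]
    for row in counts.values():
        if min(row) <= 0:
            continue
        log_geo_mean = sum(math.log(c) for c in row) / n_samples
        for j, c in enumerate(row):
            log_ratios[j].append(math.log(c) - log_geo_mean)
    if not log_ratios[0]:
        raise ValueError("no gene has a non-zero count in every sample")
    # Median on the log scale, then back-transform.
    return [math.exp(statistics.median(r)) for r in log_ratios]


def normalise(counts):
    """Divide each count by its sample's size factor, then apply log2(x + 1)."""
    sf = size_factors(counts)
    return {
        gene: [math.log2(c / s + 1) for c, s in zip(row, sf)]
        for gene, row in counts.items()
    }
```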
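Kaplan-Meier curves for a single gene are usually drawn after splitting expression into high and low groups. The sketch below implements the product-limit estimator, S(t) = product over event times t_i <= t of (1 - d_i / n_i), and a two-group log-rank test in plain Python, assuming follow-up times and 0/1 event flags are already aligned to the matched samples. The median cut-point is an assumption (quantile or optimised cut-points are alternatives), and production code would more likely use R's survival package or the lifelines library.

```python
import math
import statistics


def median_split(values):
    """Assumed cut-point: 1 (high) if above the median expression, else 0 (low)."""
    cut = statistics.median(values)
    return [1 if v > cut else 0 for v in values]


def kaplan_meier(times, events):
    """Product-limit estimate of S(t).

    events: 1 = event observed (death or progression), 0 = censored.
    Returns [(t_i, S(t_i))] at each distinct event time.
    """
    data = sorted(zip(times, events))
    at_risk, surv, curve, i = len(data), 1.0, [], 0
    while i < len(data):
        t, deaths, leaving = data[i][0], 0, 0
        # Events and censorings at exactly t leave the risk set after t.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            leaving += 1
            i += 1
        if deaths:
            surv *= 1 - deaths / at_risk
            curve.append((t, surv))
        at_risk -= leaving
    return curve


def logrank_test(times, events, groups):
    """Two-group log-rank test (groups are 0/1). Returns (chi2, p), 1 degree of freedom."""
    o_minus_e, var = 0.0, 0.0
    for t in sorted({t for t, e in zip(times, events) if e}):
        n = sum(1 for ti in times if ti >= t)
        n1 = sum(1 for ti, g in zip(times, groups) if ti >= t and g == 1)
        d = sum(1 for ti, e in zip(times, events) if ti == t and e)
        d1 = sum(1 for ti, e, g in zip(times, events, groups) if ti == t and e and g == 1)
        o_minus_e += d1 - d * n1 / n  # observed minus expected events in group 1
        if n > 1:
            var += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    if var == 0:
        raise ValueError("log-rank variance is zero; both groups need subjects at risk")
    chi2 = o_minus_e ** 2 / var
    return chi2, math.erfc(math.sqrt(chi2 / 2))  # upper tail of chi-square(1)
```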
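For the GSEA input, each gene needs one ranking statistic from its Cox model. A natural choice, assumed here, is the Wald z-score (coefficient divided by its standard error), so that genes whose higher expression goes with higher hazard sit at the top of the list. Purely for illustration, the sketch fits an unadjusted single-gene Cox model by Newton-Raphson on the Breslow partial likelihood; the covariate-adjusted models the brief asks for (age, stage, subtype) would come from R's survival::coxph or lifelines' CoxPHFitter, and only the z-score extraction and the two-column, tab-separated .rnk output carry over. It also assumes expression is standardised per gene so that exp(beta * x) does not overflow.

```python
import math


def cox_univariable(times, events, x, max_iter=50, tol=1e-9):
    """Single-covariate Cox model h(t | x) = h0(t) * exp(beta * x).

    Illustration only: unadjusted, Breslow handling of tied event times.
    Returns (beta, se, z), where z = beta / se is the Wald statistic.
    """
    event_times = sorted({t for t, e in zip(times, events) if e})
    beta = 0.0
    for _ in range(max_iter):
        score, info = 0.0, 0.0
        for t in event_times:
            # Risk set: subjects still under observation at time t.
            risk = [(xi, math.exp(beta * xi)) for ti, xi in zip(times, x) if ti >= t]
            s0 = sum(w for _, w in risk)
            s1 = sum(xi * w for xi, w in risk)
            s2 = sum(xi * xi * w for xi, w in risk)
            dead = [xi for ti, ei, xi in zip(times, events, x) if ti == t and ei]
            score += sum(dead) - len(dead) * s1 / s0
            info += len(dead) * (s2 / s0 - (s1 / s0) ** 2)
        if info <= 0:
            raise ValueError("zero information: covariate is constant within risk sets")
        step = score / info  # Newton-Raphson update
        beta += step
        if abs(step) < tol:
            break
    se = 1 / math.sqrt(info)  # information from the last iteration
    return beta, se, beta / se


def rnk_lines(z_by_gene):
    """Lines of a GSEA pre-ranked (.rnk) file: gene<TAB>score, highest score first."""
    finite = [(g, z) for g, z in z_by_gene.items() if math.isfinite(z)]
    # Ties broken by gene name so that reruns produce identical files.
    finite.sort(key=lambda gz: (-gz[1], gz[0]))
    return [f"{g}\t{z:.6g}" for g, z in finite]
```

Writing "\n".join(rnk_lines(...)) to disk gives a file that GSEAPreranked can read; dropping non-finite z-scores avoids silently breaking the ranking.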
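To check enrichment scores independently of the GSEA software, the running-sum statistic can be recomputed from the ranked list. This sketch implements the weighted running sum with exponent p = 1 (the "weighted" scoring scheme): each gene-set hit steps up by |r_j| divided by the sum of |r| over all hits, each miss steps down by 1 / (N - N_hits), and the enrichment score is the maximum deviation from zero. Set members absent from the ranked list are simply ignored. It deliberately leaves out the permutation-based normalised score (NES) and FDR, which the GSEA software computes from random permutations; matching those within a tolerance may require fixing the random seed.

```python
def enrichment_score(ranked, gene_set, p=1.0):
    """Weighted running-sum enrichment score.

    ranked: [(gene, metric)] sorted from most positive to most negative metric.
    gene_set: collection of gene symbols.
    Returns (es, running_sum_path).
    """
    in_set = [g in gene_set for g, _ in ranked]
    n_hit = sum(in_set)
    n_miss = len(ranked) - n_hit
    if n_hit == 0 or n_miss == 0:
        raise ValueError("gene set must overlap the ranked list without covering all of it")
    norm_hit = sum(abs(r) ** p for (_, r), hit in zip(ranked, in_set) if hit)
    if norm_hit == 0:
        raise ValueError("all gene-set members have a zero ranking metric")
    running, es, path = 0.0, 0.0, []
    for (_, r), hit in zip(ranked, in_set):
        # Hits step up in proportion to their weight; misses step down uniformly.
        running += abs(r) ** p / norm_hit if hit else -1.0 / n_miss
        path.append(running)
        if abs(running) > abs(es):
            es = running
    return es, path
```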
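The 5% acceptance margin can be made testable with a small check. This sketch assumes "within a 5% margin" means a relative difference of at most 5% of the reported value, applied on the hazard-ratio scale; that reading is an interpretation, not something the brief states. Values close to zero, such as small enrichment scores, would need an absolute tolerance instead, because a relative margin becomes arbitrarily strict there. The function name is hypothetical.

```python
def reproduction_failures(reported, reproduced, rel_tol=0.05):
    """Compare reported statistics with a re-run.

    reported, reproduced: {identifier: value}, e.g. hazard ratios keyed by gene.
    Returns {identifier: (reported, reproduced)} for each value that is missing
    from the re-run or differs by more than rel_tol * |reported|.
    """
    failures = {}
    for key, ref in reported.items():
        new = reproduced.get(key)
        # Assumption: the margin is relative to the reported value.
        if new is None or abs(new - ref) > rel_tol * abs(ref):
            failures[key] = (ref, new)
    return failures
```

An empty result means every reported value was reproduced within the assumed margin.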