
Genomic Data Commons (GDC)
Enabling precision medicine with high quality, harmonized genomic data

Overview

The Genomic Data Commons (GDC) is a cancer knowledge network that supports hosting, standardization, and analysis of genomic, clinical, and biospecimen data from cancer research programs. The GDC harmonizes raw sequencing data, identifies and applies state-of-the-art bioinformatics methods for generating mutation calls, structural variants and other high-level data, and provides scalable downloads and web-based analysis tools. 

Data in the GDC are structured using the GDC Data Model, with properties and data types defined in the GDC Data Dictionary. The data model is continually updated to accommodate evolving genomic technology and biomedical research, with guidance and input from the research community and external reference standards.
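The data model is a graph in which each node type links to a specific parent type. The sketch below is a minimal illustration that assumes a simplified chain (program → project → case → sample → portion → analyte → aliquot). The real GDC Data Model has many more node types and properties, and the `PARENT_TYPE` table, `Node` class, and `lineage` helper are hypothetical, not part of any GDC tool.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical, simplified subset of the GDC Data Model hierarchy.
# Consult the GDC Data Dictionary for the authoritative node types and links.
PARENT_TYPE = {
    "project": "program",
    "case": "project",
    "sample": "case",
    "portion": "sample",
    "analyte": "portion",
    "aliquot": "analyte",
}


@dataclass
class Node:
    node_type: str
    submitter_id: str
    parent: Optional["Node"] = None

    def __post_init__(self) -> None:
        # Reject links that the simplified model does not allow.
        expected = PARENT_TYPE.get(self.node_type)
        actual = self.parent.node_type if self.parent else None
        if expected != actual:
            raise ValueError(f"{self.node_type} must link to {expected}, got {actual}")


def lineage(node: Node) -> list[str]:
    """Return node types from the root (program) down to `node`."""
    path = []
    current: Optional[Node] = node
    while current is not None:
        path.append(current.node_type)
        current = current.parent
    return list(reversed(path))
```

Validating the parent type at construction time mirrors, in a very reduced form, how a dictionary-driven model can reject records that do not fit the schema.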

Because of the personal nature of genomic data, some genomic data in the GDC are controlled-access, requiring eRA Commons authentication and dbGaP authorization. Whether a dataset is open or controlled is determined by Data Access Policies, in a process driven by the informed consent of research participants.
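In practice, a client has to separate open files from controlled ones and attach credentials only when needed. The sketch below assumes that GDC file records carry an `access` field with the values `open` or `controlled`, and that the GDC API accepts an authentication token in an `X-Auth-Token` request header; check both against the GDC API documentation. The helper names are hypothetical.

```python
from typing import Iterable, Optional


def split_by_access(files: Iterable[dict]) -> tuple[list[str], list[str]]:
    """Partition file records into (open, controlled) lists of file IDs."""
    open_ids, controlled_ids = [], []
    for f in files:
        if f.get("access") == "open":
            open_ids.append(f["file_id"])
        else:
            # Conservatively treat anything not explicitly open as controlled.
            controlled_ids.append(f["file_id"])
    return open_ids, controlled_ids


def request_headers(token: Optional[str], needs_controlled: bool) -> dict[str, str]:
    """Build request headers, adding the (assumed) X-Auth-Token header only when required."""
    headers = {"Content-Type": "application/json"}
    if needs_controlled:
        if not token:
            raise PermissionError("controlled-access files require a GDC authentication token")
        headers["X-Auth-Token"] = token
    return headers
```

Sending the token only with controlled-access requests keeps open-data downloads usable without any credentials.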

Data in the GDC are accessible in several ways: the GDC Data Portal, a web-based platform with a graphical user interface to search for and download data; the GDC Data Transfer Tool (DTT), a client-based utility to efficiently download and upload large volumes of data; and the GDC Application Programming Interface (API), a programmatic interface to query, download, upload, and analyze data.
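As an illustration of the API route, the sketch below builds a query against the `/files` endpoint of the public API at `https://api.gdc.cancer.gov`, using the JSON `filters` syntax (`op`/`content`) together with the `fields`, `format`, and `size` parameters. The project ID `TCGA-BRCA` and data type `Masked Somatic Mutation` are example values only, and the field names should be confirmed in the GDC API documentation. `fetch_hits` needs network access.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

GDC_API = "https://api.gdc.cancer.gov"  # public GDC API base URL


def files_query_url(project_id: str, data_type: str, size: int = 10) -> str:
    """Build a GET URL for /files filtered by project and data type."""
    filters = {
        "op": "and",
        "content": [
            {"op": "in", "content": {"field": "cases.project.project_id", "value": [project_id]}},
            {"op": "in", "content": {"field": "data_type", "value": [data_type]}},
        ],
    }
    params = {
        "filters": json.dumps(filters),
        "fields": "file_id,file_name,data_format,access",
        "format": "JSON",
        "size": str(size),
    }
    return f"{GDC_API}/files?{urlencode(params)}"


def fetch_hits(url: str) -> list[dict]:
    """Run the query (requires network access) and return the matching file records."""
    with urlopen(url) as resp:
        return json.load(resp)["data"]["hits"]


# Example: fetch_hits(files_query_url("TCGA-BRCA", "Masked Somatic Mutation", size=5))
```

Building the URL separately from the request makes the query easy to inspect or paste into a browser before any data are downloaded.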

In addition to providing access to data, the GDC provides several analysis tools via the Data Portal Exploration and Analysis features:

  • Mutation Frequency Distribution Graph - View the most frequently mutated genes for any cohort and plot frequencies of cases with mutations and copy number variants for a selected gene
  • OncoGrid - Visualize combinations of gene mutations and copy number variants for a project or custom cohort
  • Survival Analysis - Compare overall survival of any two cohorts, such as patients with and without a mutated gene of interest
  • Set Operations - Perform operations on gene, mutation, or case sets by visualizing set similarities and differences in a Venn diagram
  • Cohort Comparison - Display the survival analysis of custom case sets and compare characteristics such as gender, vital status and age at diagnosis
  • Clinical Data Analysis - Select a clinical variable and view cohort-level survival plots, histograms, box plots, and Q-Q plots
  • Protein Viewer - Visualize gene mutations mapped to their protein functional domains 
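Two of these tools rest on simple calculations. The sketch below is not the Data Portal's implementation. It shows the fraction of cases in a cohort with at least one mutation in each gene, as a mutation frequency view would report, and a Kaplan-Meier survival estimate, assuming the survival plots are Kaplan-Meier curves. The input shapes, `(case_id, gene)` pairs and parallel lists of follow-up times and event flags, are hypothetical.

```python
from collections import defaultdict


def mutation_frequency(mutations, cohort_size):
    """Fraction of cases with >= 1 mutation per gene, most frequent first.

    mutations: iterable of (case_id, gene) pairs; repeated hits in one case count once.
    """
    cases_per_gene = defaultdict(set)
    for case_id, gene in mutations:
        cases_per_gene[gene].add(case_id)
    freqs = {gene: len(cases) / cohort_size for gene, cases in cases_per_gene.items()}
    return sorted(freqs.items(), key=lambda kv: (-kv[1], kv[0]))


def kaplan_meier(times, events):
    """Kaplan-Meier survival estimate.

    times: follow-up time per case; events: True if death observed, False if censored.
    Returns [(t, S(t))] at each distinct time with at least one death.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = censored = 0
        # Group all cases sharing this time point.
        while i < len(data) and data[i][0] == t:
            if data[i][1]:
                deaths += 1
            else:
                censored += 1
            i += 1
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        # Deaths and censored cases both leave the risk set after time t.
        at_risk -= deaths + censored
    return curve
```

Comparing two cohorts, as the Survival Analysis tool does, amounts to computing one curve per cohort; a formal comparison would add a test such as the log-rank test, which is omitted here.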

Data Types

The GDC provides data that are processed through a uniform set of bioinformatics pipelines. GDC-generated data types and associated file formats for each experimental strategy include:

| Experimental Strategy | Data Type | File Format |
|---|---|---|
| Clinical and Biospecimen | Clinical and Biospecimen Metadata | JSON and Tab-delimited |
| Diagnostic and Tissue Slide | Slide Image | SVS |
| Genotyping Array | Copy Number Segment | TXT |
| Methylation Array | Methylation Beta Value | TXT |
| miRNA-Seq | miRNA and Isoform Expression Quantification | TXT |
| RNA-Seq | Gene Expression and Splice Junction Quantification | TXT and Tab-delimited |
| Targeted Sequencing | Transcript Fusion | Tab-delimited |
| WGS | Structural Rearrangements | BED |
| WGS | Raw Somatic Mutations | VCF |
| WGS | MSISensor (Tumor-Only) | TXT |
| WGS, Targeted Sequencing, Genotyping Array | Copy Number Scores | TXT |
| WXS, Targeted Sequencing | Raw and Annotated Somatic Variants | VCF |
| WXS, Targeted Sequencing | Aggregated and Masked Somatic Mutations | MAF |
| WXS, WGS, RNA-Seq, miRNA-Seq, ATAC-Seq | Aligned Reads | BAM |
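Several of these formats are tab-delimited text. As one example, the sketch below reads a MAF file with the standard library, assuming that metadata lines start with `#` and that the header row includes the standard MAF column `Hugo_Symbol`; check the column set against the GDC MAF format documentation. The helper names are hypothetical.

```python
import csv
from collections import Counter


def read_maf(handle):
    """Yield MAF rows as dicts, skipping '#'-prefixed metadata lines."""
    lines = (line for line in handle if not line.startswith("#"))
    yield from csv.DictReader(lines, delimiter="\t")


def variants_per_gene(handle):
    """Count variant rows per gene symbol."""
    return Counter(row["Hugo_Symbol"] for row in read_maf(handle))
```

Because `read_maf` accepts any iterable of text lines, a gzip-compressed MAF can be read by passing `gzip.open(path, "rt")` as the handle, where `path` is a local file you have downloaded.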

Datasets

The GDC provides access to datasets from key NCI programs such as:

The GDC also collaborates with organizations external to NCI to provide harmonized data from critical cancer programs such as:

The GDC continues to make additional datasets available to the cancer research community through ongoing data releases.

Anatomical Sites

The GDC includes data from multiple organ sites. Major sites include:

  • Adrenal Gland
  • Bile Duct
  • Bladder
  • Blood
  • Bone
  • Bone Marrow
  • Brain
  • Breast
  • Cervix
  • Colorectal
  • Esophagus
  • Eye
  • Head and Neck
  • Kidney
  • Liver
  • Lung
  • Lymph Nodes
  • Nervous System
  • Ovary
  • Pancreas
  • Pleura
  • Prostate
  • Skin
  • Soft Tissue
  • Stomach
  • Testis
  • Thymus
  • Thyroid
  • Uterus
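Case counts per site can also be retrieved programmatically. The sketch below assumes that the `/cases` endpoint of the GDC API supports a `facets` parameter on the `primary_site` field and returns counts under `data.aggregations.primary_site.buckets` as `key`/`doc_count` pairs; verify this in the GDC API documentation. The site names returned may not match the grouping used in the list above, and the helper names are hypothetical.

```python
from urllib.parse import urlencode

GDC_API = "https://api.gdc.cancer.gov"  # public GDC API base URL


def primary_site_facet_url() -> str:
    """URL asking /cases for case counts per primary_site, returning no individual hits."""
    params = {"facets": "primary_site", "size": "0", "format": "JSON"}
    return f"{GDC_API}/cases?{urlencode(params)}"


def site_counts(response: dict) -> dict[str, int]:
    """Convert a (parsed JSON) facets response into {site: number_of_cases}."""
    buckets = response["data"]["aggregations"]["primary_site"]["buckets"]
    return {b["key"]: b["doc_count"] for b in buckets}
```

Setting `size=0` requests only the aggregated counts, which keeps the response small.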