Data Release Updates

This aggregated page of listings is updated regularly. These listings derive from individual data commons or cloud resource websites and portals.

Learn more about accessing all data on our How to Access Data page.  


Genomic Data Commons (GDC)

In GDC Data Release 38.0 on August 31, 2023, the GDC added more than 9,000 high-coverage TCGA whole genome sequencing (WGS) alignments. Details are as follows:

  • New Projects

    • MP2PRT-ALL - Molecular Profiling to Predict Response to Treatment for Acute Lymphoblastic Leukemia - phs002005
      • 1,507 cases
      • WGS
    • CGCI-HTMCP-DLBCL - HIV+ Tumor Molecular Characterization Project - Diffuse Large B-Cell Lymphoma - phs000235
      • 70 cases
      • WGS, RNA-Seq, miRNA-Seq, Tissue Slide Images
    • MATCH-B - Genomic Characterization CS-MATCH-0007 Arm B - phs002028
      • 33 cases
      • WXS, RNA-Seq
    • MATCH-N - Genomic Characterization CS-MATCH-0007 Arm N - phs002151
      • 21 cases
      • WXS, RNA-Seq
  • New Cases from Existing Projects

    • CPTAC-3 - GBM and Kidney cohorts - 50 cases
    • HCMI-CMDC - 31 cases
    • CGCI-BLGSP - 204 cases
    • TCGA-TGCT - 113 cases
  • New Data Sets

    • 9,368 WGS alignments from the TCGA program
      • 4,676 Cases
      • 9,368 Aliquots
    • All methylation files produced with the SeSAMe pipeline were replaced with a new version.
    • TCGA SNP6 data processed with the ASCAT3 and ABSOLUTE pipelines
    • 172 CEL and birdseed files from TCGA SNP6
    • Release of remaining data for CGCI projects CGCI-BLGSP and CGCI-HTMCP-CC
  • New Metadata

    • The wgs_coverage field is now populated for most BAMs and will allow for WGS BAMs to be queried by coverage range category.
    • The QC metrics for applicable BAMs are now queryable through the GDC Data Portal and API.
    • The msi_status and msi_score fields, which were produced using MSISensor2, are now queryable through the GDC Data Portal and API.
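
The newly queryable fields listed above can serve as filters in a GDC API request. The following is a minimal sketch rather than an official example. It assumes the public files endpoint at https://api.gdc.cancer.gov/files and assumes the fields are addressed by the names given above (for example, wgs_coverage). The coverage category string "25x-50x" is a placeholder that should be checked against the GDC Data Dictionary, and the helper build_files_query is hypothetical. The code only builds a request URL and sends nothing over the network.

```python
import json
from urllib.parse import urlencode

# Assumed public GDC API endpoint for file-level queries.
GDC_FILES_ENDPOINT = "https://api.gdc.cancer.gov/files"


def build_files_query(field, values, fields=("file_id", "file_name"), size=10):
    """Build query parameters that filter GDC files on a single field.

    Hypothetical helper: it uses the GDC filter shape
    {"op": "in", "content": {"field": ..., "value": [...]}}.
    """
    filters = {
        "op": "in",
        "content": {"field": field, "value": list(values)},
    }
    return {
        "filters": json.dumps(filters),  # filters are passed as a JSON string
        "fields": ",".join(fields),      # comma-separated fields to return
        "format": "JSON",
        "size": str(size),               # maximum number of results
    }


# "25x-50x" is a placeholder coverage category, not a confirmed value.
params = build_files_query(
    "wgs_coverage",
    ["25x-50x"],
    fields=("file_id", "file_name", "wgs_coverage"),
)
url = f"{GDC_FILES_ENDPOINT}?{urlencode(params)}"
print(url)
```

The same helper could be called with "msi_status" and the status values of interest. The printed URL can then be fetched with any HTTP client.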

Find the complete list of datasets available on the GDC portal.
 


Proteomic Data Commons (PDC)

New datasets as of April 2023 include: 

Proteome

  • Description:  CPTAC Pan-Cancer Proteome data processed by the University of Michigan team's pipeline, and then post-processed by the Baylor College of Medicine's pipeline. Details can be found in the STAR Methods of 'Proteogenomic Data and Resources for Pan-Cancer Analysis' (i.e., 'BCM pipeline for pan-cancer multi-omics data harmonization').

    Cohorts:  BRCA, ccRCC, COAD, GBM, HGSC, HNSCC, LSCC, LUAD, PDAC, UCEC
     
  • Description:  CPTAC Pan-Cancer Proteome data processed by the Broad Institute using the Spectrum Mill pipeline. Details can be found in the STAR Methods of 'Pan-Cancer analysis of post-translational modifications reveals shared patterns of protein regulation' (i.e., 'Proteomics data processing').

    Cohorts:  BRCA, ccRCC, COAD, GBM, HGSC, HNSCC, LSCC, LUAD, PDAC, UCEC, MB
     
  • Description:  CPTAC Pan-Cancer Proteome data processed by the University of Michigan team's pipeline. Details can be found in the STAR Methods of 'Proteogenomic Data and Resources for Pan-Cancer Analysis' (i.e., 'Global proteomics and phosphoproteomics (pipeline from the University of Michigan)').

    Cohorts:  BRCA, ccRCC, COAD, GBM, HGSC, HNSCC, LSCC, LUAD, PDAC, UCEC
     
  • Description:  CPTAC Pan-Cancer Proteome data processed by the University of Michigan team's pipeline, and then pre-processed by the Mount Sinai team's pipeline. Details can be found in the STAR Methods of 'Proteogenomic Data and Resources for Pan-Cancer Analysis' (i.e., 'Abundance table preprocessing').

    Cohorts:  BRCA, ccRCC, COAD, GBM, HGSC, HNSCC, LSCC, LUAD, PDAC, UCEC

Phosphoproteome

  • Description:  CPTAC Pan-Cancer Phosphoproteome data processed by the University of Michigan team's pipeline. Details can be found in the STAR Methods of 'Proteogenomic Data and Resources for Pan-Cancer Analysis' (i.e., 'Global proteomics and phosphoproteomics (pipeline from the University of Michigan)').

    Cohorts:  BRCA, ccRCC, COAD, GBM, HGSC, HNSCC, LSCC, LUAD, PDAC, UCEC
     
  • Description:  CPTAC Pan-Cancer Phosphoproteome data processed by the University of Michigan team's pipeline, and then post-processed by the Baylor College of Medicine's pipeline. Details can be found in the STAR Methods of 'Proteogenomic Data and Resources for Pan-Cancer Analysis' (i.e., 'BCM pipeline for pan-cancer multi-omics data harmonization').

    Cohorts:  BRCA, ccRCC, COAD, GBM, HGSC, HNSCC, LSCC, LUAD, PDAC, UCEC
     
  • Description:  CPTAC Pan-Cancer Phosphoproteome data processed by the University of Michigan team's pipeline, and then pre-processed by the Mount Sinai team's pipeline. Details can be found in the STAR Methods of 'Proteogenomic Data and Resources for Pan-Cancer Analysis' (i.e., 'Abundance table preprocessing').

    Cohorts:  BRCA, ccRCC, COAD, GBM, HGSC, HNSCC, LSCC, LUAD, PDAC, UCEC
     
  • Description:  CPTAC Pan-Cancer Proteome data processed by the Broad Institute using the Spectrum Mill pipeline. Details can be found in the STAR Methods of 'Pan-Cancer analysis of post-translational modifications reveals shared patterns of protein regulation' (i.e., 'Proteomics data processing').

    Cohorts:  BRCA, ccRCC, COAD, GBM, HGSC, HNSCC, LSCC, LUAD, PDAC, UCEC, MB

Glycoproteome

  • Description:  CPTAC Pan-Cancer Glycoproteome data processed by the Johns Hopkins University team's pipeline. Details can be found in the STAR Methods of 'Proteogenomic Data and Resources for Pan-Cancer Analysis' (i.e., 'Glycoproteomics (pipeline from Johns Hopkins University)').

    Cohorts:  BRCA, ccRCC, COAD, GBM, HGSC, HNSCC, LSCC, LUAD, PDAC, UCEC

Acetylome

  • Description:  CPTAC Pan-Cancer Acetylome data processed by Common Data Analysis Pipelines (CDAP). Details can be found at https://proteomic.datacommons.cancer.gov/pdc/

    Cohorts:  BRCA, GBM, LSCC, LUAD, UCEC
     
  • Description:  CPTAC Pan-Cancer Acetylome data processed by the University of Michigan team's pipeline. Details can be found in the STAR Methods of 'Proteogenomic Data and Resources for Pan-Cancer Analysis' (i.e., 'Global proteomics and phosphoproteomics (pipeline from the University of Michigan)').

    Cohorts:  BRCA, GBM, LSCC, LUAD, UCEC
     
  • Description:  CPTAC Pan-Cancer Proteome data processed by the Broad Institute using the Spectrum Mill pipeline. Details can be found in the STAR Methods of 'Pan-Cancer analysis of post-translational modifications reveals shared patterns of protein regulation' (i.e., 'Proteomics data processing').

    Cohorts:  BRCA, GBM, LSCC, LUAD, UCEC, MB

Ubiquitylome

  • Description:  CPTAC Pan-Cancer Ubiquitylome data processed by Common Data Analysis Pipelines (CDAP). Details can be found at https://proteomic.datacommons.cancer.gov/pdc/

    Cohorts:  LSCC

Find the complete list of datasets available on the PDC portal.


Imaging Data Commons (IDC)

The IDC September 2023 release includes the following new features and resources:

  • You can now access digital whole-slide hematoxylin and eosin (H&E) stained images for rhabdomyosarcoma (RMS) patients via the IDC portal.
  • A new Zenodo community for IDC is available. Zenodo maintains certain types of data and provisions digital object identifiers (DOIs) for those data descriptors.
  • The IDC portal has new UI elements for simplifying data downloads. 

Read more about the above items in the IDC forum.

Find the complete list of datasets available on the IDC portal.


Integrated Canine Data Commons (ICDC)

New study released May 24, 2023:

OSA03 - “Comparative analysis using whole genome bisulfite sequencing of human and canine osteosarcoma”

  • Adds 44 cases 
  • Adds 88 files (~73 GB)  

Find the complete list of datasets available on the ICDC portal.


Cancer Data Service (CDS)

On May 18, 2023, Data Release 3.0 introduced the following new studies:

LCCC 1108: Development of a Tumor Molecular Analyses Program and Its Use to Support Treatment Decisions (UNCseq™) - phs001713.v1.p1

  • 2,074 Participants
  • 11,340 Files

DCCPS CIDR: The Genetic Basis of Aggressive Prostate Cancer: The Role of Rare Variation - phs001554.v1.p1

  • 13,891 Participants
  • 27,782 Files

DCCPS: Discovery of Colorectal Cancer Susceptibility Genes in High-Risk Families - phs001787.v1.p1

  • 51 Participants
  • 318 Files

PDXNet: Washington University PDX Development and Trial Center - phs002305.v1.p1

  • 110 Participants
  • 990 Files

CCDI: Molecular Characterization Initiative - phs002731.v2.p1

  • 108 Participants
  • 1,987 Files

Explore CDS data.