CRDC Insights

Updates from the Cancer Research Data Commons:
Empowering the Scientific Community to Make New Discoveries

CRDC Components: Updates

December 17, 2025

Whether working on the CRDC Data Commons, NCI Cloud Resources, or CRDC’s Core Services, the CRDC team remains focused on its mission of making data and resources securely accessible to the cancer research community. The latest updates from each component follow.

  • Genomic Data Commons (GDC)

    EXPLORE

    The GDC issued Release 44 in October 2025 and Release 45 earlier this month, adding data from new projects, additional cases from existing projects, and various other updates. Key highlights from these releases include the following.

    • Data Releases
      • New Childhood Cancer Data Initiative (CCDI) Molecular Characterization Initiative (MCI) Project Data - This includes whole exome sequencing (WXS) and methylation data (dbGaP phs002790)
         
      • New Applied Proteogenomics Organizational Learning and Outcomes (APOLLO) Network-OV Project Data  - Whole genome sequencing (WGS) and RNA-Seq data for ovarian cancer (dbGaP phs003488)
         
      • New Refractory Cancers (RC) - Peripheral T-Cell Lymphoma (PTCL) Study Data - WGS, WXS, RNA and miRNA-Seq, and methylation arrays (dbGaP phs002097)
         
      • New Center for Cancer Genomics (CCG) Cancers of Unknown Primary Project (CUPP) Project Data - WGS, WXS, RNA and miRNA-Seq, methylation arrays, and tissue slides (dbGaP phs001801)
         
      • New Clinical Proteomic Tumor Analysis Consortium (CPTAC)-3 Cases - Data from 165 new gastric cancer cases
         
      • New Human Cancer Models Initiative (HCMI) Cases - 168 new cases, including data from new human organoids
         
      • New Cancer Genome Characterization Initiative – Burkitt Lymphoma Genome Sequencing Project (CGCI-BLGSP) Slide Images - Nearly 170 new tissue slide images
         
      • RNA-Seq and miRNA Aliquots - Release of previously unavailable aliquots from various studies
         
      • MuTect2 Fix - Corrected data addressing missing chromosomes 10 and 20  
         
    • Data Migration
      • Initial migration of “gender” data to “sex at birth”
      • Updates to tumor purity/ploidy values

    For full details, see the Data Release 44.0 Notes and Data Release 45.0 Notes.

    SUBMIT

    The most recent GDC Data Dictionary (v3.4.4) was released on July 31, 2025.

    RESOURCES

    The GDC regularly hosts informative webinars. A complete list with links is available in the Support section of the GDC website.  

  • Proteomic Data Commons (PDC)

    EXPLORE

    Since June 2025, the Proteomic Data Commons (PDC) has added more than 15,500 files and nearly 11 TB of data.  

    New Data  

    • Clinical Proteomic Tumor Analysis Consortium (CPTAC)
      • Six new intraductal papillary mucinous neoplasms (IPMN) datasets and six new lung adenocarcinoma (LUAD) datasets
      • Added multi-omics data layers: proteome, phosphoproteome, acetylome, ubiquitylome, and glycoproteome
    • International Cancer Proteogenome Consortium (ICPC) - New and updated datasets for non-functional pancreatic neuroendocrine tumors (NF-PNET), LUAD, and cervical cancer studies  
       
    • Proteogenomic Translational Research Centers (PTRCs) - New metadata for CPTAC BRCA trial data

    Updated Study Data and Metadata

    • CPTAC
      • CPTAC3 Discovery and Confirmatory:
        • Added 540 files and publication data for IPMN datasets
        • Populated CDAP metadata for GBM confirmatory studies (PDC000446–PDC000454)
        • Added the RCC Combined Study with DIA Proteome (PDC00613)
    • PTRC
      • Study: BRCA CALGB40601 Neoadjuvant Trial - Added and updated data descriptions from the publication “Proteogenomic Analysis of the CALGB 40601 (Alliance) HER2+ Breast Cancer Neoadjuvant Trial Reveals Resistance Biomarkers”; includes proteome (PDC000582) and phosphoproteome (PDC000583) data
    • ICPC
      • NF-PNET (Validation Study, Version 2):
        • Replaced corrupted archives (121 files)
        • Added processed mass spectra, peptide spectral matches, protein assembly, and quality metrics for:
          • Proteome Phase I (PDC000590)
          • Proteome Phase II (PDC000588)
          • Phosphoproteome Phase I (PDC000591)
          • Phosphoproteome Phase II (PDC000589)
          • Total: 5,009 files
    • Abbreviations:
      • CPTAC - Clinical Proteomic Tumor Analysis Consortium
      • ICPC - International Cancer Proteogenome Consortium
      • NF-PNET - Non-functional Pancreatic Neuroendocrine Tumors Study
      • PDAC BioTExt - Pancreatic Ductal Adenocarcinoma Study
      • GBM - Glioblastoma Multiforme Study
      • PTRC BRCA - Proteogenomic Translational Research Centers - Breast Cancer  
    • Release notes are available for June V4.11, June V4.12, August V4.13, September V5.0, and November V5.1.  
    • PDC 2.0 - The PDC portal has been updated to improve the user experience. This release includes a redesigned user interface, a comprehensive documentation portal, and a new common data analysis pipeline for data-independent acquisition (DIA) mass spectrometry.

    ANALYZE

    The PDC now provides processed data reports for data-independent acquisition (DIA) studies through the Common Data Analysis Pipeline (CDAP). This includes processed outputs from the CPTAC PDAC BioTExt – Proteome study (PDC000504), as described in the publication, Frozen Tissue Coring and Layered Histological Analysis Improves Cell Type-Specific Proteogenomic Characterization of Pancreatic Adenocarcinoma.

    The reports follow the metadata and formatting conventions of the data-dependent acquisition (DDA) CDAP for consistency, but are tailored for MS2-based, label-free quantitation. The pipeline uses ProteoWizard, EncyclopeDIA/DIA-NN, and Skyline to generate QC reports, quantitative matrices, and Skyline documents.

    Additional DIA studies and pipeline enhancements are planned for future releases.

    Learn more in the PDC Common Data Analysis documentation.

    RESOURCES

    The CPTAC Scientific Symposium was held on September 30, 2025, and featured a presentation from Ratna Rajesh Thangudu, PhD, on Data Sharing through the NCI Proteomic Data Commons (PDC). Slides from that presentation are available here.

  • Imaging Data Commons (IDC)

    EXPLORE

    The IDC’s V22 data release adds 11 new collections totaling 5.62 TB of publicly available cancer imaging data. A full summary is available in the IDC portal release notes.  

    Among the new collections are:  

    • BoneMarrowWSI-PediatricLeukemia
    • The Cancer Imaging Archive (TCIA)
      • Contributed 10 new collections, completing IDC’s integration of all TCIA public DICOM collections.
      • Highlights include:  
        • CBIS-DDSM: Updated, standardized version of the Curated Breast Imaging Subset of Digital Database for Screening Mammography (DDSM) with 2,620 scanned mammography studies.  
        • QIN-BREAST-02: Multi-site, multi-parametric MRI dataset of adult women (18+) with invasive breast cancer undergoing neoadjuvant therapy.  

    Updates to existing collections include additional data added to the following:  

    • Molecular Characterization Initiative (MCI), part of the NCI Childhood Cancer Data Initiative (CCDI)  
    • Cancer Moonshot Biobank (CMB)
    • Clinical Proteomic Tumor Analysis Consortium (CPTAC)  
    • Applied Proteogenomics Organizational Learning and Outcomes (APOLLO)

    ANALYZE

    The IDC has improved its direct download feature, simplifying access to imaging data for local analysis. Users can now download data directly from the portal—no additional tools required—with files streamed from cloud storage to their local environment.

    A demonstration is available in the IDC User Guide.

  • Integrated Canine Data Commons (ICDC)

    EXPLORE

    The ICDC has released new datasets, as well as updates to several ongoing studies.

    The ICDC released software version 4.2.0 on September 2, 2025, featuring several major enhancements:

    • Human relevance statements were added to each study narrative to demonstrate how canine cancers molecularly mirror human cancers.
    • Two new scientific programs were added to the Programs page. These programs were established to organize affiliated studies and communicate updates, publications, and releases from each group. The two new additions are the Pre-medical Cancer Immunotherapy Network for Canine Trials and the Colorado State University Flint Animal Cancer Center.
    • A new interactive dashboard was added to the Cart page, showing file distribution and improving data visualization.

    Release notes can be found in the ICDC GitHub.

  • General Commons

    EXPLORE

    Over the past several months, the General Commons (GC) has added new datasets to several existing programs, including:

    • Human Tumor Atlas Network (HTAN) - This program’s primary genomics sequencing data is now available in version 8, with accession number phs002371.
       
    • Childhood Cancer Data Initiative (CCDI) - The CCDI Pediatric In Vivo Testing Program – Leukemia now includes patient-derived xenograft (PDX) data. A dedicated PDX data model has been developed and integrated by the GC. This data has accession number phs003614.

    SUBMIT

    The GC’s updated data model, which now includes PDX data nodes and elements, is available for exploration on the GC portal. 

  • Cancer Data Aggregator

    EXPLORE

    The Cancer Data Aggregator (CDA) has been updated and redesigned to support searches across most of CRDC’s data commons, including the GDC, PDC, IDC, GC, and ICDC. The new version introduces an enhanced search tool that standardizes values across 12 clinical, demographic, and data fields, making it easier to explore data and build cohorts spanning multiple studies, cancer types, experimental methods, and data modalities.  

    Key features of the updated CDA include: 

    • A custom Python library for streamlined query formatting
    • Google Colab tutorials that require no installation
    • A FastAPI interface for developing custom tools 

    The CDA routinely retrieves data from each of the five CRDC data commons to support aggregated search capabilities. The CDA website also lists the release versions of the data currently in use. The most recent update introduces harmonized disease terms.
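
    To illustrate what standardizing values across data commons can mean in practice, here is a minimal Python sketch. It is not the CDA's implementation or its Python library: the field name, the raw values, the mapping table, and the helper functions are all hypothetical, chosen only to show how differently spelled values from several sources can collapse into one searchable term.

```python
from collections import defaultdict

# Hypothetical mapping from raw values (as different sources might spell them)
# to one harmonized term. Illustrative only; not the CDA's actual vocabulary.
HARMONIZED_SEX = {
    "male": "male",
    "m": "male",
    "female": "female",
    "f": "female",
}


def harmonize(raw_value, mapping, default="unknown"):
    """Return the harmonized term for raw_value, or default if unmapped."""
    if raw_value is None:
        return default
    return mapping.get(raw_value.strip().lower(), default)


def build_cohort(records, field, wanted, mapping):
    """Group record IDs by source where the harmonized field equals wanted."""
    cohort = defaultdict(list)
    for record in records:
        if harmonize(record.get(field), mapping) == wanted:
            cohort[record["source"]].append(record["id"])
    return dict(cohort)


# Made-up records standing in for subjects drawn from different data commons.
records = [
    {"id": "subj-1", "source": "GDC", "sex": "Female"},
    {"id": "subj-2", "source": "PDC", "sex": "F"},
    {"id": "subj-3", "source": "IDC", "sex": "M"},
    {"id": "subj-4", "source": "ICDC", "sex": None},
]

cohort = build_cohort(records, "sex", "female", HARMONIZED_SEX)
print(cohort)
```

    In this toy example, a single search for "female" matches records that were originally recorded as "Female" and "F", which is the kind of cross-source cohort building that harmonized fields are meant to support.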

    RESOURCES

    The recently released “Introduction to the Cancer Data Aggregator” tutorial showcases the latest features, highlighting how to build cohorts across multiple CRDC data commons. You can find this tutorial and others on CDA’s YouTube channel.

    The CDA team welcomes questions, demo requests, and feedback at CancerDataAggregator@gmail.com or through the CDA helpdesk.

  • Data Commons Framework

    EXPLORE

    The Data Commons Framework (DCF) gives users streamlined access to Cancer Research Data Commons (CRDC) data through consistent indexing. It also facilitates secure access to both open- and controlled-access datasets, ensuring that users have the appropriate permissions for each.

    • Data Indexing
      • All Kids First data is now indexed through the DCF. It remains accessible through the Kids First Data Resource Center and the Seven Bridges Cancer Genome Cloud (SB-CGC).
      • Ongoing collaboration with the Genomic Data Commons (GDC) includes updated data formatting to enhance findability and streamline consent verification.  
    • Software Updates and Enhancements
      • The DCF team deployed Gen3 2025.09, an updated software release designed to improve data replication and the user experience. These upgrades support seamless data analysis across all CRDC data commons and the SB-CGC cloud platform, making it easier for researchers to access and analyze CRDC’s extensive datasets. DCF release notes are available here.

  • CRDC Submission Portal

    SUBMIT

    The CRDC Submission Portal introduced several user-experience improvements in late October 2025. Key updates include:

    • Excel-based submission request form:
      Users can now complete the submission request using a template in Excel format, instead of filling out an online form. This provides the option of using an API to populate the Excel file before uploading it to the CRDC Submission Portal. The system reads the uploaded Excel file and auto-populates the online form, simplifying the request process. This feature is particularly useful for large programs that need to fill out a submission request form for multiple studies.
       
    • Enhancements to data submission features:
      The CRDC Submission Portal now includes a new Data Explorer page that displays all released study metadata for a project and allows users to download it for their records. This gives submitters a clear view of the information they submitted to CRDC and supports more accurate updates or corrections.  
       
    • Preliminary dbGaP submission sheets: 
      Submitters to the General Commons (GC), one of the CRDC’s data commons, can now download partially completed dbGaP submission sheets populated only with the information provided to the CRDC. This helps streamline the separate dbGaP submission process for controlled-access data. Users will need to add the dbGaP-required additional information to these sheets before submitting them to dbGaP.
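
    As a rough illustration of populating the Excel-based request form programmatically, the Python sketch below turns study records into spreadsheet rows. It is a sketch under stated assumptions, not the portal's documented workflow: the field names, the two-column layout, and both helper functions are hypothetical, and the real template defines its own structure. Writing the file relies on the third-party openpyxl package.

```python
# Hypothetical field names; the real Excel template defines its own layout.
REQUEST_FIELDS = ["study_name", "principal_investigator", "data_types"]


def form_rows(study, fields=REQUEST_FIELDS):
    """Return [field, value] rows for one study, in template order."""
    missing = [name for name in fields if name not in study]
    if missing:
        raise ValueError(f"study is missing fields: {missing}")
    rows = []
    for name in fields:
        value = study[name]
        # A spreadsheet cell holds one value, so join lists into a string.
        if isinstance(value, (list, tuple)):
            value = ", ".join(value)
        rows.append([name, value])
    return rows


def write_request(study, path):
    """Write one study's rows to an .xlsx file (requires openpyxl)."""
    from openpyxl import Workbook  # third-party package, imported lazily

    workbook = Workbook()
    sheet = workbook.active
    for row in form_rows(study):
        sheet.append(row)
    workbook.save(path)


# Made-up studies standing in for a program's own records.
studies = [
    {"study_name": "Example Study A", "principal_investigator": "A. Researcher",
     "data_types": ["genomic", "imaging"]},
    {"study_name": "Example Study B", "principal_investigator": "B. Researcher",
     "data_types": ["proteomic"]},
]

for study in studies:
    print(form_rows(study))
```

    Calling write_request(studies[0], "request_a.xlsx") would write one file for one study. In practice, a program would more likely fill in the template downloaded from the portal than start from a blank workbook.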

  • Seven Bridges Cancer Genomics Cloud

    EXPLORE

    The SB-CGC platform provides access to most CRDC datasets and regularly updates its indexing of Genomic Data Commons (GDC) and Proteomic Data Commons (PDC) content. Recent updates include indexing for GDC releases 40 and 41 and PDC versions 4.6 through 4.12.  

    ANALYZE

    Two new analytical tools have been added to SB-CGC, enhancing researchers’ ability to conduct reproducible, scalable, and accessible analyses of large-scale genomic and spatial data:

    • MOPline – A toolkit for detecting and genotyping structural variants using whole-genome sequencing data.
    • Monocle – A Common Workflow Language (CWL)-based workflow for analyzing single-cell RNA sequencing trajectories.

    Learn more about SB-CGC’s analytical tools and resources.    

    RESOURCES

    The September 2025 SB-CGC Monthly Webinar featured presentations by Jonathon Keeney, Ph.D. (George Washington University) and Phillip Webster, Ph.D. (Velsera) on BioCompute Objects (BCOs). Recordings of this and all SB-CGC webinars are available on the SB-CGC website.