CRDC Insights

Updates from the Cancer Research Data Commons:
Empowering the Scientific Community to Make New Discoveries

CRDC Components: Updates

June 23, 2025

The CRDC team, whether engaged in activities specific to the CRDC Data Commons, NCI Cloud Resources, or CRDC’s Core Services, remains focused on advancing its mission of making data and resources securely accessible to the cancer research community. The team has provided the following updates.

  • Genomic Data Commons (GDC)

    Data Release

    The GDC’s most recent data release (#43) came out in May 2025. Full release notes are available here. Highlights from the May release include:  

    • New Whole Genome Sequence (WGS) variant calls, featuring VarScan2 software
    • 350+ additional cases from the Human Cancer Models Initiative (HCMI), with WGS, Whole Exome Sequence (WXS), and RNA-Sequence data
    • Updated clinical data from The Cancer Genome Atlas (TCGA) and the Clinical Proteomic Tumor Analysis Consortium (CPTAC)

    Resources to Support AI-informed Research
    A new resources page focused on AI-informed research using GDC data has been created. It details various AI applications and objectives that showcase how GDC data is used by the research community and provides information on GDC resources supporting AI applications. Learn more.  

    Release of New Single-Cell RNA-Seq Analysis Tool 
    The newly developed GDC single-cell RNA-Sequence (scRNA-Seq) Tool streamlines the exploration of single-cell RNA sequencing data through cluster plots and gene expression overlays. Key features include the visualization of dimensionality reduction plots such as UMAP, t-SNE, and PCA, differential gene expression analysis, gene set enrichment analysis, and a violin plot summarizing the data. An overview of the scRNA-Seq Tool is available in the GDC Data Portal User’s Guide. There is also a corresponding scRNA-Seq gene expression Application Programming Interface (API) endpoint with instructions available in the GDC API documentation. 
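    As a rough illustration of working with the GDC API, the sketch below builds (but does not send) a request URL that would list files whose experimental strategy is scRNA-Seq. It uses the general `files` endpoint, not the dedicated scRNA-Seq gene expression endpoint mentioned above, whose path is documented in the GDC API documentation and is not reproduced here. The function name `build_files_query`, the chosen fields, and the `"scRNA-Seq"` filter value are assumptions made for illustration; check them against the GDC API documentation before use.

```python
import json
from urllib.parse import urlencode

# Base URL of the GDC API "files" endpoint (general file search, not the
# dedicated scRNA-Seq gene expression endpoint).
GDC_FILES_ENDPOINT = "https://api.gdc.cancer.gov/files"


def build_files_query(experimental_strategy, size=5):
    """Return a GET URL that filters GDC files by experimental strategy.

    The helper name and the selected fields are illustrative assumptions.
    """
    # GDC filters are expressed as a JSON object with an operator and content.
    filters = {
        "op": "in",
        "content": {
            "field": "experimental_strategy",
            "value": [experimental_strategy],
        },
    }
    params = {
        "filters": json.dumps(filters),
        "fields": "file_id,file_name,data_type",
        "format": "JSON",
        "size": str(size),
    }
    return f"{GDC_FILES_ENDPOINT}?{urlencode(params)}"


if __name__ == "__main__":
    # Print the URL; fetching it requires network access and is left out here.
    print(build_files_query("scRNA-Seq"))
```

    The resulting URL can be opened with any HTTP client; controlled-access files additionally require an authentication token.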

    New Copy Number Variation (CNV) Categories
    The GDC now offers enhanced CNV categories with more specific values. Gains are now classified as "gain" or "amplification," while losses are classified as "heterozygous deletion" or "homozygous deletion." See the GDC Data Portal Release Notes for details.

  • Proteomic Data Commons (PDC)

    This year, the Proteomic Data Commons (PDC) has added nearly 9 TB of data from several ongoing programs, including:

    • Clinical Proteomic Tumor Analysis Consortium (CPTAC)
    • International Cancer Proteogenome Consortium (ICPC)
    • Applied Proteogenomics OrganizationaL Learning and Outcomes (APOLLO)

    For more information about the projects contributing the data, please visit the PDC Release Notes page.   

    Clinical data from CPTAC studies, which include attributes related to patient outcomes, have also been updated for several cohorts, including: 
    •    Clear Cell Renal Cell Carcinoma
    •    Glioblastoma
    •    Head and Neck Squamous Cell Carcinoma
    •    Lung Adenocarcinoma
    •    Lung Squamous Cell Carcinoma
    •    Pancreatic Ductal Adenocarcinoma
    •    Uterine Corpus Endometrial Carcinoma

    The release notes page also provides detailed information on downloading publicly accessible data, including instructions for filtering by demographics and diagnoses.

  • Imaging Data Commons (IDC)

    The Imaging Data Commons (IDC) recently released V21. Detailed data release notes can be found in the IDC User Guide.

    This release includes many additions:
    •    Files: +174,244 (45,784,454 total)
    •    Series: +3,308 (950,888 total)
    •    Studies: +2,070 (149,577 total)
    •    Cases: +1,893 (71,082 total)
    •    Collections: +1 (150 total)  
    •    Disk size: +1.94 TB (87.49 TB total)  
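    To put these increments in context, the short sketch below derives the pre-release totals and the relative growth from the figures listed above. The numbers are copied from the list; the dictionary and function names are illustrative only.

```python
# Release figures quoted above: (increase in this release, new total).
V21_STATS = {
    "files": (174_244, 45_784_454),
    "series": (3_308, 950_888),
    "studies": (2_070, 149_577),
    "cases": (1_893, 71_082),
    "collections": (1, 150),
    "disk_tb": (1.94, 87.49),
}


def previous_and_growth(added, total):
    """Return the total before the release and the percentage growth."""
    previous = total - added
    growth_pct = 100.0 * added / previous
    return previous, growth_pct


if __name__ == "__main__":
    for name, (added, total) in V21_STATS.items():
        previous, growth = previous_and_growth(added, total)
        print(f"{name:12s} previous={previous:,.2f} growth={growth:.2f}%")
```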

    The main highlight of this release is the update of the Childhood Cancer Data Initiative Molecular Characterization (CCDI-MCI) collection (see more details on the collection page), which has doubled the number of slides and patients to include 3,715 digital pathology slides and 3,582 patients.

    Users can get started working with the pathology slides available through the IDC by using this notebook. The team has also created a guide to provide general information on accessing patient-level clinical data with images for the CCDI-MCI Collection.

    Earlier this year, the IDC team added a new feature to its “cart.” Users can now create a collection of data files for individual patients, studies, or series, and then download the selected files directly to their computing environment.  

    The IDC also refined the appearance of the Explore section of its portal, making it more compact and intuitive. 

    Also note that the IDC has revised the way it supports researchers. It has replaced standing virtual office hours with appointments made online. A one-on-one support session with an IDC team member can be requested by filling out a short form (which is also linked from the landing page of the IDC Portal and can be found in the IDC documentation).

  • Integrated Canine Data Commons (ICDC)

    The Integrated Canine Data Commons (ICDC) recently released several new studies, including: 

    Additionally, the ICDC has added informational videos to each of the respective Program Details pages.

  • Clinical and Translational Data Commons

    The Clinical and Translational Data Commons (CTDC) has been actively developing several new features to improve the user experience. These will be available after the upcoming software release, expected by the end of June, and will include the following: 

    • Single-click export of molecular files to the Velsera Seven Bridges Cancer Genomics Cloud (SB-CGC). With this feature, users can select files of interest and, with one click, transfer that data to SB-CGC for analysis in a secure cloud-based environment.
    • Expanded Histogram View, which graphically illustrates the data and files selected in the Explore tab. This allows users to easily see the breakdown of a cohort they are building by characteristic properties such as diagnosis, stage of disease, race and ethnicity, sex, and targeted therapy.   
    • A new Interoperability Microservice that will help users find related data from other CRDC resources directly on a CTDC study page. This makes it easier to explore supporting information, such as matching datasets, that may exist in other data commons. This is a step toward a more connected experience across NCI’s data platforms.
    • The Data Availability Landscape View, a new feature that displays a high-level summary of available data types across CTDC studies, is located on the CTDC Studies page.     
    • The Data Model Navigator (DMN), a new tool that lets users navigate the nodes, properties, controlled vocabularies, common data elements, and relationships of the CTDC data model. This tool is available within the CTDC to facilitate data submission.

  • General Commons

    The Cancer Data Service Has Been Renamed the General Commons

    The Cancer Data Service (CDS), one of CRDC’s Data Commons, has a new name – the General Commons (GC). This better reflects its role within the CRDC of providing data management, storage, and sharing capabilities for NCI-funded studies that fall under the following categories:

    • Studies with data that do not match existing CRDC Data Commons
    • Studies with data that do not fit current data type criteria and/or the required metadata standards for existing CRDC Data Commons  

    Datasets

    Over the last several months, the GC has added new datasets as well as new data for existing programs, including:

    • Human Tumor Atlas Network (HTAN)
    • Childhood Cancer Data Initiative (CCDI): Genomic Sequencing of Pediatric Rhabdomyosarcoma
    • Childhood Cancer Data Initiative (CCDI): Single-Cell Atlas of NF1 Nerve Sheath Tumors
    • CPTAC Proteogenomic Study
    • The Molecular Profiling to Predict Response to Treatment (MP2PRT – CESC)

    Features and Resources
    Earlier this year, the GC added new features and support resources to enhance the user experience, including: 

    • Enhanced data discovery capabilities with improved search functionality across key metadata fields, including diagnosis, accession numbers, study identifiers, and file types
    • Updated terminology to increase consistency
    • Updated GC data model to share patient-derived xenograft (PDX) data
    • Refined library source classification by separating the Common Data Element (CDE) into distinct fields for molecule type and material source, improving data accuracy and traceability

    Most recently, the GC added even more features and functionality, including:

    • Enhanced performance, scalability, and support for advanced graph queries enabled by the migration of metadata to the Memgraph Database
    • Improved user support with a direct link to the CRDC Helpdesk
    • Enhanced search filters within the Data page, for Participants, Samples, and Files
    • The addition of a GC User Guide and GC API Query documentation in the About section
    • Improved user experience with streamlined Programs pages
    • Data releases grouped by year and software releases tracked on the GC Releases page under the About tab. More granular information about bug fixes can be found on the GC portal’s GitHub page.

  • Cancer Data Aggregator

    The Cancer Data Aggregator (CDA) completed its most recent data and metadata update in late May. Full notes can be found here. 

    CDA regularly refreshes its central search database to align with any updates that the CRDC Data Commons such as GDC, PDC, IDC, GC, and ICDC are making to their data and metadata. When accessing CDA, users can search by harmonized, common language terms and build comprehensive cohorts from subjects that may have data in various data commons. Users can then take the aggregated cohorts to the Cloud Resources for further analysis.

  • Data Commons Framework

    The Data Commons Framework (DCF) enables access to more than 14 PB of CRDC data through consistent indexing. It also ensures access to both open and controlled-access data with appropriate permissions. Recent activities from the DCF team include:

    Data Indexing

    • Made an additional 750 terabytes of data (250,000 files from 28 research studies) easier to find and use, utilizing the IndexD system.
    • Supported the Genomic Data Commons (GDC) so their data can be accessed more quickly by researchers. Specifically, the DCF updated GDC data replication strategies to improve access across multiple platforms, including portals and NCI Cloud Resources.

    Security & Compliance

    • Completed its annual security audit and received a 3-year renewal of its FedRAMP authorization, which allows the CRDC to continue to deliver controlled-access data to the research community in compliance with the NIH Genomic Data Sharing Policy.

  • CRDC Submission Portal

    Since the CRDC Submission Portal was launched in June 2024, the Data Hub team has successfully deployed quarterly releases with new front-end and back-end improvements to facilitate data submission to CRDC for NCI- and NIH-funded studies. Some of the most recent features that were added to enhance the submitter’s experience include:

    • An ability for multiple users to collaborate on a single data submission
    • A progress bar for submitters to better visualize and understand the status of the data upload
    • An aggregated view of validation errors so submitters can see the classes of errors that are causing the most issues
    • A permissible value suggestion feature, offered when the entered value is found in the NCI Thesaurus (NCIt)
    • An option to restore a previously cancelled Submission Request

    Detailed Release Notes can be found on the CRDC Submission Portal.  

  • Broad Institute FireCloud

    Updated Features 

    • The FireCloud Data Library is now powered by the Data Use Oversight System (DUOS), which streamlines access to data. NCI datasets such as TARGET and TCGA can be searched and accessed through this new system, which is compatible with FireCloud/Terra’s analytical tools and applications.
    • FireCloud/Terra has released its publicly available roadmap, providing a transparent view of the Terra platform's ongoing development. It gives insight into features and functionality across four active development stages: Near Term, Preview, Launching, and Released.  

    Billing and Cost Management 

    • Several cost management and visibility improvements have been made. Users can now set lifecycle rules to delete files in a specified location after a defined period; files in workspace buckets can still be restored if needed. Additionally:
      • Users can now view a combined spend report for all the workspaces they own across all their billing projects within Terra.
      • For new workflows initiated after March 2025, the billing dashboard’s Run Cost column displays the estimated run cost when the actual cost is unavailable.
      • Users can also set workflow cost thresholds to manage their budgets. 

    Note that the estimated run cost and threshold features do not apply retroactively to older workflows. 

  • Seven Bridges Cancer Genomics Cloud

    New Tools and Apps

    Workshops & Educational Engagement

    • Georgetown University (February and March 2025): The SB-CGC team collaborated with Yuriy Gusev, PhD, to showcase MCMICRO as a tool for analyzing slide images and the COPD Machine Learning tutorial for examining medical imaging data. Dr. Gusev presented his work in a webinar held on May 28, 2025. A recording of the webinar is available.  
    • Purdue Workshop (week of May 21): The SB-CGC team presented a workshop on creating bioinformatics workflows using SB-CGC tools and associated CRDC resources, particularly data from the Integrated Canine Data Commons (ICDC). Slides are available.
    • University of Florida Cancer Systems Biology Course (Spring Semester): The SB-CGC team taught how to use the SB-CGC for RNA/protein CWL pipelines. Meghan Farrall Fairbanks, PhD, hosted the classroom work.
    • NCI BTEP Presentation (April 9): The SB-CGC team presented SB-CGC access and analysis options for HTAN imaging and spatial transcriptomic data to the Bioinformatics Training and Education Program of the NCI.  
    • Learn more about collaborative instruction on the SB-CGC website.

    Collaboration & Public Engagement

  • ISB Cancer Gateway in the Cloud

    Updated Features
     

    • BigQuery Table Search improvements
      • ISB-CGC introduced a new filter to distinguish between versioned tables and those that consistently point to the latest version, simplifying search results.
      • To enhance the system's scalability, backend queries are now executed in SQL, with all of ISB-CGC's BigQuery metadata stored and updated in tables.
    • BigQuery Ecosystem-derived data improvements
      • New tables have been developed for miRNA, gene expression, and somatic mutation from recently updated r41 and r42 GDC data. The updated data comes from programs including CPTAC, TCGA, and TARGET.
      • In addition to publishing new GDC (r41, r42) and PDC (V4.7, V4.9) metadata tables, ISB-CGC has expanded its table offerings of TCGA and TARGET clinical data drawn from supplemental files. These now provide 117 additional columns for TARGET in a single table and 427 additional columns for TCGA across seven tables.
    • Mitelman Graphical User Interface improvements
      • Data exploration capabilities have been improved with the addition of the data browser. Users can now create Circos plots for data subsets or explore the relationship between gene fusions and morphology as well as topography.

    Data Releases 

    • The Mitelman Database: Quarterly data updates have added more than 1,200 new cases, raising the total number of unique cytogenetic aberrations by nearly 800. Additionally, over 200 new gene fusions involving dozens of previously unreported genes were added.

    Workshops & Educational Engagement