CRDC Insights

Updates from the Cancer Research Data Commons:
Empowering the Scientific Community to Make New Discoveries

CRDC Components: Updates June 2026

June 15, 2026

The CRDC team, whether engaged in activities specific to the CRDC Data Commons, NCI Cloud Resource, or CRDC’s Core Services, remains focused on advancing its mission of making data and resources securely accessible to the cancer research community. The team has provided updates. 

  • Genomic Data Commons (GDC)

    EXPLORE

    The Genomic Data Commons (GDC) continues to expand its data holdings and enhance data exploration capabilities. Recent updates advance integrated cancer research by enabling seamless exploration of genomic, clinical, and imaging data within a unified platform.

    New Data

    • Adjuvant Lung Cancer Enrichment Marker Identification and Sequencing Trial (ALCHEMIST) (phs001140), which includes genomic and clinical data to support lung cancer research and biomarker discovery.
    • Childhood Cancer Data Initiative (CCDI) Molecular Characterization Initiative (MCI) (phs002790)
    • Center for Cancer Genomics (CCG) Cancers of Unknown Primary Project (CUPP) (phs001801)
    • Refractory Cancers (RC) - Peripheral T-Cell Lymphoma (PTCL) study (phs002097)
    • WGS and RNA-Seq data for ovarian cancer from the Applied Proteogenomics Organizational Learning and Outcomes (APOLLO) network

    Features and Enhancements

    • GeneCards annotations are now available directly within GDC Gene Summary pages.
      • These provide integrated information on gene function, expression, genomic location, and disease associations. 

    ANALYZE

    SUBMIT

    The GDC continues to accept data submissions.

    • Data Dictionary Update (May 2026)
      • Includes new properties from the Participant Engagement & Cancer Genome Sequencing (PE-CGS) initiative.
      • Additional fields and values were added for the Clinical Proteomic Tumor Analysis Consortium (CPTAC) and other programs.

    RESOURCES

  • Proteomic Data Commons (PDC)

    EXPLORE

    Since November 2025, the Proteomic Data Commons (PDC) has added more than 16,000 files and nearly 9.5 TB of data.

    New Data

    Clinical Proteomic Tumor Analysis Consortium (CPTAC)

    • New datasets for stomach adenocarcinoma (STAD) and acute myeloid leukemia (AML)
    • Expanded multi-omics data types, including proteome, phosphoproteome, acetylome, glycoproteome, metabolome, lipidome, and protein-protein interaction data.

    Human Cancer Models Initiative (HCMI)

    • New organoid model datasets from the CPTAC-Human Cancer Models Initiative (HCMI), including proteome, phosphoproteome, acetylome, and ubiquitylome data
    • These data support multi-omics research using organoid cancer models.

    International Cancer Proteogenome Consortium (ICPC)

    • New datasets for triple-negative breast cancer (TNBC) and gastric cancer
    • Updated gallbladder cancer datasets, including corrected and replaced files

    Childhood Cancer Data Initiative (CCDI) / CPTAC-Kids First-CCDI

    • New pediatric cancer datasets, including acute myeloid leukemia (AML) and T-cell acute lymphoblastic leukemia (T-ALL)

    Children’s Brain Tumor Network (CBTN)

    • New Pediatric Brain Tumor Atlas (CBTN) glioma datasets spanning pediatric, adolescent, and young adult populations, with proteome, phosphoproteome, and glycoproteome data
    • These additions expand multi-omics cancer research by providing detailed protein and post-translational modification data for pediatric gliomas

    Updated Study Data and Metadata

    • Clinical data updates across multiple CPTAC cancer types, including renal cell carcinoma, glioblastoma, lung, pancreatic, and uterine cancers
      • Updates include patient outcomes, diagnosis, treatment, and follow-up data
    • CPTAC3 Discovery and Confirmatory studies
      • Added protein quantitation data and heatmaps for STAD datasets
      • Added publication information for STAD datasets
    • Cross-resource integration
      • Added case-level links to imaging data in The Cancer Imaging Archive (TCIA) for multiple cohorts (e.g., TCGA-BRCA, TCGA-OV, TCGA-COAD)

    Platform and User Experience Updates

    • Improvements to the Explore page for more accurate filtering, counts, and sorting
    • Enhanced documentation, FAQs, and data submission guidance
    • Continued updates to support data model changes and harmonization
    • Release notes are available for January–April 2026 data releases (V5.1.1, V5.2, V5.3, V5.4, V6.0, V6.01, V6.02, V6.1) and software releases (V4.0.5, V4.0.6, V4.0.7, V4.0.8, V4.0.9, V4.0.10, V4.0.11, V4.0.12, V4.0.13, V4.0.14).

    ANALYZE

    • Continued support for the Common Data Analysis Pipeline (CDAP) for data-independent acquisition (DIA) studies
    • Ongoing improvements to data processing, quality control reporting, and harmonization
    • Updated bulk download scripts and documentation to support large-scale data access and analysis

    RESOURCES

  • Imaging Data Commons (IDC)

    The Imaging Data Commons (IDC) released updates in 2026 to make it easier to find, explore, and analyze large cancer imaging datasets. These updates include new AI tools, training resources, and collaborations. 

    EXPLORE

    Whole-Slide Similarity Search Now Possible Across IDC 

    • Image similarity search is now supported by using embeddings generated from more than 65,000 IDC digital pathology slides by Google Health AI. These precomputed image features are indexed in BigQuery, allowing users to quickly identify slides with similar visual or biological characteristics across the IDC archive.
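
    As a rough illustration of how embedding-based similarity search works in general, the sketch below ranks slides by cosine similarity between feature vectors. It is not IDC's implementation and does not query BigQuery; the slide IDs, the three-dimensional vectors, and the function names are hypothetical stand-ins for the much larger precomputed embeddings.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # a zero vector has no direction, so treat it as dissimilar
    return dot / (norm_a * norm_b)


def most_similar(query_id, embeddings, top_k=2):
    """Rank every other slide by similarity to the query slide."""
    query = embeddings[query_id]
    scored = [
        (slide_id, cosine_similarity(query, vector))
        for slide_id, vector in embeddings.items()
        if slide_id != query_id
    ]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_k]


# Hypothetical toy embeddings; real slide embeddings have far more dimensions.
TOY_EMBEDDINGS = {
    "slide_a": [1.0, 0.0, 0.0],
    "slide_b": [0.9, 0.1, 0.0],
    "slide_c": [0.0, 1.0, 0.0],
    "slide_d": [0.0, 0.0, 1.0],
}

if __name__ == "__main__":
    for slide_id, score in most_similar("slide_a", TOY_EMBEDDINGS):
        print(f"{slide_id}: {score:.3f}")
```

    In this toy data, slide_b ranks first for slide_a because its vector points in nearly the same direction. An indexed search over tens of thousands of slides would typically replace the exhaustive loop with an approximate nearest-neighbor index.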

    Growing Imaging Data Collection

    • IDC continues to expand, with nearly 100 TB (99.3 TB) of imaging data across 176 collections and more than 1 million publicly available DICOM series.
    • IDC Release 24 added 15 new collections and approximately 5.7 TB of new data, including major additions in digital pathology and radiology imaging.

    New Digital Pathology Collections

    • New pathology collections from the Genomic Data Commons (GDC), including Burkitt lymphoma, diffuse large B-cell lymphoma, cervical cancer, lung cancer, and Human Cancer Models Initiative (HCMI) datasets
    • New canine cancer pathology data from the CATCH collection 
    • New Patient-Derived Xenograft Network (PDXNet) pathology collections
    • Additional Human Tumor Atlas Network (HTAN) multiplexed fluorescence slide microscopy data

    New Radiology Collections

    • Imaging from the NCI-MATCH precision oncology trial (EAY131), including CT, MR, and PET imaging with associated tumor annotations
    • Low-dose CT imaging and projection datasets for imaging algorithm development
    • PSMA PET/CT lesion imaging for prostate cancer research
    • Spinal multiple myeloma imaging with expert segmentations
    • CPTAC stomach adenocarcinoma (STAD) imaging collections
    • New mouse PET/CT imaging collection supporting non-small cell lung cancer (NSCLC) research

    Updated Collections

    • Updated BoneMarrowWSI-PediatricLeukemia annotations to improve cell-level pathology labeling accuracy

    ANALYZE

    • IDC Claude Skill (Ask Questions in Plain Language)
      • A new tool lets users interact with IDC data by asking questions in plain language. For example, users can search for datasets, find specific types of images, check data use terms, or generate download instructions.
    • AI-Ready Image Data (Embeddings) 
      • Google Health AI created and shared precomputed image features (called embeddings) for more than 65,000 pathology slides. These make it much faster to search and analyze images and are freely available for researchers to use.
    • Step-by-Step Tutorials for Image Analysis
      • New tutorials help users get started with analyzing IDC images using open-source tools. They walk through common tasks such as preparing images, selecting regions to study, and identifying cells or patterns.

    SUBMIT

    IDC continues to accept imaging data submissions and supports standard formats to ensure data can be shared and used easily across studies.

    All slide microscopy data released in IDC Release 24 were harmonized into DICOM from vendor-specific formats, supporting interoperability and consistent access across imaging collections.

    RESOURCES

    COLLABORATION

    IDC is working with partners such as Google Health AI to develop tools that make large imaging datasets easier to use. These efforts focus on improving search, analysis, and access to imaging data for the research community.

    IDC also continues collaborations with TCIA, GDC, HTAN, PDXNet, and CIRP to expand access to harmonized imaging collections supporting cancer research, AI development, and multimodal analysis.

  • Integrated Canine Data Commons (ICDC)

    EXPLORE

    Data Releases

    The ICDC has released 114 new NanoString nCounter Canine IO Analysis files distributed across two osteosarcoma studies led by Dr. Amy K. LeBlanc, with the NCI Comparative Oncology Program. These include:  

    • COTC021 (Rapamycin + standard-of-care treatment trial, 152 subjects): 70 new files
    • COTC022 (Contemporaneous standard-of-care control cohort, 157 subjects): 44 new files

    The original datasets for both studies were limited to primary tumor samples. This expansion adds data from both primary and, predominantly, metastatic lesions. Metastatic lesion data are a rare and high-value data type for researchers studying how osteosarcoma spreads and how it responds to treatment.

    Both study pages have also been updated with the new Human Relevance tab, providing narrative context on how these canine osteosarcoma findings connect to human disease, including relevant genes, biological pathways, and therapeutic interventions. All of these files are open access. No login or data access approval is required.

    New Features and Enhancements

    The ICDC released software version 4.3.0 in April 2026, with several enhancements:

    • Human Relevance Information
      A new Human Relevance tab on each Study Details page explains, in plain language, how each canine cancer study relates to human cancer research. This information is also highlighted on the home page to help users quickly understand the value of comparative oncology.
       
    • Data Model Navigator Updates
      The Data Model Navigator now includes additional study-level information, with fields that describe a study’s relevance to human cancer. It also shows version history, allowing users to track how the data model has changed over time and identify which version applies to their work.
       
    • Improved Navigation and Design
      Updates to the home page and its navigation bar, as well as the Study Details pages, make the site easier to use and align with NCI design standards.

    SUBMIT

    ICDC continues to support data submission. That process is integrated with the CRDC Submission Portal. Learn more on the Submit Data page.

    RESOURCES

  • General Commons

    EXPLORE

    Since the last issue of the biannual CRDC Insights newsletter in November 2025, the CRDC General Commons (GC) has released multiple new datasets spanning clinical, imaging, genomic, population health, and nanotechnology research.

    New Data

    • Large multi-study imaging collection
      • A collection of 36 studies with more than 9,000 participants and over 85,000 files. This includes imaging datasets for glioblastoma, head and neck cancers, sarcoma, and melanoma, among other cancer types, and supports research in radiomics, treatment response, and AI model development.
    • Gabriella Miller Kids First (GMKF) Pediatric Research Program (Release 25.0)
      • New datasets focused on pediatric cancers, including Ewing sarcoma, osteosarcoma, and hematopoietic malignancies. This includes genomic, clinical, and familial risk data.
    • Childhood Cancer Data Initiative (CCDI) datasets (Release 26.0)
      • Pediatric datasets for sarcoma, kidney, and liver cancers. This includes clinical and experimental data from the Pediatric In Vivo Testing Program.
    • Health Information National Trends Survey (HINTS) datasets (Releases 22.0 and 26.0)
      • Multiple survey cycles and linkage datasets that provide population-level data on health communication, behaviors, and cancer-related knowledge.
    • Nanotechnology data (Release 24.0)
      • Integration of legacy caNano data, including annotated nanomaterials with physicochemical and biological characterizations supported by an updated GC data model.

    Explore the detailed list of new datasets.

    The best way to stay up to date on GC data releases is by reviewing the Release Notes.

    ANALYZE

    • Nanotechnology Data Search and Filtering (caNano)
      • New structured categories (e.g., protocols, publications, composition) and advanced filters. Users can search, filter, and download protocol data linked to DOIs.
    • Improved Visualization and Interaction
      • Updated chart behavior and backend query improvements. This supports more interactive and accurate data exploration.
    • Research Supported 
      • Enables cancer nanotechnology research and integration with broader cancer datasets.

    SUBMIT

    GC continues to support data submission and has introduced improvements that enhance data consistency, traceability, and reuse. This process is integrated with the CRDC Submission Portal.

    • Improved data consistency and traceability: Backend updates improve how participant and sample data are linked and tracked.
    • DOI-based identifiers for open-access data: Supports better data citation, sharing, and reuse. 

    RESOURCES

  • Clinical and Translational Data Commons

    EXPLORE

    Data

    The CTDC recently added 1,864 radiology images spanning 129 participants to the existing Cancer Moonshot Biobank (CMB) study dataset. All newly released files are immediately available for download directly within the CTDC application or can be exported and analyzed through the Seven Bridges Cancer Genomics Cloud (SB-CGC), powered by Velsera and funded by the NCI.

    New Features 

    CTDC released several new features to improve how users find, view, and understand data.

    • Clinical data for a study is now displayed on the Study Details page in a dedicated Clinical Data tab.
    • The Study Accession property, a unique identifier assigned to each study by dbGaP, is now visible within both the Explore Dashboard and the Study Details page, making it easier to reference and cite studies.
    • Clicking a node within the Data Model Navigator now highlights the full path connecting that node back through its parent nodes to the root of the data model, helping users understand how data elements relate to one another at a glance.
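
    The path highlighting described in the last item can be pictured as walking parent links from the selected node up to the root. The sketch below is only an illustration of that idea, not CTDC's code; the node names and the assumption that each node has a single parent are hypothetical.

```python
def path_to_root(node, parent_of):
    """Return the list of nodes from `node` up to the root, inclusive."""
    path = [node]
    seen = {node}
    while parent_of.get(path[-1]) is not None:
        parent = parent_of[path[-1]]
        if parent in seen:
            # Guard against malformed models that loop back on themselves.
            raise ValueError(f"cycle detected at {parent!r}")
        path.append(parent)
        seen.add(parent)
    return path


# Hypothetical data model: each node maps to its single parent (None = root).
TOY_PARENTS = {
    "study": None,
    "participant": "study",
    "specimen": "participant",
    "file": "specimen",
}

if __name__ == "__main__":
    print(" -> ".join(path_to_root("file", TOY_PARENTS)))
```

    A real data model may allow a node to have more than one parent, in which case the walk would need to follow every parent link rather than a single chain.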

    SUBMIT

    CTDC accepts data submissions. This process is integrated with the CRDC Submission Portal. Read more on the Submit Data page.

    RESOURCES

    CTDC Release Notes are available through the CTDC portal.

  • Cancer Data Aggregator

    EXPLORE

    The CDA routinely retrieves data from select CRDC data commons to support aggregated search, allowing users to query data across CRDC’s data commons using simple text-based search. The platform leverages mapped term synonyms and standardized categories (“slims”) to simplify discovery across diverse datasets.

    Over the last several months, the CDA has improved data annotations to enhance users’ ability to explore and understand data by enabling:

    • Linking and display of data provenance
    • Use of term synonyms and standardized categories
    • Access to harmonization mappings across data commons
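
    To show how synonyms and slims can simplify a text search in principle, the sketch below maps free-text terms to a canonical term and then to a broad category before matching. It does not use the CDA API; the synonym table, the slim table, the record fields, and the function names are all hypothetical.

```python
# Hypothetical synonym table: free-text variant -> canonical term.
SYNONYMS = {
    "glioblastoma multiforme": "glioblastoma",
    "gbm": "glioblastoma",
    "nsclc": "non-small cell lung cancer",
}

# Hypothetical slim table: canonical term -> broad standardized category.
SLIMS = {
    "glioblastoma": "Brain",
    "non-small cell lung cancer": "Lung",
}


def normalize(term):
    """Lowercase, collapse whitespace, and resolve known synonyms."""
    key = " ".join(term.lower().split())
    return SYNONYMS.get(key, key)


def to_slim(term):
    """Return the broad category for a term, or None if it is unmapped."""
    return SLIMS.get(normalize(term))


def search_by_slim(query, records):
    """Return IDs of records whose diagnosis falls in the query's category."""
    slim = to_slim(query)
    if slim is None:
        return []
    return [r["id"] for r in records if to_slim(r["diagnosis"]) == slim]


if __name__ == "__main__":
    toy_records = [
        {"id": "case-1", "diagnosis": "Glioblastoma Multiforme"},
        {"id": "case-2", "diagnosis": "GBM"},
        {"id": "case-3", "diagnosis": "NSCLC"},
    ]
    print(search_by_slim("glioblastoma", toy_records))
```

    Because both the query and each record pass through the same normalization, records labeled differently by different data commons can still match a single search term.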

    ANALYZE

    Integration with platforms such as ISB-CGC allows users to move from search to analysis workflows, supporting cross-domain research and large-scale data exploration.

    RESOURCES 

    The CDA website provides example notebooks demonstrating how to use CDA for data discovery and analysis, including: From Search to Analysis: Multi-omics Cohort Building with Cancer Data Aggregator. This tutorial highlights how to combine CDA and CRDC BigQuery tables, powered by ISB-CGC, for advanced search and analysis workflows. Tutorials are also available on the CDA YouTube channel.

    For questions, demo requests, and feedback, contact CancerDataAggregator@gmail.com or the CDA Helpdesk.

  • Data Commons Framework

    EXPLORE

    The Data Commons Framework (DCF) gives users streamlined access to Cancer Research Data Commons (CRDC) data through consistent indexing. It also facilitates secure access to both open- and controlled-access datasets, ensuring that users have the appropriate permissions for each.

    Data Indexing

    Over the last six months, the DCF has processed and indexed more than 1.3 PB of data, including Genomic Data Commons (GDC) Data Releases 44 and 45.

    Software Updates and Enhancements

    • Improved Data Access and Cost Efficiency 
      • The DCF team worked with sponsors and data partners to retire the Google Cloud Platform (GCP) copy of GDC data while maintaining uninterrupted access through Amazon S3 locations. These changes significantly reduced operational costs.
    • Support for Consent Codes
      • Recent updates to the DCF now support enforcement of participant consent codes, helping ensure data is used in accordance with participant permissions and preferences. This enhancement has been implemented across all CRDC-hosted datasets, including Kids First, the Human Tumor Atlas Network (HTAN), and the Childhood Cancer Data Initiative (CCDI), further strengthening responsible data stewardship.

    RESOURCES

  • CRDC Submission Portal

    The CRDC Submission Portal marks two years since its launch. A story about the CRDC Submission Portal with comments from various users is included in the June issue of CRDC Insights.

    Here we highlight enhancements made to the portal during the first part of this year. 

    SUBMIT

    Submission Request Features that Make the Submission Process Even Easier 

    • Users can now search for submission request forms using either the study name or abbreviation. These are in addition to search terms including submitter name, program, and request status. 
    • Users who upload the Excel-based Submission Request Form (SRF) on the portal can now benefit from a new feature that proactively flags errors, such as required fields left empty. This allows users to address identified errors early on and submit a complete submission request form.
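
    The kind of early check described above, flagging required fields that were left empty, can be sketched as follows. This is an illustration only and not the portal's validation logic; the field names and the row structure are hypothetical and do not reflect the actual Submission Request Form template.

```python
# Hypothetical required fields; the real form defines its own.
REQUIRED_FIELDS = ("study_name", "study_abbreviation", "submitter_name")


def find_missing_required(row, required=REQUIRED_FIELDS):
    """Return the required fields that are absent or blank in one row."""
    missing = []
    for field in required:
        value = row.get(field)
        if value is None or str(value).strip() == "":
            missing.append(field)
    return missing


def validate_rows(rows):
    """Map 1-based row numbers to their missing required fields."""
    errors = {}
    for index, row in enumerate(rows, start=1):
        missing = find_missing_required(row)
        if missing:
            errors[index] = missing
    return errors


if __name__ == "__main__":
    toy_rows = [
        {"study_name": "Example Study", "study_abbreviation": "EX",
         "submitter_name": "A. Researcher"},
        {"study_name": "   ", "study_abbreviation": "EX2"},
    ]
    for row_number, fields in validate_rows(toy_rows).items():
        print(f"Row {row_number}: missing {', '.join(fields)}")
```

    Treating whitespace-only values as empty catches cells that look filled in a spreadsheet but carry no content.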

    Data Submission Features that Streamline the Process

    Managing Existing Submissions

    • Users can now search for data submissions by study name, acronym, or dbGaP PHS accession number. 
    • Users can access approved Submission Request Forms associated with a specific submission directly from the Data Submissions page, making it easier to review those details in one place.
    • Users can reference the Population Science Data Commons (PSDC) data model, which is now available on the CRDC Submission Portal, under the Model Navigator.
    • Users now have the flexibility to download individual submission templates as needed.

    Validation Improvements

    • Validation results are now grouped by error type, so they are more easily interpreted.  
    • When deleting metadata, users can now choose whether associated files should also be deleted.
    • Submissions can now be validated against preliminary Common Data Elements (CDEs) to keep the submission process going without having to wait for the CDEs to be finalized.

  • Seven Bridges Cancer Genomics Cloud

    EXPLORE

    The Seven Bridges Cancer Genomics Cloud (SB-CGC) platform, powered by Velsera, provides access to most CRDC datasets and regularly updates its indexing of Genomic Data Commons (GDC) and Proteomic Data Commons (PDC) content. Recent indexing updates include PDC versions 4.14–5.0, 5.2, 5.3, and 5.4. Newly indexed datasets provide users with richer data exploration and analytics through the SB-CGC. 

    Updates to enhance data exploration on its platform include:

    • A streamlined process to retrieve controlled-access study permissions
    • File browser improvements

    A detailed description of these improvements is available.

    ANALYZE

    Several new bioinformatics tools and features have been added to SB-CGC’s public apps gallery. These enhance researchers' ability to conduct reproducible, scalable, and accessible analyses of large-scale genomic, multi-omics, and imaging data. A summary is below:  

    • DeepVariant & Giraffe-DeepVariant – Major version updates improve small-variant calling across multiple sequencing technologies. 
       
    • Genotype GVCFs & Filter Variants workflow – Upgraded to GATK 4.6.2.0.
       
    • sbmanifest – A lightweight Common Workflow Language (CWL) tool that generates and validates the sample sheets required for Nextflow workflows.
       
    • sbpack_nf – A CWL tool that creates a CWL wrapper for Nextflow workflows, which allows researchers to package and run their Nextflow pipelines directly on the SB-CGC platform without manual configuration.
       
    • DRS Bulk Import & File Upload improvements – The Data Repository Service (DRS) bulk import has been improved with more readable metadata and tooltips.
       
    • Enhanced Data Studio Session Stability – This improves shutdown and backup reliability for Data Studio, the SB-CGC’s interactive analysis environment, ensuring large and complex sessions complete backups and clean-up before shutdown.
       
    • Integrated Visualization & Annotation Tools – Three imaging graphical user interface (GUI) tools are available in the SB-CGC’s Data Studio: OHIF Viewer, 3D Slicer, and ImageJ. This update significantly expands the SB-CGC’s Data Studio beyond code-based tools like JupyterLab and RStudio, making it easier to work with complex imaging data.
       
    • New Machine-Learning Tools for Multi-omics Integration and Imaging Analysis – These include tools from MONAI, an open-source framework for deep learning, and MOFA2, an unsupervised multi-omics integration framework.

    A more detailed update on the SB-CGC's new Analytical Tools is available. 

    Learn more about SB-CGC's Analytical Tools and Resources on its website.

    RESOURCES

    • Past webinar recordings are available on the SB-CGC website.
    • Office Hours are held weekly at the times below. All are welcome to join:
      • Tuesdays at 10:00 am ET and Thursdays at 2:00 pm ET 
      • Join here