CRDC Components: Updates June 2026
The CRDC team, whether engaged in activities specific to the CRDC Data Commons, NCI Cloud Resource, or CRDC’s Core Services, remains focused on advancing its mission of making data and resources securely accessible to the cancer research community. The team's latest updates are summarized below.
Genomic Data Commons (GDC)
EXPLORE
The Genomic Data Commons (GDC) continues to expand its data holdings and enhance data exploration capabilities. Recent updates advance integrated cancer research by enabling exploration of genomic, clinical, and imaging data within a unified platform.
New Data
- Adjuvant Lung Cancer Enrichment Marker Identification and Sequencing Trial (ALCHEMIST) (phs001140), which includes genomic and clinical data to support lung cancer research and biomarker discovery.
- Childhood Cancer Data Initiative (CCDI) Molecular Characterization Initiative (MCI) (phs002790)
- Center for Cancer Genomics (CCG) Cancers of Unknown Primary Project (CUPP) (phs001801)
- Refractory Cancers (RC) - Peripheral T-Cell Lymphoma (PTCL) study (phs002097)
- WGS and RNA-Seq data for ovarian cancer from the Applied Proteogenomics Organizational Learning and Outcomes (APOLLO) network
Features and Enhancements
- GeneCards annotations are now available directly within GDC Gene Summary pages.
- These provide integrated information on gene function, expression, genomic location, and disease associations.
ANALYZE
- Correlation Plot Tool
- Enables users to explore relationships between genomic data (e.g., mutations, copy number variation, gene expression) and clinical outcomes.
- Supports custom correlations and comparisons across disease types, survival outcomes, and genomic features.
- Facilitates biomarker discovery, hypothesis generation, and exploration of disease mechanisms.
- For more detail, see The Genomic Data Commons at the Ten Year Mark.
- Correlation Plot Tool
- Correlation Plot Documentation
- Imaging Data Commons (IDC) Viewer Integration
- Allows users to visualize histopathology and radiology images for cohorts that overlap with IDC.
- Supports viewing of pathology slides and radiology images using IDC viewers.
- For more detail, see The Genomic Data Commons at the Ten Year Mark.
- IDC Viewer App
- IDC Viewer Documentation
- Upcoming Webinar: Explore Cancer Imaging and Genomics with the New GDC Imaging Data Commons (IDC) Image Viewer Tool and IDC Viewers (September 9, 2026)
SUBMIT
The GDC continues to accept data submissions.
- Data Dictionary Update (May 2026)
- Includes new properties from the Participant Engagement & Cancer Genome Sequencing (PE-CGS) initiative.
- Additional fields and values were added for the Clinical Proteomic Tumor Analysis Consortium (CPTAC) and other programs.
RESOURCES
- Upcoming Webinar: Explore Cancer Imaging and Genomics with the New GDC Imaging Data Commons (IDC) Image Viewer Tool and IDC Viewers (September 9, 2026)
- Webinars and support resources are available on the GDC website.
-
Proteomic Data Commons (PDC)
EXPLORE
Since November 2025, the Proteomic Data Commons (PDC) has added more than 16,000 files and nearly 9.5 TB of data.
New Data
Clinical Proteomic Tumor Analysis Consortium (CPTAC)
- New datasets for stomach adenocarcinoma (STAD) and acute myeloid leukemia (AML)
- Expanded multi-omics data types, including proteome, phosphoproteome, acetylome, glycoproteome, metabolome, lipidome, and protein-protein interaction data.
Human Cancer Models Initiative (HCMI)
- New organoid model datasets from the CPTAC-Human Cancer Models Initiative (HCMI), including proteome, phosphoproteome, acetylome, and ubiquitylome data
- These data support multi-omics research using organoid cancer models.
International Cancer Proteogenome Consortium (ICPC)
- New datasets for triple-negative breast cancer (TNBC) and gastric cancer
- Updated gallbladder cancer datasets, including corrected and replaced files
Childhood Cancer Data Initiative (CCDI) / CPTAC-Kids First-CCDI
- New pediatric cancer datasets, including acute myeloid leukemia (AML) and T-cell acute lymphoblastic leukemia (T-ALL)
Children’s Brain Tumor Network (CBTN)
- New Pediatric Brain Tumor Atlas (CBTN) glioma datasets spanning pediatric, adolescent, and young adult populations, with proteome, phosphoproteome, and glycoproteome data
- These additions expand multi-omics cancer research by providing detailed protein and post-translational modification data for pediatric gliomas
Updated Study Data and Metadata
- Clinical data updates across multiple CPTAC cancer types, including renal cell carcinoma, glioblastoma, lung, pancreatic, and uterine cancers
- Updates include patient outcomes, diagnosis, treatment, and follow-up data
- CPTAC3 Discovery and Confirmatory studies
- Added protein quantitation data and heatmaps for STAD datasets
- Added publication information for STAD datasets
- Cross-resource integration
- Added case-level links to imaging data in The Cancer Imaging Archive (TCIA) for multiple cohorts (e.g., TCGA-BRCA, TCGA-OV, TCGA-COAD)
Platform and User Experience Updates
- Improvements to the Explore page for more accurate filtering, counts, and sorting
- Enhanced documentation, FAQs, and data submission guidance
- Continued updates to support data model changes and harmonization
- Release notes are available for January–April 2026 data releases (V5.1.1, V5.2, V5.3, V5.4, V6.0, V6.01, V6.02, V6.1) and software releases (V4.0.5, V4.0.6, V4.0.7, V4.0.8, V4.0.9, V4.0.10, V4.0.11, V4.0.12, V4.0.13, V4.0.14).
ANALYZE
- Continued support for the Common Data Analysis Pipeline (CDAP) for data-independent acquisition (DIA) studies
- Ongoing improvements to data processing, quality control reporting, and harmonization
- Updated bulk download scripts and documentation to support large-scale data access and analysis
RESOURCES
-
Imaging Data Commons (IDC)
The Imaging Data Commons (IDC) released updates in 2026 to make it easier to find, explore, and analyze large cancer imaging datasets. These updates include new AI tools, training resources, and collaborations.
EXPLORE
Whole-Slide Similarity Search Now Possible Across IDC
- Image similarity search is now supported by using embeddings generated from more than 65,000 IDC digital pathology slides by Google Health AI. These precomputed image features are indexed in BigQuery, allowing users to quickly identify slides with similar visual or biological characteristics across the IDC archive.
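As a rough illustration of how embedding-based similarity search works in general, the sketch below ranks slides by cosine similarity between their feature vectors. The slide identifiers and three-dimensional vectors are made up for the example, and this is not the IDC or BigQuery implementation; real embeddings are much longer and would be queried through the indexed tables.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat an all-zero vector as dissimilar to everything
    return dot / (norm_a * norm_b)


def most_similar(query_id, embeddings, top_k=2):
    """Rank every other slide by similarity to the query slide."""
    query = embeddings[query_id]
    scored = [
        (slide_id, cosine_similarity(query, vec))
        for slide_id, vec in embeddings.items()
        if slide_id != query_id
    ]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_k]


# Hypothetical toy embeddings (assumption: not real IDC data).
toy_embeddings = {
    "slide_A": [1.0, 0.0, 0.0],
    "slide_B": [0.9, 0.1, 0.0],
    "slide_C": [0.0, 1.0, 0.0],
    "slide_D": [0.7, 0.7, 0.0],
}

results = most_similar("slide_A", toy_embeddings)
for slide_id, score in results:
    print(f"{slide_id}: {score:.3f}")
```

Because the embeddings are precomputed, a search of this kind only compares vectors and never needs to reopen the underlying slide images, which is what makes it fast at archive scale.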
Growing Imaging Data Collection
- IDC continues to expand, with nearly 100 TB (99.3 TB) of imaging data across 176 collections and more than 1 million publicly available DICOM series.
- IDC Release 24 added 15 new collections and approximately 5.7 TB of new data, including major additions in digital pathology and radiology imaging.
New Digital Pathology Collections
- New pathology collections from the Genomic Data Commons (GDC), including Burkitt lymphoma, diffuse large B-cell lymphoma, cervical cancer, lung cancer, and Human Cancer Models Initiative (HCMI) datasets
- New canine cancer pathology data from the CATCH collection
- New Patient-Derived Xenograft Network (PDXNet) pathology collections
- Additional Human Tumor Atlas Network (HTAN) multiplexed fluorescence slide microscopy data
New Radiology Collections
- Imaging from the NCI-MATCH precision oncology trial (EAY131), including CT, MR, and PET imaging with associated tumor annotations
- Low-dose CT imaging and projection datasets for imaging algorithm development
- PSMA PET/CT lesion imaging for prostate cancer research
- Spinal multiple myeloma imaging with expert segmentations
- CPTAC stomach adenocarcinoma (STAD) imaging collections
- New mouse PET/CT imaging collection supporting non-small cell lung cancer (NSCLC) research
Updated Collections
- Updated BoneMarrowWSI-PediatricLeukemia annotations to improve cell-level pathology labeling accuracy
ANALYZE
- IDC Claude Skill (Ask Questions in Plain Language)
- A new tool lets users interact with IDC data by asking questions in plain language. For example, users can search for datasets, find specific types of images, check data use terms, or generate download instructions.
- AI-Ready Image Data (Embeddings)
- Google Health AI created and shared precomputed image features (called embeddings) for more than 65,000 pathology slides. These make it much faster to search and analyze images and are freely available for researchers to use.
- Step-by-Step Tutorials for Image Analysis
- New tutorials help users get started with analyzing IDC images using open-source tools. They walk through common tasks such as preparing images, selecting regions to study, and identifying cells or patterns.
SUBMIT
IDC continues to accept imaging data submissions and supports standard formats to ensure data can be shared and used easily across studies.
All slide microscopy data released in IDC Release 24 were harmonized into DICOM from vendor-specific formats, supporting interoperability and consistent access across imaging collections.
RESOURCES
- Upcoming Webinar: Explore Cancer Imaging and Genomics with the New GDC Imaging Data Commons (IDC) Image Viewer Tool and IDC Viewers (September 9, 2026)
- IDC Image Viewers
- TIAToolbox tutorials for IDC data
- IDC Claude skill
- IDC Tutorials
- IDC Portal
COLLABORATION
IDC is working with partners such as Google Health AI to develop tools that make large imaging datasets easier to use. These efforts focus on improving search, analysis, and access to imaging data for the research community.
IDC also continues collaborations with TCIA, GDC, HTAN, PDXNet, and CIRP to expand access to harmonized imaging collections supporting cancer research, AI development, and multimodal analysis.
-
Integrated Canine Data Commons (ICDC)
EXPLORE
Data Releases
The ICDC has released 114 new NanoString nCounter Canine IO Analysis files distributed across two osteosarcoma studies led by Dr. Amy K. LeBlanc, with the NCI Comparative Oncology Program. These include:
- COTC021 (Rapamycin + standard-of-care treatment trial, 152 subjects): 70 new files
- COTC022 (Contemporaneous standard-of-care control cohort, 157 subjects): 44 new files
The original datasets for both studies were limited to primary tumor samples. This expansion adds data from primary and predominantly metastatic lesions, a rare and high-value data type for researchers studying how osteosarcoma spreads and how it responds to treatment.
Both study pages have also been updated with the new Human Relevance tab, providing narrative context on how these canine osteosarcoma findings connect to human disease, including relevant genes, biological pathways, and therapeutic interventions. All of these files are open access. No login or data access approval is required.
New Features and Enhancements
The ICDC released software version 4.3.0 in April 2026, with several enhancements:
- Human Relevance Information
A new Human Relevance tab on each Study Details page explains, in plain language, how each canine cancer study relates to human cancer research. This information is also highlighted on the home page to help users quickly understand the value of comparative oncology.
- Data Model Navigator Updates
The Data Model Navigator now includes additional study-level information, with fields that describe a study’s relevance to human cancer. It also shows version history, allowing users to track how the data model has changed over time and identify which version applies to their work.
- Improved Navigation and Design
Updates to the home page and its navigation bar, as well as the Study Details pages, make the site easier to use and align with NCI design standards.
SUBMIT
ICDC continues to support data submission. That process is integrated with the CRDC Submission Portal. Learn more on the Submit Data page.
RESOURCES
- Official ICDC Release Notes can be found on GitHub.
-
General Commons
EXPLORE
Since the last issue of the biannual CRDC Insights newsletter in November 2025, the CRDC General Commons (GC) has released multiple new datasets spanning clinical, imaging, genomic, population health, and nanotechnology research.
New Data
- Large multi-study imaging collection
- A collection of 36 studies with more than 9,000 participants and over 85,000 files. This includes imaging datasets for glioblastoma, head and neck cancers, sarcoma, and melanoma, among other cancer types, and supports research in radiomics, treatment response, and AI model development.
- Gabriella Miller Kids First (GMKF) Pediatric Research Program (Release 25.0)
- New datasets focused on pediatric cancers, including Ewing sarcoma, osteosarcoma, and hematopoietic malignancies. This includes genomic, clinical, and familial risk data.
- Childhood Cancer Data Initiative (CCDI) datasets (Release 26.0)
- Pediatric datasets for sarcoma, kidney, and liver cancers. This includes clinical and experimental data from the Pediatric In Vivo Testing Program.
- Health Information National Trends Survey (HINTS) datasets (Releases 22.0 and 26.0)
- Multiple survey cycles and linkage datasets that provide population-level data on health communication, behaviors, and cancer-related knowledge.
- Nanotechnology data (Release 24.0)
- Integration of legacy caNano data, including annotated nanomaterials with physicochemical and biological characterizations supported by an updated GC data model.
Explore the detailed list of new datasets.
The best way to stay up to date on GC data releases is by reviewing the Release Notes.
ANALYZE
- Nanotechnology Data Search and Filtering (caNano)
- New structured categories (e.g., protocols, publications, composition) and advanced filters. Users can search, filter, and download protocol data linked to DOIs.
- Improved Visualization and Interaction
- Updated chart behavior and backend query improvements. This supports more interactive and accurate data exploration.
- Research Supported
- Enables cancer nanotechnology research and integration with broader cancer datasets.
SUBMIT
GC continues to support data submission and has introduced improvements that enhance data consistency, traceability, and reuse. This process is integrated with the CRDC Submission Portal.
- Improved data consistency and traceability: Backend updates improve how participant and sample data are linked and tracked.
- DOI-based identifiers for open-access data: Supports better data citation, sharing, and reuse.
RESOURCES
- Large multi-study imaging collection
-
Clinical and Translational Data Commons
EXPLORE
Data
The CTDC recently added 1,864 radiology images spanning 129 participants to the existing Cancer Moonshot Biobank (CMB) study dataset. All newly released files are immediately available for download directly within the CTDC application or can be exported and analyzed through the Seven Bridges Cancer Genomics Cloud (SB-CGC), powered by Velsera and funded by the NCI.
New Features
CTDC released several new features to improve how users find, view, and understand data.
- Clinical data for a study is now displayed on the Study Details page in a dedicated Clinical Data tab.
- The Study Accession property, a unique identifier assigned to each study by dbGaP, is now visible within both the Explore Dashboard and the Study Details page, making it easier to reference and cite studies.
- Clicking a node within the Data Model Navigator now highlights the full path connecting that node back through its parent nodes to the root of the data model, helping users understand how data elements relate to one another at a glance.
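The path highlighting described in the last item can be pictured as a walk from the selected node up through its parents. The following is a minimal sketch under simplifying assumptions: node names are hypothetical, and each node has a single parent, whereas a real data model may be a graph with several parents per node.

```python
def path_to_root(node, parent_of):
    """Return the list of nodes from `node` up to the root, inclusive."""
    path = [node]
    seen = {node}
    while parent_of.get(path[-1]) is not None:
        parent = parent_of[path[-1]]
        if parent in seen:
            # Guard against malformed input that loops back on itself.
            raise ValueError(f"cycle detected at {parent!r}")
        path.append(parent)
        seen.add(parent)
    return path


# Hypothetical data model: child -> parent, with None marking the root.
toy_parents = {
    "program": None,
    "study": "program",
    "participant": "study",
    "sample": "participant",
    "file": "sample",
}

highlighted = path_to_root("sample", toy_parents)
print(" -> ".join(highlighted))
```

Selecting "sample" in this toy model would highlight the sample, participant, study, and program nodes in that order.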
SUBMIT
CTDC accepts data submissions. This process is integrated with the CRDC Submission Portal. Read more on the Submit Data page.
RESOURCES
CTDC Release Notes are available through the CTDC portal.
-
Cancer Data Aggregator
EXPLORE
The CDA routinely retrieves data from select CRDC data commons to support aggregated search capabilities, which allows users to query data across CRDC’s data commons using simple text-based search. The platform leverages mapped term synonyms and standardized categories (“slims”) to simplify discovery across diverse datasets.
Over the last several months, the CDA has improved data annotations to enhance users’ ability to explore and understand data by enabling:
- Linking and display of data provenance
- Use of term synonyms and standardized categories
- Access to harmonization mappings across data commons
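To show the general idea behind synonym mapping and standardized categories, here is a small sketch. The synonym table, category names, and record fields are invented for illustration and are not the CDA's actual mappings or API.

```python
def normalize(text):
    """Lowercase and collapse whitespace so lookups are case-insensitive."""
    return " ".join(text.lower().split())


# Hypothetical synonym table: variant spelling -> canonical term.
SYNONYM_TO_TERM = {
    "luad": "lung adenocarcinoma",
    "adenocarcinoma of lung": "lung adenocarcinoma",
    "gbm": "glioblastoma",
    "glioblastoma multiforme": "glioblastoma",
}

# Hypothetical "slim" table: canonical term -> broad category.
TERM_TO_SLIM = {
    "lung adenocarcinoma": "Lung",
    "glioblastoma": "Brain",
}


def to_slim(raw_term):
    """Map a free-text term to its broad category, or None if unknown."""
    key = normalize(raw_term)
    canonical = SYNONYM_TO_TERM.get(key, key)
    return TERM_TO_SLIM.get(canonical)


def search_by_category(records, query):
    """Return records whose diagnosis falls in the same category as the query."""
    wanted = to_slim(query)
    if wanted is None:
        return []
    return [r for r in records if to_slim(r["diagnosis"]) == wanted]


# Toy records labeled differently by each (illustrative) source.
records = [
    {"source": "GDC", "diagnosis": "LUAD"},
    {"source": "PDC", "diagnosis": "Lung Adenocarcinoma"},
    {"source": "IDC", "diagnosis": "GBM"},
]

hits = search_by_category(records, "adenocarcinoma of lung")
print([r["source"] for r in hits])
```

The query and both lung records use three different spellings, yet all resolve to the same category, which is the effect that makes a simple text search work across differently annotated sources.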
ANALYZE
Integration with platforms such as ISB-CGC allows users to move from search to analysis workflows, supporting cross-domain research and large-scale data exploration.
RESOURCES
The CDA website provides example notebooks demonstrating how to use CDA for data discovery and analysis, including: From Search to Analysis: Multi-omics Cohort Building with Cancer Data Aggregator. This tutorial highlights how to combine CDA and CRDC BigQuery tables, powered by ISB-CGC, for advanced search and analysis workflows. Tutorials are also available on the CDA YouTube channel.
For questions, demo requests, and feedback, contact CancerDataAggregator@gmail.com or the CDA Helpdesk.
-
Data Commons Framework
EXPLORE
The Data Commons Framework (DCF) gives users streamlined access to Cancer Research Data Commons (CRDC) data through consistent indexing. It also facilitates secure access to both open- and controlled-access datasets, ensuring that users have the appropriate permissions for each.
Data Indexing
Over the last six months, the DCF has processed and indexed more than 1.3 PB of data, including Genomic Data Commons (GDC) Data Releases 44 and 45.
Software Updates and Enhancements
- Improved Data Access and Cost Efficiency
- The DCF team worked with sponsors and data partners to retire the Google Cloud Platform (GCP) copy of GDC data while maintaining uninterrupted access through Amazon S3 locations. These changes significantly reduced operational costs.
- Support for Consent Codes
- Recent updates to the DCF now support enforcement of participant consent codes, helping ensure data is used in accordance with participant permissions and preferences. This enhancement has been implemented across all CRDC-hosted datasets, including Kids First, the Human Tumor Atlas Network (HTAN), and the Childhood Cancer Data Initiative (CCDI), further strengthening responsible data stewardship.
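Conceptually, consent-code enforcement means that approval for a study is not enough on its own: the user must also hold the consent group attached to each file. The sketch below illustrates that check with hypothetical study accessions, consent codes, and file names; it is not the DCF's actual authorization logic.

```python
def can_access(user_grants, study, consent_code):
    """Return True if the file is open access or the user holds its consent code.

    user_grants maps a study accession to the set of consent codes
    the user has been approved for.
    """
    if consent_code == "open":
        return True
    return consent_code in user_grants.get(study, set())


# Hypothetical approvals: this user holds only consent group c1.
user_grants = {"phs000000": {"c1"}}

# Hypothetical files, each tagged with the consent code of its participant.
files = [
    {"name": "a.bam", "study": "phs000000", "consent": "c1"},
    {"name": "b.bam", "study": "phs000000", "consent": "c2"},
    {"name": "c.tsv", "study": "phs000000", "consent": "open"},
]

allowed = [
    f["name"] for f in files
    if can_access(user_grants, f["study"], f["consent"])
]
print(allowed)
```

In this toy case the user can reach the c1 file and the open-access file, but not the c2 file from the same study.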
RESOURCES
- Improved Data Access and Cost Efficiency
-
CRDC Submission Portal
The CRDC Submission Portal marks two years since its launch. A story about the CRDC Submission Portal with comments from various users is included in the June issue of CRDC Insights.
Here we highlight enhancements made to the portal during the first part of this year.
SUBMIT
Submission Request Features that Make the Submission Process Even Easier
- Users can now search for submission request forms using either the study name or abbreviation, in addition to existing search terms such as submitter name, program, and request status.
- Users who upload the Excel-based Submission Request Form (SRF) on the portal can now benefit from a new feature that proactively flags errors, such as required fields left empty. This allows users to address identified errors early on and submit a complete submission request form.
Data Submission Features that Streamline the Process
Managing Existing Submissions
- Users can now search for data submissions by study name, acronym, or dbGaP PHS accession number.
- Users can access approved Submission Request Forms associated with a specific submission directly from the Data Submissions page, making it easier to review those details in one place.
- Users can reference the Population Science Data Commons (PSDC) data model, which is now available on the CRDC Submission Portal, under the Model Navigator.
- Users now have the flexibility to download individual submission templates as needed.
Validation Improvements
- Validation results are now grouped by error type, so they are more easily interpreted.
- When deleting metadata, users can now choose whether associated files should also be deleted.
- Submissions can now be validated against preliminary Common Data Elements (CDEs) to keep the submission process going without having to wait for the CDEs to be finalized.
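The two validation ideas mentioned in this section, flagging empty required fields early and grouping results by error type, can be sketched as follows. The field names, error labels, and accession pattern are assumptions made for the example and do not reflect the portal's actual rules.

```python
import re
from collections import defaultdict

# Hypothetical required fields for a submission row.
REQUIRED_FIELDS = ("study_name", "submitter_name")

# Assumed accession shape: "phs" followed by six digits.
PHS_PATTERN = re.compile(r"phs\d{6}")


def validate(rows):
    """Check each row and return errors grouped by error type.

    Each error is recorded as (row_number, field_name), with rows
    numbered from 1.
    """
    errors = defaultdict(list)
    for row_number, row in enumerate(rows, start=1):
        for field in REQUIRED_FIELDS:
            value = row.get(field)
            if value is None or str(value).strip() == "":
                errors["missing_required_field"].append((row_number, field))
        accession = row.get("phs_accession", "")
        if accession and not PHS_PATTERN.fullmatch(accession):
            errors["invalid_accession_format"].append((row_number, "phs_accession"))
    return dict(errors)


# Toy rows (illustrative only).
rows = [
    {"study_name": "Example Study", "submitter_name": "A. Researcher",
     "phs_accession": "phs001140"},
    {"study_name": "", "submitter_name": "B. Researcher",
     "phs_accession": "phs12"},
    {"study_name": "Another Study", "submitter_name": "   "},
]

grouped = validate(rows)
for error_type, items in grouped.items():
    print(error_type, items)
```

Grouping by error type lets a submitter fix every instance of one problem in a single pass instead of reading through a row-by-row log.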
-
Seven Bridges Cancer Genomics Cloud
EXPLORE
The Seven Bridges Cancer Genomics Cloud (SB-CGC) platform, powered by Velsera, provides access to most CRDC datasets and regularly updates its indexing of Genomic Data Commons (GDC) and Proteomic Data Commons (PDC) content. Recent indexing updates include PDC versions 4.14–5.0, 5.2, 5.3, and 5.4. Newly indexed datasets provide users with richer data exploration and analytics through the SB-CGC.
Updates to enhance data exploration on its platform include:
- A streamlined process to retrieve controlled-access study permissions
- File browser improvements
A detailed description of these improvements is available.
ANALYZE
Several new bioinformatics tools and features have been added to SB-CGC’s public apps gallery. These enhance researchers' ability to conduct reproducible, scalable, and accessible analyses of large-scale genomic, multi-omics, and imaging data. A summary is below:
- DeepVariant & Giraffe-DeepVariant – Major version updates improve small-variant calling across multiple sequencing technologies.
- Genotype GVCFs & Filter Variants workflow – Upgraded to GATK 4.6.2.0.
- sbmanifest – A lightweight Common Workflow Language (CWL) tool that generates and validates the sample sheets required for Nextflow workflows.
- sbpack_nf – A CWL tool that creates a CWL wrapper for Nextflow workflows, which allows researchers to package and run their Nextflow pipelines directly on the SB-CGC platform without manual configuration.
- DRS Bulk Import & File Upload improvements – The Data Repository Service (DRS) bulk import has been improved with more readable metadata and tooltips.
- Enhanced Data Studio Session Stability – This improves shutdown and backup reliability for Data Studio, the SB-CGC’s interactive analysis environment, ensuring large and complex sessions complete backups and clean-up before shutdown.
- Integrated Visualization & Annotation Tools – Three imaging graphical user interface (GUI) tools are now available in the SB-CGC’s Data Studio: OHIF Viewer, 3D Slicer, and ImageJ. This update significantly expands the SB-CGC’s Data Studio beyond code-based tools like JupyterLab and RStudio, making it easier to work with complex imaging data.
- New Machine-Learning Tools for Multi-omics Integration and Imaging Analysis – These include tools from MONAI, an open-source framework for deep learning, and from MOFA2, an unsupervised multi-omics integration framework.
A more detailed update on the SB-CGC's new Analytical Tools is available.
Learn more about SB-CGC's Analytical Tools and Resources on its website.
RESOURCES
- Past webinar recordings are available on the SB-CGC website.
- Office Hours are held weekly at the times below. All are welcome to join:
- Tuesdays at 10:00 am ET and Thursdays at 2:00 pm ET
- Join here