CRDC Insights

Updates from the Cancer Research Data Commons:
Empowering the Scientific Community to Make New Discoveries

CRDC at the AACR 2026 Annual Meeting: Artificial Intelligence (AI) Enablement

June 15, 2026

The NCI/NIH-sponsored session at the AACR 2026 Annual Meeting, "From Data Commons to Knowledge Engine: AI Enablement Across CRDC and CCDI," drew a large audience. Speakers from NCI's Center for Biomedical Informatics and Information Technology (CBIIT) and Office of Data Sharing (ODS) discussed advances in artificial intelligence (AI) enablement across the Cancer Research Data Commons (CRDC), with updates spanning the CRDC ecosystem, the Childhood Cancer Data Initiative (CCDI), and the ARPA-H Biomedical Data Fabric (BDF) Toolbox program.

Tanja Davidsen, PhD, from NCI's CBIIT, opened the session with an overview of the CRDC as NCI's primary data science platform for cancer research. With 533 studies, more than 200,000 subjects, over 17 petabytes of data, and more than 100,000 users per month, the CRDC has grown into a cornerstone of open cancer science. Its seven specialized data commons follow the FAIR (Findable, Accessible, Interoperable, and Reusable) data principles across genomic, clinical, imaging, and population-scale data types.

Dr. Davidsen highlighted key infrastructure advances, including the Cancer Data Aggregator (CDA), which serves as an application programming interface (API) layer for querying and aggregating data across CRDC repositories, and the Data Commons Framework, which provides modular, reusable components for authentication, indexing, and cloud storage. Through its participation in the NIH Cloud Platform Interoperability (NCPI) initiative, the CRDC continues to expand connectivity with platforms such as the National Human Genome Research Institute's (NHGRI) Analysis, Visualization, and Informatics Lab-space (AnVIL), the National Heart, Lung, and Blood Institute's (NHLBI) BioData Catalyst, and Kids First, broadening the reach of cancer data across the NIH ecosystem.

Subhashini Jagu, PhD, from NCI's ODS, presented on the CCDI and its mission to collect and integrate data from every child, adolescent, and young adult diagnosed with cancer. A key highlight was the Molecular Characterization Initiative (MCI), a partnership with the Children's Oncology Group's Project: EveryChild that has now enrolled more than 9,000 participants. The MCI provides state-of-the-art molecular characterization at no cost to participants, and results are returned to physicians and patients within 21 days.

The resulting MCI dataset combines de-identified data from molecular assays (enhanced exome sequencing, targeted RNA fusion sequencing, and methylation array analysis) with associated clinical data (demographic, diagnostic, pathology, treatment, and follow-up information). Digital histopathology images are openly accessible through the CRDC Imaging Data Commons (IDC), while processed genomic characterization data are available through the CCDI cBioPortal Cancer Data Explorer and the Genomic Data Commons (GDC).

Erika Kim, PhD, presented on the ARPA-H BDF Toolbox program, which develops next-generation tools that make health research data easier to synthesize and use. She showcased several tools, including:

  • AI-assisted Data Curation (Netrias): A large language model (LLM)-based harmonization tool that demonstrated a 90% reduction in curation time and surfaced 15% more usable data. The tool is now being evaluated for use in the CRDC Submission Portal.
  • BioInsight AI (ICF): An AI-powered chat interface that enables plain-language queries across the CRDC, the Medical Imaging and Data Resource Center (MIDRC), ProteomeXchange, and other biomedical repositories.
  • Beaker & BeakerHub (Jataware): An AI-augmented notebook platform with agents connected to CRDC data commons, including the GDC, Proteomic Data Commons (PDC), and IDC, as well as to cBioPortal and the CDA.
  • INDRA (Northeastern University): A knowledge assembly platform that extracts causal mechanisms from tens of millions of publications and combines them with bio-ontologies and drug databases to support hypothesis generation.

Taken together, the session showed how collaboration among the CRDC, CCDI, and BDF programs is helping the CRDC evolve from a robust data repository into an AI-enabled knowledge engine, with tools that lower barriers to discovery for researchers at every level of computational expertise.