CRDC Insights

Updates from the Cancer Research Data Commons:
Empowering the Scientific Community to Make New Discoveries

Recommendations from NCI CRDC AI Data Readiness Challenge Winners

January 17, 2025

The winners of the recent NCI Cancer Research Data Commons (CRDC) Artificial Intelligence (AI) Data Readiness Challenge presented their findings and recommendations for improving the AI readiness of NCI CRDC data. Access a recording of the webinar.

Recommendations from the winners include, among others:

  • Make CRDC data and metadata more interoperable across all CRDC Data Commons and Cloud Resources with consistent terminologies and schema.

  • Develop a unified exploration portal that makes it easier to search and analyze data across all components and to identify multimodal cohorts of patient data housed in different data commons (e.g., genomics, proteomics, imaging).

  • Annotate data to note case and control for comparative outcomes research.

  • Diversify the overall data collection to achieve a more equal gender balance, as well as a broader range of race/ethnicity across all cancer types, which is important to avoid unintended biases when training AI models.

Emily Greenspan, Ph.D., organized the challenge in collaboration with several CBIIT colleagues. As she notes: "Many of the challenge submissions, including those from the winners, proposed insightful recommendations, several of which the CRDC team is actively addressing, such as recognizing the need to streamline data submission and discovery to facilitate easier search and access. We all recognize the importance of making NCI-funded research data FAIR, and that lays the foundation for making sure it is ready for AI/ML-based research. This is critical given how quickly the research landscape is evolving with the use of these powerful tools."

As Dr. Greenspan continued: "There is a great deal of interest in this effort, and those who would like to follow this work may want to join the user group affiliated with the NCI’s Computational Resources for Cancer Research portal: computational.cancer.gov."

Read more about the original challenge

The winners responded to the challenge with a variety of projects, each applying CRDC data to a selected Artificial Intelligence/Machine Learning (AI/ML) use case. This required preprocessing the data into a format usable for AI/ML applications and then training algorithms or models on the transformed data. As part of their studies, participants were also tasked with making AI data readiness recommendations using metrics such as accessibility, availability, accuracy, completeness, privacy, and diversity.

The winners of the AI Data Readiness Challenge are:  

Tier 1 (training an AI/ML model with single-modal data)

  • 1st Place: Jennifer Blasé (Ruvos)
    • Project: “Gene expression-based prediction of treatment response in ovarian cancer”
  • 2nd Place: Agnes McFarlin
    • Project: “Identifying cancerous lung nodules without the presence of annotated slides for reference”

Tier 2 (training an AI/ML model with multimodal data)

  • 1st Place: Abhishek Jha (Elucidata)
    • Project: “Distinguishing primary tumor from normal solid tissue in lung squamous cell carcinoma”
  • 2nd Place: Jeff Van Oss (BAMF Health)
    • Project: “Predicting Von Hippel-Lindau mutation in kidney tumors using radiomic features”

A full report on the Challenge and all recommendations is in development and will be presented at the CRDC 2024 Symposium in October.