CRDC Insights

Updates from the Cancer Research Data Commons:
Empowering the Scientific Community to Make New Discoveries

CRDC Collaborations: Updates

November 15, 2024

 

The CRDC collaborates with many NIH programs, offering its data, resources, and expertise. In this issue: an update on the MIDI-B Challenge, which focused on automated image de-identification. Also in this issue: an update on the Advanced Research Projects Agency for Health (ARPA-H) and the work to create a Biomedical Data Fabric (BDF) Toolbox that will make it easier to share and integrate data, analytical tools, and technologies.

 

 


MIDI-B Challenge Update

NCI launched the Medical Image De-Identification Benchmark (MIDI-B) Challenge in May 2024, in collaboration with Sage Bionetworks and the Medical Imaging Computing and Computer Assisted Interventions (MICCAI) Society. The Challenge addressed two fundamental requirements for sharing medical images through public data commons: protecting patient privacy and preserving the research value of the data. The MIDI-B Challenge ran through early September and assessed various de-identification approaches and methods.

More than 75 teams from U.S. and international institutions, including academia and industry, registered for the MIDI-B Challenge. Only ten teams advanced to the test phase and completed the challenge. The list of participating teams and their results are posted on the MIDI-B Challenge leaderboard.

Challenge participants used automated methods, including a customized large language model, a natural language model, existing data recognition coupled with DICOM redacting software, and novel pseudonymization using date shifting, hashing, text recognition, and text replacement. Each team’s results were evaluated against answer keys for the validation dataset and the test dataset using a validation script.
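To make two of the techniques named above more concrete, the following is a minimal sketch of pseudonymization by keyed hashing and per-patient date shifting. It does not reproduce any team's actual method. The key, the shift window, the function names, and the sample record are all hypothetical, and a real pipeline would operate on full DICOM files rather than a small dictionary.

```python
import hashlib
import hmac
from datetime import datetime, timedelta

# Hypothetical secret; a real pipeline would load this from secure storage.
SECRET_KEY = b"example-secret-key"
# Hypothetical shift window, in days.
MAX_SHIFT_DAYS = 365


def pseudonymize_id(patient_id: str, key: bytes = SECRET_KEY) -> str:
    """Replace an identifier with a keyed hash (HMAC-SHA256), truncated to 16 hex chars."""
    digest = hmac.new(key, patient_id.encode("utf-8"), hashlib.sha256).hexdigest()
    return digest[:16]


def shift_days_for(patient_id: str, key: bytes = SECRET_KEY) -> int:
    """Derive a stable per-patient offset between 1 and MAX_SHIFT_DAYS days."""
    digest = hmac.new(
        key, b"date-shift:" + patient_id.encode("utf-8"), hashlib.sha256
    ).digest()
    return int.from_bytes(digest[:4], "big") % MAX_SHIFT_DAYS + 1


def shift_date(dicom_date: str, patient_id: str, key: bytes = SECRET_KEY) -> str:
    """Shift a DICOM-style date (YYYYMMDD) back by the patient's offset."""
    original = datetime.strptime(dicom_date, "%Y%m%d")
    shifted = original - timedelta(days=shift_days_for(patient_id, key))
    return shifted.strftime("%Y%m%d")


def deidentify(record: dict) -> dict:
    """De-identify a toy record holding PatientID, StudyDate, and SeriesDate."""
    pid = record["PatientID"]
    return {
        "PatientID": pseudonymize_id(pid),
        "StudyDate": shift_date(record["StudyDate"], pid),
        "SeriesDate": shift_date(record["SeriesDate"], pid),
    }


if __name__ == "__main__":
    sample = {"PatientID": "12345", "StudyDate": "20240501", "SeriesDate": "20240503"}
    print(deidentify(sample))
```

Because every date for a given patient moves by the same offset, the interval between two of that patient's dates is unchanged, which is one way a method can protect privacy while preserving research value.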

Teams that completed the challenge presented their results in a virtual MIDI-B Challenge workshop on October 24, 2024. The MIDI-B Challenge organizers plan to publish a report of the challenge design and results in a peer-reviewed scientific journal. Members from the ten teams that completed the challenge will be co-authors of the report.  


ARPA-H – Biomedical Data Fabric (BDF) Toolbox: Awardees Announced 

The Advanced Research Projects Agency for Health (ARPA-H), in partnership with NCI, is creating the Biomedical Data Fabric (BDF) Toolbox to make it easier to share and integrate data, analytical tools, and technologies from contributors across the country. The BDF Toolbox program recently awarded contracts to 17 teams to develop innovative capabilities that provide machine-assisted data curation, automated data harmonization, and intuitive user interfaces for exploring rich clinical research data.

The CRDC engages with the ARPA-H BDF Toolbox by providing data, use cases, lessons learned, and analytical workflows, and by offering a transition data platform to integrate best-in-class tools into its ecosystem.

This effort aims to improve patients’ health outcomes by democratizing access to biomedical data and creating open-source tools with rigorous metrics to ensure the algorithms can help overcome technical barriers to data integration and use. The BDF Toolbox program is beginning with cancer data, with the expectation that the lessons learned will apply to other types of biomedical data.  

“With these new teams in place, we’re one step closer to maximizing the use of biomedical data,” said Erika Kim, Ph.D., Transition and System Integration Lead at ARPA-H and Federal co-Lead for the CRDC.  

Tanja Davidsen, Ph.D., NCI Data Ecosystems Branch Chief, added, “This multidisciplinary effort taps the expertise of teams from academia, nonprofits, and commercial organizations. Together, we can boost our efforts to find new and better ways of using data to improve the way we treat different diseases, starting with cancer.”

Read a summary of this work.  

Read the ARPA-H press release.