CRDC at the 2024 AACR Annual Meeting: Q&A Session

The Cancer Research Data Commons (CRDC) team presented at the 2024 AACR Annual Meeting in an NCI-sponsored session highlighting the impact of the CRDC on the broader cancer research community. A question-and-answer session followed the presentation, focusing on how the CRDC will facilitate data management and sharing.

You note that the CRDC will soon be accepting structured clinical study data. What about unstructured data? How will the CRDC handle that data?

This is a work in progress. Many efforts across HHS are evaluating ways to extract meaningful data from electronic health records. The CRDC's collaboration with ARPA-H on developing a Biomedical Data Fabric Toolbox includes a focus on AI tools that use large language models to address the complexities of extracting data from electronic health records.

What is the anticipated timeline for taking data from U, P, and R grants?

The CRDC is trying to lower the barriers to submitting and sharing data. Over the last several years, we have focused on NCI-initiated programs such as TCGA and CPTAC. But with the new NIH Data Management and Sharing policy, we see the need to open the CRDC to submissions from investigator-initiated grants such as R01s. Our goal is to make data submission easier, starting with the launch of a new CRDC Data Submission Portal in summer 2024 and with pilot submissions from several investigators with U, P, or R grants.

How does the CRDC address international data interoperability issues?

Some data interoperability issues are technological, but there are also geopolitical issues around privacy and data sharing. From a technical perspective, interoperability is a challenge, but there are ways to address it, such as federated learning models in which a query or algorithm is shared but the data are not. The CCDI – Childhood Cancer Data Initiative – has found a way to operate under the rules of each contributor, which makes it possible to work with research collaborators all over the world. Both technical and geopolitical solutions will continue to evolve.
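To make the "share the query, not the data" idea concrete, here is a minimal Python sketch of a federated count. It is an illustration only and does not describe how the CRDC or CCDI actually operates: the Site class, the count_by_diagnosis query, and the sample records are all hypothetical, and the example shows a simple federated aggregate rather than full federated model training.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical record type, invented for this illustration.
Record = Dict[str, object]


@dataclass
class Site:
    """A hypothetical data contributor that keeps its records local."""
    name: str
    _records: List[Record]  # stays at the site; only query results leave

    def run(self, query: Callable[[List[Record]], Dict[str, int]]) -> Dict[str, int]:
        # The query travels to the data; only its aggregate result is returned.
        return query(self._records)


def count_by_diagnosis(records: List[Record]) -> Dict[str, int]:
    """The shared query: count local records per diagnosis."""
    counts: Dict[str, int] = {}
    for record in records:
        dx = str(record["diagnosis"])
        counts[dx] = counts.get(dx, 0) + 1
    return counts


def federated_counts(sites: List[Site]) -> Dict[str, int]:
    """Send the same query to every site and sum the returned aggregates."""
    total: Dict[str, int] = {}
    for site in sites:
        for dx, n in site.run(count_by_diagnosis).items():
            total[dx] = total.get(dx, 0) + n
    return total


# Made-up sample data for two sites.
sites = [
    Site("site_a", [{"diagnosis": "neuroblastoma"}, {"diagnosis": "leukemia"}]),
    Site("site_b", [{"diagnosis": "leukemia"}]),
]
combined = federated_counts(sites)
```

In this sketch the coordinator only ever sees per-diagnosis counts, so each site could apply its own access rules inside run before returning anything.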

From the perspective of non-profit funders of academic research, are there requirements about data sharing that we can also use?

There is a great deal of interest across the NCI in working with non-profit and advocacy organizations to align how we all address data sharing. NIH took the lead with the Cancer Moonshot to set up the most advanced rules about data sharing, notably requiring that data be shared much sooner [than previously considered] and encouraging the use of open-access publications. Things are moving in the right direction, and I encourage you to talk with team members from the Office of Data Sharing to emulate their work.

You mentioned the DSS – Data Standards Services. Is that people-based or automated software? And how do you choose from available standards, or do you create your own models?

The DSS is a group of people with diverse expertise in different data types who provide manual curation and harmonization of CRDC data. This work is burdensome, so one technical focus of the ARPA-H Biomedical Data Fabric Toolbox collaboration is to accelerate this labor-intensive process through machine-assisted curation and harmonization, which would support mapping between the different standards implemented across research consortia and institutions. In the meantime, the CRDC makes its metadata standards accessible through the NCI Cancer Data Standards Registry and Repository (caDSR), including its Common Data Elements, Case Report Forms, and Data Collection Templates.

What does the next ten years of success look like?

The CRDC is committed to lowering barriers so that users with all levels of expertise can access and work with NCI-funded research data. The CRDC team will continue to provide resources and tutorials to support that effort. The CRDC also anticipates incorporating plain-language search to make it easier for users at all levels, including citizen scientists, to access the data. It also anticipates supporting federated research to ultimately optimize the use of cancer data.