The CRDC Fall Symposium: Report
The Cancer Research Data Commons (CRDC) 2024 Fall Symposium marked the CRDC's 10th anniversary as an essential part of the NCI research data ecosystem. More than 500 registrants heard updates on the CRDC's accomplishments and previews of new initiatives.
Recordings and slides from all sessions are now available.
Keynote speaker Amanda Borens, a senior data science executive and respected figure in the research community, shared her insights based on her experience as a scientist and cancer patient.
"As a survivor of recurrent, metastatic colon and breast cancer, I know that while we have not cured cancer, we are getting better at treating it. I am grateful for the research that has kept me alive into my eighth year since my initial diagnosis and has given me remission from colon cancer for the last four years. But as a patient who has undergone treatments for multiple recurrences and metastases, I know the sacrifice that is required to join studies during a painful experience. I want us to honor patients who donate pieces of themselves to advance human knowledge; most know that they will not benefit themselves. Collaboration across institutions and data sharing for secondary uses are essential in science, so I encourage everyone to openly share the data that patients contribute to the body of all human knowledge and to help their sacrifices translate into easier, more effective treatments for all patients in the future."
Dr. Warren Kibbe, NCI’s Deputy Director for Data Science and Strategy, was the Symposium’s featured speaker. As a current advisor to the NCI Director and, as the former Director of NCI CBIIT, an integral part of establishing the ecosystem, Dr. Kibbe provides strategic counsel to the CRDC. His talk focused on the importance of the CRDC’s leadership role in making data accessible to researchers at all skill levels by making data FAIR – Findable, Accessible, Interoperable, and Reusable.
As he noted:
"Making sure data are FAIR and AI-ready will mean encouraging and training researchers to use consistent standards at the outset of their work. This requires that data scientists and cancer researchers speak the same language and reflect the country's diversity. We want as many voices as possible at the table to understand the impact all the data and tools have on patients."
Dr. Kibbe also stressed the importance of the user experience and attribution:
"You have to pay attention to the way people look at your resource and what they expect to see. I think we have done a great job of making something useful and understandable with CRDC, and we need to continue to take the time to ask questions and engage the users. It is also incredibly important to give attribution and credit to the people who are both the data producers and the people who work inside our ecosystems – like developers and analysts. We need to recognize that people do important work when they reuse the data that we’ve made accessible."
A collaborative session with the NCI’s Office of Data Sharing (ODS) Symposium kicked off the CRDC Symposium, showcasing the CRDC as an exemplar data commons for researchers to consider throughout the NCI data lifecycle, starting as early as the development of their Data Management and Sharing plans in line with NIH guidance. Several members of the CRDC team reviewed how the CRDC supports the research community through the various phases of the NCI data lifecycle, including:
- Research Funding Application, Data Generation/Collection
- Data Submission
- Data Access
- Data Use (Analyses and Tools)
- Data Retention and Sunset
A concluding panel addressed questions about how the CRDC meets or exceeds the NIH recommendations on evaluating and selecting appropriate repositories.
Jill Barnholtz-Sloan, Acting Director of the Center for Bioinformatics & Information Technology (CBIIT), kicked off the second day of the CRDC Symposium. She welcomed attendees and gave an overview of the NCI cancer data ecosystem and the role of the CRDC in that landscape.
The day’s agenda highlighted success stories using CRDC resources and outlined exciting new initiatives. Presentations included:
- Unlocking Research with CRDC Data and NCI Cloud Resources
- Bridging the Gap: Bioinformatics Pipelines for Non-Bioinformaticians
- AI and De-identification for Medical Imaging
- Investigating CRDC AI Data Readiness & the AI Data Readiness Challenge
- NCI’s Genomic Data Enclave: Lowering the Access Barrier with Chatbot
Representatives from NCI programs collaborating with the CRDC highlighted how CRDC data, resources, and infrastructure enhance their work and accelerate cancer research. Programs included:
- CPTAC: Accelerating Cancer Research Through Multiomic Integration and Data Sharing
- The cBioPortal for Cancer Genomics
- Harnessing the Integrated Canine Data Commons and the PRECINCT Canine Immunotherapy Trials Network
Alastair Thomson, Acting Director of Data Innovation at the Advanced Research Projects Agency for Health (ARPA-H), gave an update on the Biomedical Data Fabric (BDF) Toolbox Program, highlighting the collaboration between ARPA-H and the NCI using CRDC data. Twenty grants were recently awarded to researchers across academia and industry to develop new methods that make data easier to collect, submit, and reuse regardless of researchers’ informatics skill level.
Finally, a panel of partners across NCI, moderated by NCI’s Deputy Director for Scientific Strategy and Development, Dr. Dinah Singer, offered thoughts on the past decade of progress as well as their future plans for hosting NCI-sponsored data commons and repositories. The panel concluded with a range of reflections on the CRDC's role in the next decade, given the rapidly evolving cancer research landscape. Among their suggestions:
- Creating additional user-friendly tools and interfaces to facilitate data access and analysis
- Embracing large-scale initiatives – similar to the TCGA – to capture and integrate categories of data at scale, from molecular to clinical to population
- Encouraging a shift from hypothesis-driven inquiry to data-driven and AI-enhanced investigations
- Developing federated research models to address the complexities of working with sensitive electronic health records
In a concluding wrap-up, attendees were reminded of the inspirational words of keynote speaker Amanda Borens, who shared photos of her family and friends.
"I have had so many beautiful milestones because of researchers like you. I have often thought if we only had the discoveries of Nobel Prize winners and took all other discoveries out of the equation, we would have drops in a bucket. I know from experience that it's hard to keep going and do the work when it may not make you rich, and it may not win you a Nobel Prize. But people – patients like me – are the reason that you do it. It matters, and it's important. So thank you."
Useful links:
- Find links to recordings and presentations
- Learn more about the AI Data Readiness Challenge
- Learn more about the MIDI Challenge and imaging de-identification
- Learn more about the ARPA-H/NCI partnership to build the BDF Toolbox