The Year Ahead with CRDC's Tanja Davidsen
Across the cancer research community, we all recognize the complexities of working with data that are vast, diverse, and continuously evolving. Contemporary research requires data at massive scale along with consistency in quality and interoperability standards to support sophisticated inquiries that result in improved diagnostics, disease management, and treatment for patients and populations.
Tanja Davidsen, Chief, Data Ecosystems Branch, NCI Center for Biomedical Informatics and Information Technology. Davidsen’s portfolio includes the Cancer Research Data Commons (CRDC).
As we enter the new year, Tanja Davidsen offers a preview of the CRDC’s 2024 goals, which build toward NCI’s overarching data science and research goals.
- Aggregating data submission and data exploration across the CRDC
  - NCI-funded cancer researchers will soon be able to share their data with the research community through one centralized process. Early in 2024, the CRDC will streamline the process for making data submission requests. Later in the year, an aggregated CRDC portal will simplify data submission itself, which will be especially helpful for teams submitting multiple data types to the CRDC. A governance board will oversee the quality and relevance of data accepted by the CRDC, as well as the requirements for specific data types. A concierge team will walk users through the request and submission processes.
  - Spring of 2024 will see a broader public launch of the Cancer Data Aggregator (CDA), an application used to build cohorts from across multiple data commons, notably the Genomic Data Commons (GDC), Proteomic Data Commons (PDC), Cancer Data Service (CDS), and Imaging Data Commons (IDC). The longer-term goal of the CDA is to search and pull data from additional repositories across NIH.
- Addressing financial and training barriers to cloud-based data analysis
  - The CRDC Cloud Resources teams offer self-paced tutorials, webinars, and online office hours for users at all career levels.
  - They also offer incentives, including cloud credits for data storage or analysis, to get researchers started. Running analyses in the cloud, where richer tooling is available, is generally more cost-effective than paying the egress costs of downloading data, especially for large datasets.
  - The year ahead will feature case studies demonstrating the benefits of using NCI-funded Cloud Resources and analytical tools in working with CRDC-housed data.
- Advancing ongoing work to implement widely used standards
  - Making data FAIR – Findable, Accessible, Interoperable, and Reusable – requires consistent data standards as well as reliable access and security protocols. Ideally, these align with the approaches of the larger national and international research communities. To that end, CRDC team members continue their active engagement with organizations such as GA4GH to secure broad agreement on data standards and protocols across the cancer research community.
- Launching new data commons
  - The new Clinical and Translational Data Commons (CTDC) will include diverse datasets, beginning with data from the Cancer Moonshot Biobank.
  - Additionally, the CRDC is working towards a Population Sciences Data Commons (PopSciDC), with data from research on environmental/geographical and population-defined variations in treatment and outcomes.
- Ensuring NCI’s data sustainability
  - CRDC is exploring sustainability requirements, including storage and access, as research data grows exponentially in volume, variety, and complexity.
  - The sustainability assessment and implementation planning process is underway, with a focus on a federated data governance framework to ensure long-term access to NCI-funded research data and resources by the cancer research community.
- Meeting the needs of AI-based research
  - CRDC is conducting an AI readiness initiative to ensure that its data, Cloud Resources, and analytical tools meet the needs of AI-based research.
Building a unified system to collect, integrate, and share data from a broad range of research types and settings is essential to speeding progress, cutting cancer deaths in half, and learning from every cancer patient – the goals of the Beau Biden Cancer Moonshot and the National Cancer Plan. We expect 2024 to bring great progress towards these goals, and we anticipate reporting on that progress at our Fall CRDC Symposium, scheduled for October 17, 2024, and open to the public. Save the date: more details to come.
– Tanja Davidsen