Connecting Data to Accelerate Cancer Research

The NCI Cancer Research Data Commons (CRDC) is a cloud-based data science infrastructure that provides secure access to a large, comprehensive, and expanding collection of cancer research data. Users can explore and use analytical and visualization tools for data analysis in the cloud.

111,431 Participants | 67 Anatomical Sites | 295 Studies | 1,113,198 Files | 2.3 PB Data
Latest Blog
Three Pillars of Cloud Computing—People, Processes, and Technology

In this latest Data Science Seminar, Jim Lacey, Ph.D., M.P.H., shares the lessons he learned in transitioning a large cancer epidemiology cohort study to the cloud, including the importance of focusing on people and processes as well as technology. Project managers, principal investigators, co-investigators, data managers, data analysts—really anyone who is part of a team that wants to use the cloud or cloud-based resources for their studies—should attend.

PROJECT SPOTLIGHT
Imaging Data Commons

NCI’s Imaging Data Commons (IDC) provides cloud-based access to a wide variety of medical imaging data and metadata from The Cancer Imaging Archive and other NCI projects. Its connection to a broad range of analytical tools allows researchers and data scientists to train and explore imaging models without downloading data.

Explore

Repositories

Store and share NCI-funded data that are not hosted elsewhere to further advance scientific discovery across a broad range of research areas.

Store and share data from NCI Clinical Trials. The resource is expected to launch in 2020.

Share, analyze, and visualize harmonized genomic data, including TCGA, TARGET, and CPTAC.

Share, analyze, and visualize multi-modal imaging data from both clinical and basic cancer research studies.

Share data from canine clinical trials, including the PRE-medical Cancer Immunotherapy Network Canine Trials (PRECINCT) and the Comparative Oncology Program.

Share, analyze, and visualize proteomic data, such as CPTAC and The International Cancer Proteogenome Consortium (ICPC).

Infrastructure

Enables users to query and connect data distributed across the CRDC for integrative analysis.

Provides semantic services and tools that facilitate interoperability of data across CRDC.

Provides secure user authentication and authorization and permanent digital object identifiers for data objects.

Cloud Resources

Access NCI-funded datasets TARGET and TCGA along with a rich collection of other datasets and collaborative projects that are part of the biomedical ecosystem. Run analysis tools at scale and collaborate securely on a scalable cloud environment.

Access datasets using fully interactive web-based applications, including BigQuery, which is hosted on Google Cloud Platform.

Explore and analyze large datasets alongside secure and scalable analytical resources for large-scale computational research.

ABOUT CRDC

CRDC is built for researchers

  • Enable the cancer research community to share diverse data types
  • Provide secure access to data
  • Facilitate the generation of innovative tools
  • Adhere to FAIR principles of data stewardship: Findable, Accessible, Interoperable, and Reusable