Explore Data

Overview

Many large NCI-funded research projects share their rich multi-modal data with the public via the Cancer Research Data Commons (CRDC). Users can explore CRDC data in several ways: through the CRDC Data Commons portals, through the Cancer Data Aggregator API and notebooks, and through the CRDC Cloud Resources.

Among the many NCI-funded projects that share their data via the CRDC are:

  • APOLLO: Applied Proteogenomics Organizational Learning and Outcomes Network 
  • CCDI: Childhood Cancer Data Initiative 
  • CPTAC: Clinical Proteomic Tumor Analysis Consortium 
  • HTAN: Human Tumor Atlas Network
  • TARGET: Therapeutically Applicable Research to Generate Effective Treatments 
  • TCGA: The Cancer Genome Atlas

Select Datasets: Learn more about CRDC-hosted datasets.

Exploring Data using Data Commons

Each data commons provides a search interface to explore data by demographics, site of disease, or the name of a specific study, among other variables. Users can explore data from across multiple programs and initiatives, and can build cross-cutting “virtual” cohorts for aggregated analysis. For further analysis in a cloud-based compute environment, users can build a data manifest to pull cohort data into one of the NCI-funded Cloud Resources.  

In addition to this general exploration, many data commons provide data visualization and other analytical tools within the data portal environment. 
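
As a rough illustration of the cohort-and-manifest workflow described above, the following minimal Python sketch filters a handful of file records into a "virtual" cohort and writes a tab-separated manifest. Everything in it is an assumption made for illustration: the records, the field names (`file_id`, `program`, `disease_site`, `sex`, `size_bytes`), and the manifest layout are hypothetical and do not reflect the schema or export format of any particular data commons.

```python
import csv
import io

# Hypothetical file-level records; the field names are illustrative only
# and do not reflect any data commons' actual schema.
RECORDS = [
    {"file_id": "f-001", "program": "TCGA", "disease_site": "Lung", "sex": "female", "size_bytes": 1200},
    {"file_id": "f-002", "program": "CPTAC", "disease_site": "Lung", "sex": "male", "size_bytes": 3400},
    {"file_id": "f-003", "program": "TARGET", "disease_site": "Kidney", "sex": "female", "size_bytes": 560},
    {"file_id": "f-004", "program": "HTAN", "disease_site": "Lung", "sex": "female", "size_bytes": 7800},
]


def build_cohort(records, **criteria):
    """Return the records whose fields match every given criterion."""
    return [r for r in records if all(r.get(k) == v for k, v in criteria.items())]


def write_manifest(cohort, fields=("file_id", "program", "size_bytes")):
    """Serialize the cohort's files as a tab-separated manifest string."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer,
        fieldnames=fields,
        delimiter="\t",
        extrasaction="ignore",  # drop fields that are not part of the manifest
        lineterminator="\n",
    )
    writer.writeheader()
    writer.writerows(cohort)
    return buffer.getvalue()


# A cross-program cohort: all female participants with lung as the site of disease.
cohort = build_cohort(RECORDS, disease_site="Lung", sex="female")
manifest = write_manifest(cohort)
print(manifest)
```

In this sketch the cohort spans two programs (TCGA and HTAN), which mirrors the idea of a cross-cutting cohort. In practice, each data commons portal generates its own manifest from the selected cohort.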

Data Commons: Learn more about each Data Commons.

Aggregated Exploration Across Data Commons

The Cancer Data Aggregator (CDA) combines descriptive information about CRDC-housed data into a common model, making it possible to search across multiple data commons using variables such as participant, sample, tissue, or disease.

While anyone can browse the CDA’s indexed metadata, researchers who want to work with controlled-access data files (as opposed to metadata) must still apply for the appropriate access.
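
To make the idea of a common model more concrete, here is a minimal Python sketch in which metadata records from two sources, each with its own field names, are mapped onto one shared set of fields and then searched together. This is not the CDA API or its actual data model: the source names (`genomic`, `proteomic`), all field names, and the sample records are hypothetical and chosen only for illustration.

```python
# Hypothetical metadata records from two sources, each with its own field names.
GENOMIC_RECORDS = [
    {"case_id": "C1", "primary_site": "Breast", "specimen_type": "Tumor"},
    {"case_id": "C2", "primary_site": "Lung", "specimen_type": "Normal"},
]
PROTEOMIC_RECORDS = [
    {"participant": "C1", "disease_site": "Breast", "sample": "Tumor"},
    {"participant": "C3", "disease_site": "Colon", "sample": "Tumor"},
]

# Mapping from each source's field names to the shared (hypothetical) model.
FIELD_MAPS = {
    "genomic": {
        "case_id": "participant_id",
        "primary_site": "disease_site",
        "specimen_type": "sample_type",
    },
    "proteomic": {
        "participant": "participant_id",
        "disease_site": "disease_site",
        "sample": "sample_type",
    },
}


def harmonize(record, source):
    """Rename a source record's fields to the common model and tag its origin."""
    common = {target: record[src] for src, target in FIELD_MAPS[source].items()}
    common["source"] = source
    return common


def search(index, **criteria):
    """Return harmonized records that match every given criterion."""
    return [r for r in index if all(r.get(k) == v for k, v in criteria.items())]


# Build one combined index, then query it with a single set of field names.
index = [harmonize(r, "genomic") for r in GENOMIC_RECORDS]
index += [harmonize(r, "proteomic") for r in PROTEOMIC_RECORDS]

hits = search(index, disease_site="Breast")
for hit in hits:
    print(hit["participant_id"], hit["source"])
```

Because both sources are expressed in the same fields after harmonization, a single query on `disease_site` finds the same participant in both, which is the kind of cross-commons search a common model enables.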

Cancer Data Aggregator: Learn more about the Cancer Data Aggregator.

Data Exploration through the CRDC Cloud Resources

The CRDC Cloud Resources (CRs) also serve as entry points for exploring CRDC data. Three NCI-funded CRs, each with distinct features, provide secure workspaces and the ability to use or tailor publicly available analytical tools and workflows from their platforms. 

A key benefit of using a CR is that users can access data without downloading large volumes of it to a local compute environment, which can incur high download costs.

Cloud Resources: Learn more about the Cloud Resources.

CRDC Core Standards and Services

To ensure that CRDC-housed data meet the FAIR standards (Findable, Accessible, Interoperable, and Reusable), data must be organized, stored, and searchable based on common standards and terms. A suite of core data standards and services related to data tracking and secure access provides essential support to the CRDC data ecosystem.

CRDC Standards and Services: Learn more about CRDC standards and services.