How to Access Data

Researchers can access CRDC-hosted cancer research datasets either through CRDC repositories or through CRDC Cloud Resources. 

Repositories

Genomic Data Commons (GDC)
Key datasets: TCGA, TARGET, CPTAC, FMI, CCLE, APOLLO
Description: The GDC houses both open and controlled-access datasets. The list of open and controlled-access datasets can be viewed through the GDC portal. Access to controlled GDC datasets requires an NIH review process.
Link: GDC portal

Proteomic Data Commons (PDC)
Key datasets: CPTAC, ICPC, APOLLO
Description: All PDC data are accessible to the public as open access datasets.
Link: PDC portal

Imaging Data Commons (IDC)
Key datasets: TCGA, CPTAC, and HTAN
Description: The IDC pulls data from these datasets through the Imaging Data Archive and other NCI-approved imaging repositories. Only de-identified, publicly available data are available through the IDC.
Link: IDC portal

Integrated Canine Data Commons (ICDC)
Key datasets: COTC007B, GLIOMA01, NCATS-COP-01
Description: ICDC data are accessible to the public as open access datasets.
Link: ICDC portal

Clinical Trials Data Commons (CTDC)
Key datasets: ECOG-ACRIN and NCI-MATCH
Description: CTDC datasets are available under restricted access.
Link: CTDC

Cancer Data Service (CDS)
Key datasets: HTAN, CCDI, PLCO
Description: The CDS hosts controlled and open access data. Access to controlled-access data requires approval through the NCI Data Access Committee (DAC) process.
Link: CDS
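
For readers who want to work with the access levels listed above programmatically, the following is a minimal Python sketch that records them in a lookup table. The dictionary layout, the tier labels, and the function names are illustrative assumptions and are not part of any CRDC API; the tiers themselves are taken only from the descriptions above.

```python
# Access tiers per repository, transcribed from the listing above.
# The structure and names here are hypothetical, for illustration only.
REPOSITORY_ACCESS = {
    "GDC": {"open", "controlled"},
    "PDC": {"open"},
    "IDC": {"open"},
    "ICDC": {"open"},
    "CTDC": {"restricted"},
    "CDS": {"open", "controlled"},
}


def repositories_with(tier):
    """Return the repositories that offer data under the given access tier."""
    return sorted(
        name for name, tiers in REPOSITORY_ACCESS.items() if tier in tiers
    )


def is_fully_open(name):
    """True when every dataset tier listed for the repository is open access."""
    return REPOSITORY_ACCESS[name] == {"open"}


if __name__ == "__main__":
    print("Open access available:", repositories_with("open"))
    print("Fully open:", [n for n in sorted(REPOSITORY_ACCESS) if is_fully_open(n)])
```

A table like this only mirrors the summary on this page; the authoritative access rules for any dataset remain those stated by the repository itself.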

The following cloud resources make datasets accessible for analysis in secure workspaces and allow researchers to upload their own data for aggregated or federated analysis.

Cloud Resources

Seven Bridges Cancer Genomics Cloud (SB-CGC)
Key datasets: Includes data housed in several CRDC repositories, plus other NCI datasets and datasets from international or US sources not directly affiliated with the NCI.
Description: Access to datasets follows the guidance outlined for each dataset and/or repository.
Link: SB-CGC

ISB Cancer Gateway in the Cloud (ISB-CGC)
Key datasets: TARGET, TCGA, and CPTAC. ISB also hosts data from specialty databases such as the TP53 and Mitelman databases.
Description: Access to datasets follows the guidance outlined for each dataset and/or repository.
Link: ISB-CGC

Broad Institute’s FireCloud Powered by Terra
Key datasets: TARGET and TCGA
Description: Access to datasets follows the guidance outlined for each dataset and/or repository.
Link: Broad Institute’s FireCloud

Other Data Access Portals in Development


cBioPortal: cBioPortal is an open-access, open-source resource for exploratory and interactive visualization, analysis, and download of large-scale cancer genomics datasets. It is hosted by the Center for Molecular Oncology at Memorial Sloan Kettering Cancer Center (MSK). The CRDC-specific instance of cBioPortal integrates all public genomic datasets found within the CRDC. Learn more about cBioPortal.

Galaxy: Galaxy is an open-source, web-based platform for computational biomedical research that allows users without programming experience to specify parameters and run individual tools as well as larger workflows. This interactive analysis tool is being integrated into the CGC Data Studio by Seven Bridges.

For the full list of NCI datasets, go to the NCI data catalog.   

More information on accessing open and controlled-access data can be found on the NCI Office of Data Sharing website.