
Cancer Data Service (CDS)
Enabling secure and flexible storage and sharing of data

Overview

The Cancer Data Service (CDS) is a data repository within the Cancer Research Data Commons (CRDC) infrastructure for storing cancer research data generated by NCI-funded programs. The CDS provides secure, authorized cloud storage and data sharing for studies that fall into any of the following categories:

  • Studies with data that do not fit the current data type criteria for submission to a CRDC Data Commons, and/or do not meet its minimum metadata standards
  • Studies with data for which no Data Commons has been set up for the data type
  • Studies that are on a waiting list for storage in a specific Data Commons, such as the Genomic Data Commons (GDC)
  • Studies that are still evolving and have no place to store and analyze their data during the data acquisition phase

The CDS is hosted on NCI's CloudOne Amazon instance. Patient and sample information and the derived mutation files for these projects are stored in the database of Genotypes and Phenotypes (dbGaP), provided by the National Center for Biotechnology Information (NCBI).

 

Data Types

The CDS contains mostly genomic data but can accommodate other data types, depending on the studies accepted.

 

Analysis Tools


The Seven Bridges Cancer Genomics Cloud (SB-CGC), one of the NCI Cloud Resources, can be used as the primary resource for analyzing CDS data. To help researchers analyze data in the cloud, NCI's Center for Biomedical Informatics and Information Technology (CBIIT) established three Cloud Resources that support data access through a web-based user interface, provide programmatic access to analytic tools and workflows, and allow results to be shared with collaborators. The other two Cloud Resources, the Institute for Systems Biology Cancer Genomics Cloud (ISB-CGC) and Broad FireCloud, may be used for analysis in the future. SB-CGC runs on Amazon Web Services (AWS), whereas ISB-CGC and Broad FireCloud are based on Google Cloud.

 

Data Access


The CDS can host both controlled-access and open-access data. Access to controlled-access data in the CDS is granted through whitelists compiled by dbGaP from requests approved by the NCI Data Access Committee (DAC). No authorization will be required for open-access studies, although we do not currently have any open-access datasets in the CDS.

Users can access the data for analysis through the Seven Bridges Cancer Genomics Cloud (SB-CGC), one of the NCI-funded Cloud Resources for compute-intensive analysis.
 

Datasets

  • GECCO - Detection of Colorectal Cancer Susceptibility Loci Using Genome-Wide Sequencing (phs001554.v1.p1)
  • LCCC 1108 - Development of A Tumor Molecular Analyses Program and Its Use to Support Treatment Decisions (phs001713.v1.p1)
  • PPTC - Pediatric Preclinical Testing Consortium (phs001437.v1.p1)
  • Aggressive Prostate Cancer - The Genetic Basis of Aggressive Prostate Cancer, The Role of Rare Variation (phs001524.v1.p1)
  • Tavtigian ColoRectal Cancer - Discovery of colorectal cancer susceptibility genes in high-risk families (phs001787.v1.p1)
  • PLCO - Prostate, Lung, Colorectal and Ovarian Cancer Screening Trial (phs002011.v1.p1)
  • Familial Myeloma Sequencing - Whole genome sequencing to discover familial myeloma risk genes (phs001819.v2.p1)
  • Molecular Pathological Epidemiology of Colorectal Cancer (phs002050.v1.p1)

Studies that have been released are available on the CGC. The other listed studies have data deposited in the CDS and will be released on the CGC soon.

Anatomical Sites

The CDS has data on the following tumor sites:

  • Colorectal
  • Lung
  • Prostate