
Cancer Data Service (CDS)
Enabling secure and flexible storage and sharing of data

Overview

The Cancer Data Service (CDS) is a data repository within the Cancer Research Data Commons (CRDC) infrastructure that stores cancer research data generated by NCI-funded programs. The CDS provides secure, authorized cloud storage and data sharing for studies that fall into any of the following categories:

  • Studies with data that do not fit the current data type criteria, or do not meet the minimum metadata standards, for submission to a CRDC Data Commons.
  • Studies with data types for which no Data Commons has been set up.
  • Studies on a waiting list for submission to a specific Data Commons, such as the Genomics Data Commons (GDC).
  • Studies that are still evolving and need a place to store and analyze data during the acquisition phase.

The CDS is hosted on CBIIT's CloudOne Amazon Web Services (AWS) instance. Patient, sample, and derived mutation file information for these projects is stored in the database of Genotypes and Phenotypes (dbGaP), maintained by the National Center for Biotechnology Information (NCBI).


Data Types

The CDS contains mostly genomic data but can accommodate other data types, depending on the studies accepted.


Tools


To help researchers analyze data in the cloud, CBIIT established three Cloud Resources. Each provides data access through a web-based user interface, programmatic access to analytic tools and workflows, and the ability to share results with collaborators. The Seven Bridges Cancer Genomics Cloud (SB-CGC), one of these Cloud Resources, can be used as the primary resource for analyzing CDS data. The other two, the Institute for Systems Biology Cancer Genomics Cloud (ISB-CGC) and Broad FireCloud, could be used for analysis in the future. SB-CGC runs on Amazon Web Services (AWS), while ISB-CGC and Broad FireCloud are based on Google Cloud.


Data Access


The CDS will contain publicly available data in both controlled-access and open-access tiers. Users can access controlled-access data once authorized through dbGaP, using their eRA Commons credentials. Open-access studies will require no authorization, although the CDS does not currently hold any open-access datasets.

In addition to hosting public data, the CDS is piloting in-network data sharing. The pilot allows projects with pre-publication datasets to store data in the CDS and control access to it through an NCI-generated user list, which can be restricted to users in the project's network.

Users of both the publicly available CDS datasets and the in-network data can analyze the data through the Seven Bridges Cancer Genomics Cloud (SB-CGC), an NCI-funded cloud platform for compute-intensive analysis.

Datasets

  • Aggressive Prostate Cancer - The Genetic Basis of Aggressive Prostate Cancer, The Role of Rare Variation
  • GECCO - Detection of Colorectal Cancer Susceptibility Loci Using Genome-Wide Sequencing
  • Tavtigian ColoRectal Cancer - Discovery of colorectal cancer susceptibility genes in high-risk families
  • LCCC 1108 - Development of a Tumor Molecular Analyses Program and Its Use to Support Treatment Decisions
  • PPTC - Pediatric Preclinical Testing Consortium

Anatomical Sites

The CDS contains data on tumors from the following sites:

  • Colorectal
  • Lung
  • Prostate