Repository

Proteomic Data Commons (PDC)
Facilitating proteogenomics to revolutionize precision medicine

Overview

The Proteomic Data Commons (PDC) was developed to advance understanding of how proteins help to shape the risk, diagnosis, development, progression, and treatment of cancer. In-depth analysis of proteomic data allows the study of both how and why cancer develops and informs ways of tailoring treatment for individual patients using precision medicine.

The data in the PDC are structured and queryable using the PDC data model and data dictionary. Submitted data are processed and then harmonized to maintain the consistency, integrity, and availability of data and metadata for PDC users and the rest of the Cancer Research Data Commons (CRDC).

All proteomic data in the PDC are open access and, with appropriate attribution, can be included in publications. Through the PDC, researchers will have access to highly curated and standardized biospecimen, clinical, and proteomic data, as well as an intuitive interface to filter, query, search, visualize, and download all data and metadata. In addition to the PDC’s graphical user interface, there is also an Application Programming Interface (API) that can be used to query the data programmatically.
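As a rough illustration of programmatic access, the sketch below builds and sends a query using only the Python standard library. It rests on several assumptions that are not stated on this page and should be checked against the PDC API documentation: that the API is a GraphQL service, that its endpoint is `https://pdc.cancer.gov/graphql`, and that a query named `allPrograms` with the fields `program_id` and `name` exists. The helper names `build_request` and `run_query` are hypothetical.

```python
import json
import urllib.request

# Assumption: endpoint URL of the PDC API (verify in the PDC API documentation).
PDC_GRAPHQL_URL = "https://pdc.cancer.gov/graphql"

# Assumption: illustrative query and field names; check them against the
# published schema before relying on them.
EXAMPLE_QUERY = "{ allPrograms { program_id name } }"


def build_request(query, url=PDC_GRAPHQL_URL):
    """Wrap a GraphQL query string in a JSON POST request (nothing is sent yet)."""
    payload = json.dumps({"query": query}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def run_query(query, url=PDC_GRAPHQL_URL, timeout=30):
    """Send the query over the network and return the decoded JSON response."""
    request = build_request(query, url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling `run_query(EXAMPLE_QUERY)` would send the request. Keeping request construction separate from the network call makes it possible to inspect the payload without contacting the server.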

In addition to providing data, the PDC also offers analysis tools:

  • JBrowse – map peptides to the human genome
  • PepQuery – identify and validate novel peptides
  • Morpheus – visualize and analyze quantitative data

The data files of the PDC are available for analysis in the CRDC Cloud Resources. Specifically, PDC users can build cohorts of interest in the PDC portal and then analyze the associated data files in the Seven Bridges Cancer Genomics Cloud (powered by Velsera) and in Broad Institute FireCloud.

Data Types

The PDC contains multiple data types, including raw data and data harmonized through the common data analysis pipeline. A full list of PDC data types is available here:

https://pdc.cancer.gov/pdc/faq/Files_Download

Data Category             File Type            File Format
Raw Mass Spectra          Proprietary          Vendor-specific
Processed Mass Spectra    Open Standard        mzML
Peptide Spectral Matches  Text, Open Standard  TSV, mzIdentML
Protein Assembly          Text                 TSV
Quality Metrics           Web                  HTML
Other Metadata            Document             PDF, DOC, XLSX, TSV

Anatomical Sites

The PDC includes data on tumors from the following anatomical sites:

  • Bone Marrow
  • Brain
  • Breast
  • Bronchus and Lung
  • Colon
  • Head and Neck
  • Kidney
  • Liver
  • Ovary
  • Pancreas
  • Rectum
  • Stomach
  • Uterus