Proteomic Data Commons (PDC)
Overview
The Proteomic Data Commons (PDC) was developed to advance understanding of how proteins help to shape the risk, diagnosis, development, progression, and treatment of cancer. In-depth analysis of proteomic data allows the study of both how and why cancer develops and informs ways of tailoring treatment for individual patients using precision medicine.
The data in the PDC are structured and queryable using the PDC data model and data dictionary. Submitted data are processed and then harmonized to maintain the consistency, integrity, and availability of data and metadata for PDC users and the rest of the Cancer Research Data Commons (CRDC).
All proteomic data in the PDC are open access and, with appropriate attribution, can be included in publications. Through the PDC, researchers will have access to highly curated and standardized biospecimen, clinical, and proteomic data, as well as an intuitive interface to filter, query, search, visualize, and download all data and metadata. In addition to the PDC’s graphical user interface, there is also an Application Programming Interface (API) that can be used to query the data programmatically.
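As a rough illustration of programmatic access, the sketch below builds and sends a query using only the Python standard library. It assumes the API is exposed as a GraphQL endpoint at `https://pdc.cancer.gov/graphql`, and the query and field names (`allPrograms`, `program_id`, `name`) are illustrative assumptions rather than a confirmed schema. Check the PDC API documentation before relying on any of them. The helper names `build_request` and `run_query` are hypothetical.

```python
import json
import urllib.request

# Assumption: the PDC API is a GraphQL endpoint at this URL.
PDC_GRAPHQL_URL = "https://pdc.cancer.gov/graphql"

# Assumption: illustrative query and field names; verify against the published schema.
PROGRAMS_QUERY = """
{
  allPrograms {
    program_id
    name
  }
}
"""


def build_request(query, variables=None, url=PDC_GRAPHQL_URL):
    """Build (but do not send) a POST request carrying a GraphQL query."""
    payload = {"query": query}
    if variables:
        payload["variables"] = variables
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def run_query(query, variables=None, url=PDC_GRAPHQL_URL, timeout=30):
    """Send the query over the network and return the decoded JSON response."""
    request = build_request(query, variables, url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling `run_query(PROGRAMS_QUERY)` would send the request and return the parsed JSON body. Keeping request construction separate from sending makes it possible to inspect or test the payload without network access.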
In addition to providing data, the PDC also offers analysis tools:
- JBrowse – map peptides to the human genome
- PepQuery – identify and validate novel peptides
- Morpheus – quantitative data visualization and analysis
PDC data files are available for analysis in the CRDC Cloud Resources. Specifically, PDC users can build cohorts of interest in the PDC portal and then analyze the associated data files in the Seven Bridges Cancer Genomics Cloud (powered by Velsera) or in Broad Institute FireCloud.
Data Types
The PDC contains multiple data types, including both the raw data and the harmonized data produced by the common data analysis pipeline. A full list of PDC data types is available here:
https://pdc.cancer.gov/pdc/faq/Files_Download
Data Category | File Type | File Format |
---|---|---|
Raw Mass Spectra | Proprietary | Vendor-specific |
Processed Mass Spectra | Open Standard | mzML |
Peptide Spectral Matches | Text, Open Standard | TSV, mzIdentML |
Protein Assembly | Text | TSV |
Quality Metrics | Web | HTML |
Other Metadata | Document | PDF, DOC, XLSX, TSV |
Anatomical Sites
The PDC includes data on tumors from the following anatomical sites:
- Bone Marrow
- Brain
- Breast
- Bronchus and Lung
- Colon
- Head and Neck
- Kidney
- Liver
- Ovary
- Pancreas
- Rectum
- Stomach
- Uterus