Proteomic Data Commons (PDC)
The Proteomic Data Commons (PDC) was developed to advance understanding of how proteins help to shape the risk, diagnosis, development, progression, and treatment of cancer. In-depth analysis of proteomic data allows the study of both how and why cancer develops and informs ways of tailoring treatment for individual patients using precision medicine.
The data in the PDC are structured and queryable using the PDC data model and data dictionary. Submitted data are processed and then harmonized to maintain the consistency, integrity, and availability of data and metadata for PDC users and the rest of the Cancer Research Data Commons (CRDC).
All proteomic data in the PDC are open access and, with appropriate attribution, can be included in publications. Through the PDC, researchers will have access to highly curated and standardized biospecimen, clinical, and proteomic data, as well as an intuitive interface to filter, query, search, visualize, and download all data and metadata. In addition to the PDC’s graphical user interface, there is also an Application Programming Interface (API) that can be used to query the data programmatically.
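As a minimal sketch of programmatic access, the snippet below builds and optionally sends a query using only the Python standard library. It assumes the API is served as GraphQL over HTTP; the endpoint URL, the `allPrograms` field, and its subfields are assumptions made for illustration and should be checked against the current PDC API documentation. The helper names `build_request` and `run_query` are hypothetical.

```python
import json
import urllib.request

# Assumed endpoint; confirm against the current PDC API documentation.
PDC_GRAPHQL_URL = "https://pdc.cancer.gov/graphql"

# Illustrative query; the field names here are assumptions, not a confirmed schema.
EXAMPLE_QUERY = "{ allPrograms { program_id name } }"


def build_request(query, url=PDC_GRAPHQL_URL):
    """Build (but do not send) a JSON POST request carrying a GraphQL query."""
    body = json.dumps({"query": query}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def run_query(query, url=PDC_GRAPHQL_URL, timeout=30):
    """Send the query and return the decoded JSON response as a dict."""
    request = build_request(query, url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling `run_query(EXAMPLE_QUERY)` would perform the network request. Keeping request construction separate from sending makes it possible to inspect or log the payload before any data are fetched.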
In addition to providing data, the PDC also offers analysis tools:
- JBrowse – map peptides to the human genome
- PepQuery – identify and validate novel peptides
- Morpheus – visualize and analyze quantitative data
The data files of the PDC are available for analysis in the CRDC Cloud Resources. Specifically, PDC users can build cohorts of interest in the PDC portal and then analyze the associated data files in Seven Bridges Genomics.
The PDC contains multiple data types, including raw data and data harmonized through the common data analysis pipeline. The full list of PDC data types is:
|Data Category|File Type|File Format|
|---|---|---|
|Raw Mass Spectra|Proprietary|Vendor-specific|
|Processed Mass Spectra|Open Standard|mzML|
|Peptide Spectral Matches|Text, Open Standard|TSV, mzIdentML|
|Other Metadata|Document|PDF, DOC, XLSX, TSV|
The PDC includes data from the following programs:
- NCI’s Clinical Proteomic Tumor Analysis Consortium (CPTAC)
- Children’s Brain Tumor Tissue Consortium (CBTTC)
- International Cancer Proteogenomic Consortium (ICPC)
As the PDC matures, additional cancer proteomic datasets will be added, including data from other large proteomic programs, such as the Applied Proteogenomics Organizational Learning and Outcomes (APOLLO) network.
The PDC includes data on the following tumors:
- Bronchus and Lung