Imaging Data Commons (IDC)

Enabling access, visualization, and analysis in multi-modal imaging data science

Overview

IDC connects researchers with publicly available cancer imaging data, often linked with other types of cancer data, and co-located with cloud-based computational resources and big data analysis tools provided by Google Cloud Platform. IDC will provide the tools to search and visualize cancer imaging data, define cohorts, and use those cohorts for cloud-based analysis to better understand the disease and evaluate treatment options.

All data hosted by IDC will be publicly available. IDC is currently being populated with the radiology collections from The Cancer Imaging Archive (TCIA). In subsequent stages, IDC will be expanded to offer digital pathology images and multispectral data from the Human Tumor Atlas Network (HTAN). IDC will accept data de-identified by TCIA or other Data Coordinating Centers.

IDC will provide access to data standardized using the Digital Imaging and Communications in Medicine (DICOM) standard. IDC will work with the projects generating the data to harmonize alternative formats into a DICOM representation. Its content will include not only images but also image annotations and analysis results, and will be linked through common identifiers to other types of cancer data, such as proteomics and genomics datasets. Access to the data will be supported using standard interfaces. Analysis tools suggested for use on cloud systems will eventually be containerized and published in central repositories, as is done at other data coordinating centers. Given IDC's role as an imaging data coordinating center, there will be a major focus on establishing best practices for imaging research. In this regard, one of the goals of IDC is to prepare and adapt commonly used image analysis tools to run in cloud environments with the IDC-hosted datasets. Summarized derived data from previously run analyses will be associated with the imaging data on IDC for ease of use by the research community.

Data Types

IDC will contain various types of images and image-derived data harmonized using the DICOM standard. The list below reflects the outlook for later phases of the project:

  • Clinical and preclinical imaging
  • Radiological images (e.g., CT, MRI, PET)
  • Digital pathology images (in 2021)
  • Multispectral microscopy images (in 2021)
  • Image annotations (e.g., planar and volumetric, regions of interest)
  • Parametric maps derived from images (e.g., perfusion and diffusion maps)
  • Measurements derived from the images (e.g., radiomics features for the annotated regions of interest)
  • Expert assessments of the image findings (e.g., qualitative characterizations of lesion appearance)

Datasets

IDC Minimum Viable Product will include radiology data from the following projects: 

Anatomical Sites

Data from the following organ sites, covered by TCIA, will also populate IDC:

  • Bladder
  • Bone Marrow
  • Brain
  • Breast
  • Colon
  • Head and Neck
  • Kidney
  • Liver
  • Lung
  • Pancreas
  • Prostate
  • Rectum
  • Skin
  • Uterus