Seven Bridges Cancer Genomics Cloud (SB-CGC) Detailed Update

EXPLORE

The Seven Bridges Cancer Genomics Cloud (SB-CGC) platform, powered by Velsera, provides access to most NCI Cancer Research Data Commons (CRDC) datasets. New and updated features make data exploration easier.

  • Streamlined process to retrieve controlled-access study permissions 
    • A new API returns the list of controlled-access datasets that a user’s account is approved to access. This helps researchers automate downstream workflows and troubleshoot data access issues.
    • To use this feature, a researcher’s RAS account (Researcher Authentication Service, which uses NIH eRA Commons credentials) must be linked to their SB-CGC account, and the researcher must have approved Data Access Requests (DARs) in dbGaP for the studies they want to access.
  • File Browser improvements 
    • The improvements provide a smoother, more consistent experience when working with project files identified by Data Repository Service (DRS) Uniform Resource Identifiers (URIs). DRS URIs are unique addresses or labels that point to specific data files, regardless of where the files are stored.
    • The bulk download selection limit has been increased to 1,000 items, for more efficient bulk downloads from SB-CGC to a local compute environment.   
    • The Generate Data Repository Service (DRS) Manifest option has been restored. This produces a CSV or TSV (comma- or tab-separated spreadsheet) file that lists locations of selected files for easy retrieval.
    • Navigation has been improved within SB-CGC to better highlight CRDC data. This includes SB-CGC's Faceted Search (a filtering tool that lets users search and select files by categories such as data type, disease, or project) and the new SB-CGC File Browser. Importing a DRS manifest (as noted above) also returns results to the new File Browser.
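
For readers who want to work with a DRS manifest outside the platform, the following is a minimal Python sketch, not SB-CGC's own tooling. It assumes a CSV manifest with hypothetical `name` and `drs_uri` columns (the real manifest layout may differ) and a placeholder hostname, `drs.example.org`. The URL mapping follows the GA4GH DRS convention for hostname-based URIs; compact-identifier URIs are not handled here.

```python
import csv
import io
from urllib.parse import urlparse


def drs_uri_to_url(drs_uri: str) -> str:
    """Map a hostname-based DRS URI to its GA4GH DRS v1 object URL."""
    parsed = urlparse(drs_uri)
    object_id = parsed.path.lstrip("/")
    if parsed.scheme != "drs" or not parsed.netloc or not object_id:
        raise ValueError(f"not a hostname-based DRS URI: {drs_uri!r}")
    return f"https://{parsed.netloc}/ga4gh/drs/v1/objects/{object_id}"


def read_manifest(text: str, delimiter: str = ",") -> list[dict]:
    """Parse manifest text and attach a resolved object URL to each row."""
    rows = []
    for row in csv.DictReader(io.StringIO(text), delimiter=delimiter):
        # "drs_uri" is an assumed column name, not a documented one.
        row["url"] = drs_uri_to_url(row["drs_uri"])
        rows.append(row)
    return rows


# Hypothetical manifest content; the hostname and object ID are placeholders.
EXAMPLE_MANIFEST = "name,drs_uri\nsample1.bam,drs://drs.example.org/abc-123\n"

if __name__ == "__main__":
    for entry in read_manifest(EXAMPLE_MANIFEST):
        print(entry["name"], entry["url"])
```

For a TSV manifest, pass `delimiter="\t"` to `read_manifest`. Fetching a controlled-access object from the resolved URL would still require valid credentials.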

ANALYZE

As noted in the abbreviated update about the Seven Bridges Cancer Genomics Cloud (SB-CGC), several new bioinformatics tools and features have been added to its public apps gallery. These enhance researchers' ability to conduct reproducible, scalable, and accessible analyses of large-scale genomic, multi-omics, and imaging data. Find more detail below. 

  • DeepVariant & Giraffe-DeepVariant – Major version updates improve small-variant calling across multiple sequencing technologies. Small variant calling is the process of identifying single-letter differences and small insertions or deletions in DNA sequences compared to a reference genome.
     
  • Genotype GVCFs & Filter Variants workflow – Upgraded to GATK 4.6.2.0. GATK (Genome Analysis Toolkit) is a widely used suite of tools for variant discovery, identifying differences in DNA sequence across samples. GVCFs (Genomic Variant Call Format files) are intermediate files that store variant information for individual samples before being combined and filtered across a cohort.
     
  • sbmanifest – A lightweight Common Workflow Language (CWL) tool that generates and validates the sample sheets (structured files that list samples and their associated data files) required for Nextflow workflows. Nextflow is a popular framework for running scalable, reproducible scientific pipelines.
     
  • sbpack_nf – A CWL tool that creates a CWL wrapper for Nextflow workflows, producing an execution-ready, platform-compatible Nextflow app in a user-defined SB-CGC project. This allows researchers to package and run their Nextflow pipelines directly on the SB-CGC platform without manual configuration.
     
  • DRS Bulk Import & File Upload improvements
    • DRS bulk import now shows more readable metadata, including tooltips (on-screen text that appears when hovering over an item to provide additional context).
    • File uploading is easier, with new drag-and-drop support and updated confirmation text styling that clearly indicates upload state and next steps.
    • Automatic session recovery for long-running directory and volume import jobs, so imports can complete successfully even if the user’s connection times out.
       
  • Enhanced Data Studio Session Stability
    • This improves shutdown and backup reliability for Data Studio (the SB-CGC’s interactive analysis environment), ensuring large and complex sessions complete backups and clean-up before shutdown.
       
  • Integrated Visualization & Annotation Tools
    • Three imaging graphical user interface (GUI) tools are available in the SB-CGC’s Data Studio. These are point-and-click applications, as opposed to command-line tools.
      • OHIF Viewer, an open-source web-based viewer for medical images such as radiology scans
      • 3D Slicer, a platform for visualizing and analyzing 3D medical image data
      • ImageJ, a widely used image processing tool in biomedical research
    • These cloud-hosted applications do not require local installation and support custom plugins and GPU-enabled compute (graphics processing units that accelerate computation-heavy tasks like image analysis). They also support annotated outputs, and results can be saved back into the user’s workspace for downstream analysis.
    • This update significantly expands the SB-CGC’s Data Studio beyond code-based tools like JupyterLab and RStudio, making it easier to work with complex imaging data. 
       
  • New Machine-Learning Tools for Multi-omics Integration and Imaging Analysis
    • MONAI is an open-source framework for deep learning (an approach that uses layered neural networks to identify patterns in complex data). It supports 3D biomedical image segmentation (the process of identifying and delineating structures of interest within a 3D image, such as a tumor in a scan).
      • MONAI Auto3DSeg – Automatically trains, tunes hyperparameters (settings that control how a model learns), and evaluates multiple 3D segmentation models end-to-end with minimal user intervention.
      • MONAI nnUNetV2 – A fully automated, self-configuring segmentation framework that adapts to any new dataset without human tuning.
    • MOFA2 is an unsupervised multi-omics integration framework that identifies patterns across multiple omics datasets (such as genomics, proteomics, and DNA methylation) to capture shared or view-specific sources of variation.
      • Data Harmonizer – Prepares and aligns omics data files for MOFA2 input, supporting tabular, .h5ad (a format for single-cell data), and .h5mu (a format for multi-modal omics data) file formats.
      • MOFAx-0-3-7 – Generates visualization plots from trained MOFA2 models to help researchers interpret latent factors and explore relationships among samples, features, and data types across datasets.
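
To make the small-variant definition in the DeepVariant entry above concrete, here is a minimal Python sketch that labels a variant from its reference and alternate alleles, written VCF-style. It is a simplified illustration only and does not reflect how DeepVariant or GATK call variants; the function name and the category labels are assumptions chosen for this example.

```python
def classify_small_variant(ref: str, alt: str) -> str:
    """Label a variant by comparing reference and alternate allele strings."""
    if not ref or not alt:
        raise ValueError("ref and alt must be non-empty allele strings")
    if ref == alt:
        return "reference"  # no difference from the reference genome
    if len(ref) == 1 and len(alt) == 1:
        return "SNV"  # single-letter difference
    if len(alt) > len(ref):
        return "insertion"  # alternate allele adds bases
    if len(alt) < len(ref):
        return "deletion"  # alternate allele removes bases
    return "multi-base substitution"  # same length, more than one base


if __name__ == "__main__":
    for ref, alt in [("A", "G"), ("A", "ATC"), ("ATC", "A")]:
        print(ref, alt, classify_small_variant(ref, alt))
```

Production variant callers also weigh read evidence, base quality, and genotype likelihoods; this sketch covers only the allele-length comparison behind the terminology.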

Learn more about SB-CGC's analytical tools and resources on the SB-CGC website.

RESOURCES

  • Past webinar recordings are available on the SB-CGC website.
  • Office Hours are held weekly at the times below. All are welcome to join:
    • Tuesdays at 10:00 am EST and Thursdays at 2:00 pm EST 
    • Join here
