SB-CGC Tutorial Featured in New Bioinformatics Textbook
The Cancer Research Data Commons (CRDC) is committed to supporting the research community by making data from many NCI-funded programs accessible through CRDC’s various data commons, as well as through an NCI Cloud Resource, the Seven Bridges Cancer Genome Cloud (SB-CGC). Researchers can work within the SB-CGC’s secure compute environment and use hundreds of analytical tools to explore CRDC datasets of interest. SB-CGC further contributes to CRDC’s mission by offering educational tutorials for researchers at all skill levels.
Recently, an SB-CGC team led by Rowan F. Beck, PhD, developed an RNA sequencing tutorial that is featured in Cancer Bioinformatics, a new textbook published by Springer Nature. The tutorial guides users through the essentials of RNA sequencing analysis with tools available on the SB-CGC platform. Beck emphasizes SB-CGC’s commitment to helping students and researchers make the most of research data and computational resources.
As Beck notes, “If you are conducting cancer research, you are probably working with large amounts of data and need to understand – at least conceptually – the computational approaches required to perform increasingly complex inquiries. Our role is to provide the infrastructure, expertise, and computational support that allow researchers to focus on discovery rather than logistics.”
The chapter, titled “Building Portable and Reproducible Cancer Informatics Workflows for Scalable Data Analysis: An RNA Sequencing Tutorial,” assumes some familiarity with RNA sequencing and differential gene expression analysis. RNA sequencing (RNA-seq) measures the transcripts present in a sample to estimate gene expression levels. It can also detect splicing events and other transcript features, helping researchers identify which genes are active or inactive. Differential expression analysis is a statistical method for comparing gene expression levels between groups of samples, or cohorts.
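As a minimal illustration of the comparison described above, the sketch below computes a log2 fold change for a single gene between two hypothetical cohorts, using made-up normalized expression values. It is not the method used in the tutorial: dedicated differential expression tools also model variability across samples and report adjusted p-values, whereas this sketch only shows the direction and size of the change. The function name and the pseudocount parameter are illustrative assumptions.

```python
import math
from statistics import mean


def log2_fold_change(group_a, group_b, pseudocount=1.0):
    """Return log2 of (mean(group_b) + pseudocount) / (mean(group_a) + pseudocount).

    Hypothetical helper for illustration only. The pseudocount avoids
    taking the log of zero for genes with no measured expression.
    A positive result means higher expression in group_b.
    """
    return math.log2(mean(group_b) + pseudocount) - math.log2(mean(group_a) + pseudocount)


# Made-up normalized expression values for one gene in two cohorts
normal = [6.0, 7.0, 8.0]     # mean 7.0
tumor = [30.0, 31.0, 32.0]   # mean 31.0

lfc = log2_fold_change(normal, tumor)
print(f"log2 fold change: {lfc:.2f}")  # prints "log2 fold change: 2.00"
```

A log2 fold change of 2 corresponds to roughly four times higher expression in the second cohort; a statistical test is still needed to decide whether such a difference is meaningful given sample-to-sample variation.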
The tutorial walks users through setting up an analysis workspace, copying data into a project folder, and running an existing workflow available on the SB-CGC platform. The project begins with raw sequencing reads stored in FASTQ, a standard file format for high-throughput sequencing data, and guides users through generating summary reports and visualizations. Prompts are provided throughout, and all steps and results can be saved for later review.
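To show what the raw input looks like, the sketch below reads FASTQ records, where each read occupies four lines: a header starting with “@”, the base sequence, a separator line starting with “+”, and a quality string with one character per base. It assumes the common Phred+33 quality encoding, and the read shown is invented. This is a generic illustration of the file format, not part of the SB-CGC workflow used in the tutorial, and the function names are hypothetical.

```python
from typing import Iterable, Iterator, NamedTuple


class FastqRecord(NamedTuple):
    name: str
    sequence: str
    quality: str


def parse_fastq(lines: Iterable[str]) -> Iterator[FastqRecord]:
    """Yield records from FASTQ text, four lines per record (hypothetical helper)."""
    it = (line.rstrip("\r\n") for line in lines)
    for header in it:
        if not header:
            continue  # tolerate blank lines, e.g. at the end of a file
        try:
            seq, plus, qual = next(it), next(it), next(it)
        except StopIteration:
            raise ValueError(f"truncated FASTQ record {header!r}") from None
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed FASTQ record {header!r}")
        if len(seq) != len(qual):
            raise ValueError(f"sequence/quality length mismatch in {header!r}")
        yield FastqRecord(header[1:], seq, qual)


def mean_phred(quality: str, offset: int = 33) -> float:
    """Average per-base quality score, assuming Phred+33 ASCII encoding."""
    return sum(ord(c) - offset for c in quality) / len(quality)


# Invented single-read example; "I" encodes Phred 40 under Phred+33
example = ["@read1", "GATTACA", "+", "IIIIIII"]
for rec in parse_fastq(example):
    print(rec.name, rec.sequence, mean_phred(rec.quality))  # read1 GATTACA 40.0
```

In practice, FASTQ files are large and usually gzip-compressed, so a real pipeline would stream them (for example, via `gzip.open(path, "rt")`) rather than load them into memory.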
Alex Krasnitz, PhD, of Cold Spring Harbor Laboratory, the textbook’s editor, notes that bioinformatics expertise is increasingly vital for cancer research because of the rapid growth of available data, declining computing costs, increasing computational complexity, and the need to demonstrate reproducibility.
“My colleagues and I invited Rowan Beck and her team to contribute to this new text, given their ongoing work helping the biomedical research community analyze large-scale sequencing data,” said Krasnitz.
The SB-CGC, an NCI-funded cloud resource, provides a secure, scalable computational environment that integrates with NCI’s interoperability standards and connects directly to the CRDC. Researchers can select CRDC data from the data commons’ portals and transfer it to the SB-CGC for analysis. Users can also access the most popular CRDC-hosted datasets through the SB-CGC platform.
In addition to making CRDC data accessible, the SB-CGC offers hundreds of publicly available analytical workflows and tools that can be used as-is or customized in various workflow languages. SB-CGC also integrates with widely used analytical tools such as Python, RStudio, SAS, and Galaxy. Access to controlled datasets requires registration and approval through NIH’s Database of Genotypes and Phenotypes (dbGaP), which centralizes requests for sensitive NIH data.
“By lowering the barriers to accessing and analyzing large-scale cancer datasets, we help scientists move from data to insight faster and more confidently. Together, we’re building a connected research community that turns individual projects into collective progress,” says Beck.
The SB-CGC website offers several recorded webinars and tutorials, and the team has co-taught courses with Georgetown University, Purdue University, and the University of California, Irvine. One of these collaborations produced an introductory presentation on bioinformatics for RNA sequencing. In addition, SB-CGC offers regular online office hours for researchers at all experience levels.