CRDC Insights

Updates from the Cancer Research Data Commons:
Empowering the Scientific Community to Make New Discoveries

May 23, 2024

CRDC Resources in the Classroom

Author: Rowan Beck, PhD, Community Engagement Manager, Seven Bridges  

Min Zhang, MD, PhD, is a professor in the Department of Statistics at Purdue University and Associate Director of Data Science at the Purdue University Institute for Cancer Research. She is passionate about developing new statistical approaches for the high-dimensional data involved in biological research, and she often develops lesson plans and workshops for students centered on statistical methods and bioinformatic analysis, with a focus on cancer data.

During her workshops, students are provided with computing resources, including access to datasets and cloud-based computing space. The focus is on learning how to analyze different types of “omics” data. In offering these workshops, Dr. Zhang has received feedback that students often lack access to high-performance computing clusters, analytical tools, or even datasets, yet want to continue working with large “omics” data after the workshops end.

Some of her students focus on transcriptional analyses such as bulk or single-cell RNA-seq, while others are interested in methods for studying chromatin accessibility.

To address students’ interest in having access to research resources, Dr. Zhang partnered with colleagues at Purdue University and at Seven Bridges, an NCI-funded cloud resource, to develop more extensive workshops that meet these students’ needs.

This collaboration between the Seven Bridges and Purdue University teams resulted in a multi-part instructional series for students, post-doctoral researchers, clinicians, and anyone interested in learning how to analyze cancer data. This four-part workshop series, led by Dr. Zhang, is part of the NCI-funded R25 (1R25CA233429-01A1) initiative, “Big Data Training for Cancer Research.” This NCI initiative empowers researchers to analyze cancer research data by providing them with training and cloud computing resources.  

During the four-part workshop, members of the Seven Bridges team introduce the NCI-funded Cancer Genomics Cloud, powered by Velsera (SB-CGC), and provide hands-on lessons in bulk and single-cell RNA-seq analysis using datasets shared by Dr. Zhang’s Purdue colleagues. Each session includes a lecture on an RNA-seq topic and training on implementing that analysis on the CGC. Using data in the Cancer Research Data Commons (CRDC), the series provides real-world examples for students to follow.

All of the slides and videos from the series are available online, so other institutions or universities interested in learning more about the CGC platform can use them as a resource. As Dr. Zhang notes, these can serve as great training resources for the next generation of cancer researchers.

In addition to Dr. Min Zhang, the Purdue University team included:

  • Timothy Ratliff, PhD, Distinguished Professor, Comparative Pathology
  • Ourania Andrisani, PhD, Distinguished Professor, Basic Medical Sciences
  • Nadia Lanman, PhD, Research Assistant Professor, Comparative Pathology
  • Dabao Zhang, PhD, Professor, Statistics
  • Doug Crabill, Senior Academic IT Specialist, Statistics 

According to the educational resources section of the Seven Bridges website, several universities in addition to Purdue are using the CGC in their teaching and training programs. The website notes that these programs are useful because they offer:

  • Simple visual representation of the steps in the workflow
  • Easy ways to grasp basic concepts, such as command-line use and running pipelines
  • Easy access for instructors to monitor in-class work, homework, and use of resources with one billing group (per class or university)
  • Collaborative environments with Seven Bridges, which provides cloud support to the class
  • Access to Seven Bridges online Office Hours, which students are encouraged to attend

Find the videos and presentation materials from this workshop on the Cancer Genomics Cloud website, where you can also find information about all the CRDC datasets made available through this NCI-funded cloud resource.

The main CGC website:
The four-part tutorial series:
Datasets accessible through the CGC: