We summarize our seven Spring training sessions for Minerva users below.

  • There are two information/training sessions for Minerva. These sessions are intended to familiarize you with the Minerva environment. Basic understanding of the general Unix operating environment and Linux commands is expected.
  • There is a training session for Data Ark to familiarize you with the Data Ark – Mount Sinai Data Commons data sets and environment.
  • We will also hold four GPU/AI training sessions for Minerva users, jointly presented by Minerva HPC staff and NVIDIA domain experts.
  • Sessions 2 and 6 will be offered in person at the Icahn Building as well as remotely through Zoom. Sessions 3, 4, 5 and 7 will be held remotely through Zoom. Zoom links are provided following registration.

 

Session 1: Introduction to Minerva – held on Wednesday, March 27, 1 pm-2 pm

Training slides and video are available.

This session covered:

  • Minerva resources
  • Account and logging in
  • User software environment
  • Services: file transfers, web server, TSM archive and Posit Connect server

 

Session 2: Load Sharing Facility (LSF) Job Scheduler – Wednesday, April 3, 1 pm-2 pm

Training slides and video are available.

This session covered:

  • LSF introduction and basic/helpful LSF commands
  • Dependent jobs
  • Self-scheduler
  • Parallel jobs (job arrays, parallel processing and GPUs)
  • Things to avoid

 

Session 3: Introduction to GPU/AI resources on Minerva – Wednesday, April 10, 1 pm-2 pm

Training slides and video are available.

This session covered:

  • What is a GPU
  • GPU resources on Minerva
  • User GPU/AI Software environment on Minerva
  • Running GPU/AI jobs in LSF

 

Session 4: 5 Ways to Get Started with GPUs – Friday, April 12, 1 pm-2 pm

Click here to register for 5 Ways to Get Started Session on Zoom

Background
This talk will provide an introduction to GPU acceleration that outlines the 5 ways to accelerate computationally intensive code using GPUs. This session is a great starting point for those who would like to begin leveraging the benefits of accelerated computing.

Learning Objectives
By the end of the presentation the audience will understand:

  • Some GPU basics
  • 5 ways to accelerate with GPUs (Applications, Libraries, OpenACC Directives, CUDA Programming, Standard Language Parallelism)

 

Session 5: Accelerated General Data Science in Medicine with RAPIDS, CuPy and Numba – Wednesday, April 17, 1 pm-2 pm

Click here to register for Accelerated general data science Session on Zoom

Background
This talk will discuss how to accelerate general medicine data analytics pipelines and tackle their common bottlenecks (e.g., memory management, data loading, data format conversion across different frameworks) by using GPU-accelerated Python libraries like RAPIDS. We will also demo an end-to-end GPU-accelerated pipeline for an electrocardiogram AI application.

Learning Objectives
By the end of the presentation the audience will understand:

  • Overview of GPU Computing
  • GPU-Accelerated Numerical Computing with CuPy
  • GPU-Accelerated Data Science with RAPIDS
  • Custom GPU Kernels with Numba Frameworks
  • Interoperability – Data Conversion Bottleneck

 

Session 6: Introduction to Data Ark – Mount Sinai Data Commons – Wednesday, April 24, 1 pm-2 pm

Click here to register for Intro to Data Ark – Mount Sinai Data Commons on Zoom

Click here to register to attend in-person

In-person location: Icahn School of Medicine building (1425 Madison Ave) Room L3-36

Remote attendance: Zoom link provided following registration.

This session will cover:

  • Introduction to Data Ark
  • Accessing datasets through Data Ark
  • Introduction to MarketScan data
  • Accessing MarketScan Data via Minerva HPC

 

Session 7: How to Accelerate Genome Analysis Toolkit (GATK) by using Parabricks – Wednesday, May 1, 1 pm-2 pm

Click here to register for How to Accelerate Session on Zoom

Background
In this talk we will discuss the capabilities of Parabricks, compare its performance with traditional genomics software packages (such as GATK), and show a demo of what it looks like in action. Genomic sequencing is faster and cheaper than ever, so the new bottleneck in the genomics pipeline is the analysis. Variant calling can take upwards of 30 hours for a single sample, so processing thousands of samples could take months or even years. This is where Parabricks comes in. Using GPU acceleration, the variant calling time has been cut to below 30 minutes for a 30x human genome. This allows new genomics projects to be done at a scale that was not previously possible.
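
To make the scale of these figures concrete, here is a rough back-of-the-envelope sketch in Python. It treats the two quoted bounds (30 hours and 30 minutes per sample) as point estimates. The cohort size of 1,000 samples, the `total_days` helper and the assumption that samples are processed one at a time are illustrative only; they are not taken from the talk or from any Parabricks benchmark.

```python
import math

# Per-sample variant calling times quoted in the session description,
# treated here as point estimates (the text gives them only as bounds).
HOURS_PER_SAMPLE_CPU = 30.0  # "upwards of 30 hours" on traditional software
HOURS_PER_SAMPLE_GPU = 0.5   # "below 30 minutes" with GPU acceleration


def total_days(n_samples: int, hours_per_sample: float, n_parallel: int = 1) -> float:
    """Estimate wall-clock days to process n_samples, n_parallel at a time.

    Hypothetical helper for illustration; ignores queueing, I/O and failures.
    """
    if n_samples < 0 or n_parallel < 1:
        raise ValueError("n_samples must be >= 0 and n_parallel must be >= 1")
    batches = math.ceil(n_samples / n_parallel)
    return batches * hours_per_sample / 24.0


n_samples = 1000  # hypothetical cohort size, chosen only for illustration
cpu_days = total_days(n_samples, HOURS_PER_SAMPLE_CPU)
gpu_days = total_days(n_samples, HOURS_PER_SAMPLE_GPU)

print(f"Traditional: {cpu_days:.0f} days (~{cpu_days / 365:.1f} years)")
print(f"GPU-accelerated: {gpu_days:.1f} days")
```

Under these assumptions, processing the cohort sequentially drops from roughly three and a half years to about three weeks. Running several samples at once (the `n_parallel` argument) shortens both figures by the same factor.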

Learning Objectives
By the end of the presentation the audience will understand:

  • Capabilities and Performance of Parabricks
  • Parabricks for secondary analysis

 

Please register ahead of time for sessions. Please send any questions to hpchelp@hpc.mssm.edu

 

Thank you, and we look forward to seeing you.

 

Scientific Computing and Data

Icahn School of Medicine at Mount Sinai