Minerva Quick Start Guide

Partitions

The Chimera partition:

286 compute nodes – 48 Intel 8168 cores (2.7GHz) and 192 GB memory
4x high memory nodes – 48 Intel 8168 cores (2.7GHz) and 1.5 TB memory
48 V100 GPUs in 12 nodes – 32 Intel 6142 cores (2.6GHz) and 384 GB memory – 4x V100-16 GB GPU

The BODE2 partition:

78 compute nodes – 48 Intel 8268 cores (2.9GHz) and 192 GB memory
Only BODE-enabled users have access to the BODE2 partition

The CATS partition:

55 compute nodes – 64 Intel IceLake cores (2.6GHz) per node, 3,520 cores in total
1.5 TB of memory per node
82.5 TB memory (collectively)
Eligible for all NIH-funded projects
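
As a quick sanity check of the CATS figures, the totals can be recomputed from the per-node numbers. This is only an illustrative sketch; it assumes every one of the 55 nodes has 64 cores and 1.5 TB of memory, and the variable names are made up for the example.

```shell
# Assumption: each of the 55 CATS nodes has 64 cores and 1.5 TB of memory.
nodes=55
cores_per_node=64
mem_per_node_tb=1.5

total_cores=$((nodes * cores_per_node))
# Shell arithmetic is integer-only, so awk handles the fractional product.
total_mem_tb=$(awk -v n="$nodes" -v m="$mem_per_node_tb" 'BEGIN { print n * m }')

echo "Total cores: $total_cores"
echo "Total memory: $total_mem_tb TB"
```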

Connecting to Minerva

For security, Minerva uses the Secure Shell (ssh) protocol and two-factor authentication. Unix systems typically have an ssh client already installed. Windows users can download one of several free ssh clients, such as PuTTY.

Two-factor authentication requires you to enter a password that combines your Sinai password with a generated token (from either a software or a hardware token).

Software Token:
On Android and iPhone, the application is called “VIP Access” and is published by Symantec. Blackberry, Windows Mobile, etc. are also supported.

Hardware Token:
Hardware tokens are issued by the IT Helpdesk, but none are currently available.

To set up two-factor authentication, visit the ASCIT website.

From on-site and off-site

All users can log in to the Minerva cluster via ssh to minerva.hpc.mssm.edu. As part of our HIPAA compliance activities, external gateway access to Minerva must be shut down. The High Performance Computing team has made adjustments so that all users can connect to the internal login nodes; as a result, all users need a VPN account for off-campus login. See the VPN instructions for details.

For example:

> ssh your_userid@minerva.hpc.mssm.edu
Password: > your_Sinai_password123456

(The > sign indicates what you type; 123456 represents the numeric sequence obtained from your token.)
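
To avoid retyping the host name and user ID, an ssh client configuration entry can help. The sketch below writes a standalone config file; the alias minerva and the file name minerva_ssh_config are hypothetical choices for this example, and your_userid is a placeholder to replace.

```shell
# Write a standalone ssh client config; "minerva" is a hypothetical alias.
# Replace your_userid with your own user ID before use.
cat > minerva_ssh_config <<'EOF'
Host minerva
    HostName minerva.hpc.mssm.edu
    User your_userid
EOF
chmod 600 minerva_ssh_config

# Connect with the alias (run this yourself; it prompts for password + token):
#   ssh -F minerva_ssh_config minerva
```

The same Host block could instead be appended to ~/.ssh/config, in which case the -F option is not needed.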

Click here for more about logging in.

File System

/hpc/users/<userid>	User HOME directories. 20GB quota. It is NOT purged and is backed up. Generally used for all the ‘rc’ and configuration files for various programs.
/sc/arion/work/<userid>	A WORK directory for each user. 100GB quota. It is NOT purged and it is NOT backed up. To be used for whatever purpose the user desires.
/sc/arion/scratch/<userid>	A folder for each user inside the /sc/arion/scratch directory. /sc/arion/scratch has a 100TB quota and it is shared by all users. This should be used in lieu of /tmp for temporary files as well as short term storage up to a maximum of 14 days. Files older than 14 days are purged automatically by the system.
/sc/arion/projects/<projectid>	A directory for each approved project. PIs can request project storage; click here to submit an allocation request, which must be renewed annually. The quota is set to the approved allocation for the project. It is NOT purged and it is NOT backed up.
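
Because scratch files are purged after 14 days, it can be useful to list files that are getting old before they disappear. The following is a minimal sketch: old_files is a hypothetical helper, the 10-day threshold is an arbitrary example, and it assumes the purge is driven by modification time, which may not match the actual policy.

```shell
# List files under directory $1 last modified more than $2 days ago.
# Assumption: the purge is based on modification time; the real policy
# may use a different timestamp.
old_files() {
    dir="$1"
    days="$2"
    find "$dir" -type f -mtime +"$days" -print
}

# Example: files in your scratch folder not modified for more than 10 days.
scratch_dir="/sc/arion/scratch/${USER:-your_userid}"
if [ -d "$scratch_dir" ]; then
    old_files "$scratch_dir" 10
fi
```

Anything listed that is still needed should be copied to work or project storage, keeping in mind that those areas are not backed up.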

Queues

The default memory per core is 3000MB for all queues.
The available queues are:

Queue	Description	Max Walltime
premium	High-priority jobs, with APS doubled to 200. Charged at 150% of the allocation rate	144 hrs.
express	Jobs requiring less than 12 hours walltime	12 hrs.
interactive	Jobs running in interactive mode	12 hrs.
long	Jobs requiring more than 144 hours walltime	2 weeks
gpu	Jobs running on GPU nodes	144 hrs.
private	Jobs using dedicated resources	unlimited
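
The walltime limits in the table can be encoded in a small lookup to check a request before submitting. This is an illustrative sketch only: max_walltime_hours and walltime_fits are hypothetical helpers, the queue names are assumed to be lowercase, and 2 weeks is taken as 336 hours.

```shell
# Maximum walltime in hours per queue, taken from the table above.
# Returns 1 for an unknown queue; "private" is unlimited.
max_walltime_hours() {
    case "$1" in
        premium|gpu)         echo 144 ;;
        express|interactive) echo 12 ;;
        long)                echo 336 ;;   # 2 weeks = 14 * 24 hours
        private)             echo unlimited ;;
        *)                   return 1 ;;
    esac
}

# Succeeds when a request of $2 hours fits within the limit of queue $1.
walltime_fits() {
    limit=$(max_walltime_hours "$1") || return 1
    [ "$limit" = "unlimited" ] || [ "$2" -le "$limit" ]
}

if walltime_fits express 10; then
    echo "a 10-hour job fits in express"
fi
```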

LSF

Minerva uses LSF for batch submission. bsub is the submission command. Options can be given on the command line or in the submission script. However, if the options are placed in the submission script, you must feed the script to bsub via stdin for the options to be read, e.g.,

cat MyLSF.script | bsub
or
bsub < MyLSF.script
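
For illustration, a submission script with embedded options might look like the sketch below. The job name, the account name acc_myproject, the resource values, and the output file names are all hypothetical placeholders; the #BSUB flags shown (-J, -P, -q, -n, -W, -R, -o, -e) are standard bsub options, but whether each is required on Minerva is not stated here.

```shell
# Write a minimal submission script; all values are placeholders.
cat > MyLSF.script <<'EOF'
#!/bin/bash
# Job name (hypothetical)
#BSUB -J myjob
# Allocation account (hypothetical name)
#BSUB -P acc_myproject
# Queue, number of cores, and walltime as HH:MM
#BSUB -q express
#BSUB -n 4
#BSUB -W 2:00
# Memory request in MB
#BSUB -R rusage[mem=3000]
# Save stdout and stderr; %J expands to the job ID
#BSUB -o myjob.%J.out
#BSUB -e myjob.%J.err

echo "Running on $(hostname)"
EOF

# Submit via stdin so that the #BSUB lines are read.
if command -v bsub >/dev/null 2>&1; then
    bsub < MyLSF.script
else
    echo "bsub not found; run this on a Minerva login node"
fi
```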

Some important points of interest:

By default, LSF emails job output and logs to you. This is not working yet, so you must use the “-o” option to save the output.
In general, the shortest quantum of time in LSF is 1 minute. Wall time is expressed as HHH:MM, with no seconds. Durations are generally in minutes.
System-level checkpoints are supported by LSF. There are some “gotchas” (e.g., the default method does not work on our system), so check with the SC staff if you want to do checkpointing.
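
Since wall time has no seconds field, a duration in minutes converts to the HHH:MM form with integer division. The helper below, to_walltime, is a hypothetical name used only for this sketch.

```shell
# Convert a duration in whole minutes to the H:MM form used for wall time.
to_walltime() {
    minutes="$1"
    printf '%d:%02d\n' $((minutes / 60)) $((minutes % 60))
}

to_walltime 90     # prints 1:30
to_walltime 8640   # prints 144:00

# Possible use on Minerva (not run here; file names are placeholders):
#   bsub -W "$(to_walltime 90)" -o myjob.%J.out < MyLSF.script
```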

Some useful commands:

bjobs – shows all your jobs in the queue
bpeek – peek at your output before the job ends
bqueues – what queues are available
bkill – kill a job
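
These commands usually take a job ID, which can be captured from the line bsub prints on submission. The sketch below assumes that line has the usual LSF form "Job <id> is submitted to queue <name>."; job_id_from_bsub is a hypothetical helper and the sample line is made up.

```shell
# Extract the numeric job ID from bsub's confirmation line read on stdin.
job_id_from_bsub() {
    sed -n 's/^Job <\([0-9][0-9]*\)>.*/\1/p'
}

# Demonstration with a made-up sample line.
sample='Job <12345> is submitted to queue <express>.'
jobid=$(printf '%s\n' "$sample" | job_id_from_bsub)
echo "$jobid"

# On Minerva (not run here):
#   jobid=$(bsub < MyLSF.script | job_id_from_bsub)
#   bjobs "$jobid"    # status of that job
#   bpeek "$jobid"    # output so far
#   bkill "$jobid"    # cancel it
```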

Click here for additional Minerva documentation.

Acknowledging Mount Sinai in Your Work

Use of the S10 BODE and CATS partitions requires acknowledgement of NIH support in your publications. To assist, we provide the exact wording of the acknowledgements required by NIH. Click here for acknowledgements.

Supported by grant UL1TR004419 from the National Center for Advancing Translational Sciences, National Institutes of Health.