When you log into Minerva, you are placed on one of the login nodes.
Login nodes should only be used for basic tasks such as file editing, code compilation, data backup, and job submission. The login nodes should not be used to run production jobs. Production work should be performed on the system’s compute resources.
Access to compute resources and job scheduling are managed by LSF (Load Sharing Facility). Once access to compute resources has been allocated through the batch system, users have the ability to execute jobs on the allocated resources.
Submitting Batch Jobs
A batch job is the most common way users run production applications on Minerva. Typically, the user submits a batch script to the batch system. This script specifies, at the very least, how many cores the job will use, how long the job will run, and the name of the application to run. The job will advance in the queue until it has reached the top. At this point, LSF will allocate the requested number of cores to the batch job.
Sample Batch Scripts
Although every batch parameter has a default value, it is good practice to always specify the queue name, the number of cores, and the walltime for each batch job. To minimize the time spent waiting in the queue, specify the smallest walltime that will safely allow the job to complete.
This example requests 4 full nodes on the Mothra side (Intel cores) for 2 hours in the “express” queue: 48 cores in total, placed 12 per node.
#!/bin/bash
#BSUB -J myjobMPI
#BSUB -P YourAllocationAccount
#BSUB -q express
#BSUB -n 48
#BSUB -R span[ptile=12]
#BSUB -W 02:00
#BSUB -o %J.stdout
#BSUB -eo %J.stderr
#BSUB -L /bin/bash
cd $LS_SUBCWD
mpirun -np 48 /my/bin/executable < my_data.in
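The node count in this example follows from two values in the script: with span[ptile=12], LSF places up to 12 cores on each host, so -n 48 spans 48 / 12 = 4 nodes. A minimal sketch of that arithmetic in bash follows; the function name nodes_for is hypothetical and is not an LSF command.

```shell
#!/bin/bash
# Hypothetical helper (not part of LSF): number of hosts a job spans,
# given the total core count (-n) and the cores per host (span[ptile=...]).
nodes_for() {
    local cores=$1 ptile=$2
    # Round up so that a partially filled last host is still counted.
    echo $(( (cores + ptile - 1) / ptile ))
}

nodes_for 48 12   # prints 4
```

Running the same check before submitting can catch a mismatch between -n and ptile that would otherwise leave a node only partly used.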
This second example requests 4 compute cores on a single node, with 12 GB of memory per core.
#!/bin/bash
#BSUB -J myjob
#BSUB -P YourAllocationAccount
#BSUB -q express
#BSUB -n 4
#BSUB -R span[ptile=4]
#BSUB -R rusage[mem=12000]
#BSUB -W 01:00
#BSUB -o %J.stdout
#BSUB -eo %J.stderr
#BSUB -L /bin/bash
cd $LS_SUBCWD
mpirun /my/bin/executable < my_data.in
Note that the memory requirement (-R rusage[mem=12000]) is given in MB and applies PER CORE, not per job. With 4 cores, a total of 48 GB of memory will be allocated for this job.
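Because the value is per core, a job whose total memory need is known has to be divided by the core count before it goes into rusage[mem=...]. The sketch below shows that conversion; mem_per_core_mb is a hypothetical helper name, and it assumes the per-core interpretation described above.

```shell
#!/bin/bash
# Hypothetical helper (not part of LSF): convert a job's total memory need
# in MB into the per-core value expected by -R rusage[mem=...].
mem_per_core_mb() {
    local total_mb=$1 cores=$2
    # Round up so the per-core requests never add up to less than the total.
    echo $(( (total_mb + cores - 1) / cores ))
}

mem_per_core_mb 48000 4   # prints 12000, matching the example above
```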
Please refer to the QUEUES section for further information about available queues and computing resource allocation.
Please refer to the GPU section for further information about options relevant to GPU job submissions.
Submit your batch script with the bsub command:
bsub < myjob.lsf
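The whole submit step can be sketched end to end as follows. This is an illustration under assumptions: the job body (a single echo) and the one-core request are placeholders, YourAllocationAccount must be replaced with a real account, and the guard around bsub is only there so the snippet does nothing harmful on a machine without LSF.

```shell
#!/bin/bash
# Write a minimal job script using the same #BSUB directives as the examples.
# The quoted 'EOF' keeps $(hostname) unexpanded until the job actually runs.
cat > myjob.lsf <<'EOF'
#!/bin/bash
#BSUB -J myjob
#BSUB -P YourAllocationAccount
#BSUB -q express
#BSUB -n 1
#BSUB -W 01:00
#BSUB -o %J.stdout
#BSUB -eo %J.stderr
echo "running on $(hostname)"
EOF

# Submit only where LSF is installed; otherwise report that it was skipped.
if command -v bsub >/dev/null 2>&1; then
    bsub < myjob.lsf
    bjobs   # list your pending and running jobs
else
    echo "bsub not found; myjob.lsf written but not submitted"
fi
```

Note that the script is fed to bsub on standard input (bsub < myjob.lsf); that is how bsub reads the #BSUB lines embedded in the file.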
Running Serial Jobs
A serial job is one that only requires a single computational core. There is no queue specifically configured to run serial jobs. Serial jobs share nodes, rather than having exclusive access. Multiple jobs will be scheduled on an available node until either all cores are in use, or until there is not enough memory available for additional processes on that node.
The following script requests a single core and 2GB of memory for 4 hours in the “express” queue:
#BSUB -J myjob
#BSUB -P YourAllocationAccount
#BSUB -q express
#BSUB -n 1
#BSUB -R rusage[mem=2000]
#BSUB -W 04:00
#BSUB -o %J.stdout
#BSUB -eo %J.stderr
#BSUB -L /bin/bash
../mybin/my_exe < mydata.inp > results.log
Interactive Batch Jobs
Interactive batch jobs give users interactive access to compute resources. A common use for interactive batch jobs is debugging. An interactive batch job is started with the -I option of bsub; the -Is form used below additionally creates a pseudo-terminal, which is what an interactive shell needs.
Here is an example command that starts an interactive shell on a compute node:
bsub -P AllocationAccount -q interactive -n 8 -W 15 -R span[hosts=1] -Is /bin/bash
This command allocates a total of 8 cores, all on the same node (-R span[hosts=1]). The node running the interactive shell is known as the “head node” (sometimes called the “MOM node”). The -q option specifies that the job will run in the “interactive” queue, and -W 15 gives the job a walltime of 15 minutes. Once the interactive shell has started, the user can run commands interactively there. When done with an interactive batch session, users should explicitly end it with “exit” so that the compute nodes are returned to the batch system for use by other users.
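Once inside the interactive shell, it can be useful to confirm what was actually allocated. The sketch below reads two environment variables that LSF sets inside a job, LSB_JOBID and LSB_HOSTS; the function name describe_allocation is hypothetical, and outside an LSF job the variables are unset, so the function falls back to placeholder output.

```shell
#!/bin/bash
# Hypothetical helper: report the LSF allocation visible from this shell.
describe_allocation() {
    local hosts=${LSB_HOSTS:-}
    echo "job id: ${LSB_JOBID:-none (not inside an LSF job)}"
    # LSB_HOSTS holds one host name per allocated slot, separated by spaces,
    # so word-splitting it into positional parameters counts the slots.
    set -- $hosts
    echo "slots:  $#"
    # Collapse repeated host names to show the distinct nodes in use.
    echo "hosts:  $(printf '%s\n' "$@" | sort -u | tr '\n' ' ')"
}

describe_allocation
```

For the 8-core, single-node request above, the slot count would be expected to be 8 with one distinct host name.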