How do I submit multiple serial jobs across more than one node?
Users sometimes want to submit a large number of independent serial jobs as a single batch. Rather than using a script that calls bsub repeatedly, you can use a self-scheduling utility (“selfsched”) to bundle multiple serial jobs and schedule them across more than one node with a single bsub command.
Usage:
In your batch script, load the self-scheduler module and run it in parallel mode through the MPI wrapper (“mpirun”):
    module load selfsched
    mpirun selfsched < YourInputForSelfScheduler
where YourInputForSelfScheduler is a file containing one serial job command per line, for example:
    /my/bin/path/Exec_1 < my_input_parameters_1 > output_1.log
    /my/bin/path/Exec_2 < my_input_parameters_2 > output_2.log
    /my/bin/path/Exec_3 < my_input_parameters_3 > output_3.log
    . . .
Each line is limited to 2048 characters, and TAB characters are not allowed.
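The two limits above can be checked before submitting. The following is a minimal sketch, not part of the selfsched module: the function name check_selfsched_input is hypothetical, and it assumes the 2048 limit counts characters as awk's length() does (in a multibyte locale this may differ from a byte count).

```shell
# Hypothetical pre-flight check; NOT provided by the selfsched module.
# Reports every line that is longer than 2048 characters or contains a TAB,
# and returns a non-zero status if any such line is found.
check_selfsched_input() {
    awk '
        length($0) > 2048 { printf "line %d: longer than 2048 characters\n", NR; bad = 1 }
        index($0, "\t")   { printf "line %d: contains a TAB\n", NR; bad = 1 }
        END { exit bad + 0 }
    ' "$1"
}
```

For example, running check_selfsched_input YourInputForSelfScheduler prints nothing and returns 0 when the file respects both limits.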
Please note that one of the compute cores is used to monitor and schedule the serial jobs over the remaining cores. The number of cores available for actual computation is therefore the total number of cores assigned minus 1.
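When choosing how many cores to request, the arithmetic can be sketched as follows. The helper plan_selfsched is hypothetical and not part of selfsched. The "rounds" figure assumes every job takes about the same time, which is only a rough planning estimate, and the minimum of 2 cores is inferred from the one-scheduler rule rather than documented.

```shell
# Hypothetical planning helper; NOT provided by the selfsched module.
# Usage: plan_selfsched TOTAL_CORES N_JOBS
plan_selfsched() {
    total_cores=$1   # cores requested from the batch system
    n_jobs=$2        # number of lines in YourInputForSelfScheduler
    if [ "$total_cores" -lt 2 ]; then
        # Assumption: with one core, nothing is left for computation.
        echo "need at least 2 cores: one schedules, the rest compute" >&2
        return 1
    fi
    workers=$((total_cores - 1))
    # Ceiling division: rounds needed if all jobs had equal run time.
    rounds=$(( (n_jobs + workers - 1) / workers ))
    echo "workers=$workers rounds=$rounds"
}

plan_selfsched 16 10000
```

With 16 cores and 10000 jobs this reports 15 workers and 667 rounds.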
A simple utility (“PrepINP”) is also provided to help generate the YourInputForSelfScheduler file. The self-scheduler module must be loaded before PrepINP can be used.
Usage:
    module load selfsched
    PrepINP < templ.txt > YourInputForSelfScheduler
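templ.txt is a template file. It gives the numbering parameters (start, end, stride, and whether to use a fixed field length) followed by a job command in which each number field is replaced by “#”. PrepINP substitutes the generated numbers for “#” to produce the YourInputForSelfScheduler file.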
Example 1:
    1 10000 2 F   ← start, end, stride, fixed field length?
    /my/bin/path/Exec_# < my_input_parameters_# > output_#.log
The output will be
    /my/bin/path/Exec_1 < my_input_parameters_1 > output_1.log
    /my/bin/path/Exec_3 < my_input_parameters_3 > output_3.log
    /my/bin/path/Exec_5 < my_input_parameters_5 > output_5.log
    . . .
    /my/bin/path/Exec_9999 < my_input_parameters_9999 > output_9999.log
Example 2:
    1 10000 1 T   ← start, end, stride, fixed field length?
    5             ← field length
    /my/bin/path/Exec_# < my_input_parameters_# > output_#.log
The output will be
    /my/bin/path/Exec_00001 < my_input_parameters_00001 > output_00001.log
    /my/bin/path/Exec_00002 < my_input_parameters_00002 > output_00002.log
    /my/bin/path/Exec_00003 < my_input_parameters_00003 > output_00003.log
    . . .
    /my/bin/path/Exec_10000 < my_input_parameters_10000 > output_10000.log
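To make the expansion rule in the two examples concrete, here is a rough sketch that produces the same kind of output with standard tools. It is not how PrepINP is implemented. The function expand_template and its argument order are hypothetical, and a WIDTH of 0 is used here to stand for the “F” (no fixed field length) case.

```shell
# Sketch only: mimics the PrepINP output shown in the examples above.
# Usage: expand_template START END STRIDE WIDTH 'command with # placeholders'
#   WIDTH = 0  -> plain numbers (the "F" case in Example 1)
#   WIDTH = 5  -> zero-padded to 5 digits (the "T" case in Example 2)
# Note: awk -v interprets backslash escapes, so the template should not
# contain backslashes.
expand_template() {
    awk -v start="$1" -v end="$2" -v stride="$3" -v width="$4" -v tmpl="$5" '
        BEGIN {
            # Build the number format once: "%05d" for width 5, "%d" for 0.
            fmt = (width > 0) ? "%0" width "d" : "%d"
            for (i = start + 0; i <= end + 0; i += stride) {
                line = tmpl
                gsub(/#/, sprintf(fmt, i), line)   # replace every "#"
                print line
            }
        }'
}

# Small demonstration: first three lines of Example 1.
expand_template 1 5 2 0 '/my/bin/path/Exec_# < my_input_parameters_# > output_#.log'
```

Calling expand_template 1 10000 1 5 with the same template yields zero-padded names as in Example 2.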