
Python and Jupyter Notebook

Contents
Python
Jupyter Notebook

Python

Python is an interpreted programming language that has become increasingly popular in high-performance computing environments because it’s available with an assortment of numerical and scientific computing libraries (numpy, scipy, pandas, etc.), relatively easy to learn, open source, and free.

Many versions of Python are available to use on Minerva. To see a list of installed versions of Python on the cluster, use Lmod’s spider command:

$ ml spider python 
 
 
---------------------------------------------------------------------------- 
  python: 
---------------------------------------------------------------------------- 
 	Versions: 
    	python/2.7.9-UCS4 
    	python/2.7.16 
    	python/2.7.17-UCS4 
    	python/2.7.17 
    	python/3.4.0 
    	python/3.5.0 
    	python/3.6.2 
    	python/3.7.3 
    	python/3.8.2 
    	python/3.10.4 
 	Other possible modules matches: 
    	bx_python  lsf_python_api  wxPython 
 
 
---------------------------------------------------------------------------- 
  To find other possible module matches execute: 
 
 
  	$ module -r spider '.*python.*' 
 
 
---------------------------------------------------------------------------- 
  For detailed information about a specific "python" module (including how to load the modules) use the module's full name. 
  For example: 
 
 
 	$ module spider python/3.6.2 
----------------------------------------------------------------------------

Note that new versions of Python are periodically added and the default version of Python changes accordingly. To avoid picking up a new version unexpectedly, specify the version explicitly in your module load command (e.g., ml python/3.8.2). Unless you need an older release, we recommend using the most recent version installed. Our current Python installation comes with many popular scientific and high-performance computing packages preinstalled.
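Because the default module can change, a script can check at startup that it is running under the interpreter it expects. The following is a minimal sketch of such a check; the function name require_python and the minimum version (3, 7) are illustrative assumptions, not part of the Minerva setup.

```python
import sys


def require_python(minimum=(3, 7)):
    """Return the running version as a tuple, or raise if it is older than `minimum`."""
    current = tuple(sys.version_info[:3])
    # Compare only as many components as `minimum` specifies.
    if current[:len(minimum)] < tuple(minimum):
        raise RuntimeError(
            "Python %s or newer is required, but this is %s (%s)"
            % (".".join(map(str, minimum)), ".".join(map(str, current)), sys.executable)
        )
    return current


if __name__ == "__main__":
    version = require_python((3, 7))
    print("Running Python %d.%d.%d from %s" % (version + (sys.executable,)))
```

Calling require_python at the top of a batch script makes a version mismatch fail immediately with a clear message instead of surfacing later as an obscure error.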

Usage

Python may be run either interactively or as a batch process that reads commands from a script file. To run Python interactively, open an interactive session by submitting a job to the interactive LSF queue, load a Python module, and run the python command. You can then enter Python commands at the prompt:

$ bsub -q interactive -P acc_hpcstaff -n 1 -W 1:00 -R rusage[mem=8000] -XF -Is /bin/bash 
Job <66690152> is submitted to queue <interactive>.
<> 
<> 
<> 
$ ml python 
$ python  
Python 3.7.3 (default, Oct 13 2020, 20:41:27)  
[GCC 8.3.0] on linux 
Type "help", "copyright", "credits" or "license" for more information. 
>>>  
>>> print("Hello World!") 
Hello World! 
>>>

A Python script is a file containing a sequence of Python commands. To run one, pass the file name to the python command:

$ python script_name.py

For example,

$ cat hello.py 
print("Hello World!") 
$ python hello.py 
Hello World!
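Scripts that run in batch usually take their inputs from the command line rather than from an interactive prompt. As an illustration only, the hypothetical script below extends hello.py with one optional argument; the file name greet.py and the --name option are assumptions made for this example.

```python
import argparse


def build_greeting(name):
    """Return the greeting text for `name`."""
    return "Hello %s!" % name


def main(argv=None):
    # argv=None makes argparse read the arguments from the command line.
    parser = argparse.ArgumentParser(description="Print a greeting.")
    parser.add_argument("--name", default="World", help="who to greet")
    args = parser.parse_args(argv)
    print(build_greeting(args.name))


if __name__ == "__main__":
    main()
```

Saved as greet.py, running "python greet.py --name Minerva" would print "Hello Minerva!", and running it without arguments prints the same text as hello.py.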

Installing New Packages Locally

If a package you need is missing from the Python version you are using, you may open a ticket at hpchelp@hpc.mssm.edu to have it installed in the system area, or install it yourself in your own space. In some cases, you will need to build and maintain your own set of Python packages locally. There are multiple ways to install Python packages; two of them are described in detail below:

1. Using pip
  pip is the package installer for Python. You can use it to install packages from the Python Package Index
  and other indexes.
  Load a Python module before using pip:
```
$ ml python
```
  The syntax of installing a single Python package is:
```
$ pip install --user package_name==version
```
  For example,
```
$ pip install --user numpy==1.21.6
```
  Packages will be installed in:
```
~/.local/lib/python_version/site-packages/
```
  For example, for Python 3.7.3, the path is:
```
~/.local/lib/python3.7/site-packages/
```
  Then, prepend the package path and bin path to PYTHONPATH and PATH environment variables:
```
$ export PYTHONPATH=~/.local/lib/python_version/site-packages/:$PYTHONPATH 
$ export PATH=~/.local/bin/:$PATH
```
  You should be able to use the new package now.
```
$ python 
>>> import numpy 
>>> numpy.__version__ 
'1.21.6'
```
  You can also install packages to a specific location by adding the --prefix option:
```
$ pip install --prefix=/path/to/folder package_name==version
```
  You will also need to prepend the paths as shown above:
```
$ export PATH=/path/to/folder/bin/:$PATH 
$ export PYTHONPATH=/path/to/folder/lib/python_version/site-packages/:$PYTHONPATH
```
2. Using venv
  venv allows you to manage separate package installations for different projects. It creates a “virtual” isolated Python installation into which you install packages. When you switch projects, you can simply create a new virtual environment without worrying about breaking the packages installed in other environments. Using a virtual environment is recommended whenever you develop Python applications.
  Load a Python module before using venv:
```
$ ml python
```
  Create a virtual environment by running the venv module:
```
$ python -m venv /path/to/new/virtual/env
```
  venv will create a virtual Python installation in the “/path/to/new/virtual/env” folder.
  Before you can start installing or using packages in your virtual environment, you’ll need to activate it:
```
$ unset PYTHONPATH 
$ source /path/to/new/virtual/env/bin/activate
```
  Activating a virtual environment will put the virtual environment-specific python and pip executables into your shell’s PATH.
  You can confirm you’re in the virtual environment by checking the location of your Python interpreter:
```
$ which python
```
  It should be in the env directory:
```
/path/to/new/virtual/env/bin/python
```
  The name of the virtual environment will also show in front of your username in the prompt:
```
(env) [user_name@li03c03 ~]$
```
  As long as your virtual environment is activated, you can use pip for package installation and pip will install packages into that specific environment by default. You’ll also be able to import and use packages in your Python application.
  If you want to switch projects or otherwise leave your virtual environment, simply run:
```
$ deactivate
```
  If you want to re-enter the virtual environment just follow the same instructions above about activating a virtual environment. There’s no need to re-create the virtual environment.
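With user-level installs, custom prefixes, and virtual environments all in play, it can be unclear which copy of a package Python will actually import. The sketch below uses only the standard library to report this; the helper names in_virtualenv and package_location are made up for this example, and numpy is used only because it appears in the pip example above.

```python
import importlib.util
import site
import sys


def in_virtualenv():
    """True when the interpreter was started from a venv-created environment."""
    return sys.prefix != getattr(sys, "base_prefix", sys.prefix)


def package_location(name):
    """Return the file a top-level package would be imported from, or None if not found."""
    spec = importlib.util.find_spec(name)
    if spec is None:
        return None
    return spec.origin


if __name__ == "__main__":
    print("interpreter    :", sys.executable)
    print("virtual env    :", in_virtualenv())
    print("user site dir  :", site.getusersitepackages())
    print("numpy found at :", package_location("numpy"))
```

If the reported location is not where you expected (for example, a path under ~/.local while a virtual environment is active), check the PYTHONPATH setting and whether the environment was activated in the current shell.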

Using Anaconda
You can also create a virtual environment using Anaconda. Anaconda is available on Minerva:

$ ml spider anaconda3 
 
 
---------------------------------------------------------------------------- 
  anaconda3: 
---------------------------------------------------------------------------- 
     Versions: 
        anaconda3/latest 
        anaconda3/4.5.11 
        anaconda3/4.6.4 
        anaconda3/2018.12 
        anaconda3/2019.10 
        anaconda3/2020.11 
        anaconda3/2021.5 
 
 
---------------------------------------------------------------------------- 
  For detailed information about a specific "anaconda3" module (including how to load the modules) use the module's full name. 
  For example: 
 
 
     $ module spider anaconda3/4.6.4 
----------------------------------------------------------------------------

See the Minerva Anaconda documentation for more information about using Anaconda.

Jupyter Notebook

Jupyter Notebook (formerly IPython Notebook) is an interactive computational environment in which you can write and run Python code from a web browser, with support for equation editing, rich text, inline plotting, and rich media.

On the Minerva cluster, you can access a Jupyter notebook running on compute nodes via port forwarding. You can either run the commands step by step to start a Jupyter notebook on a Minerva compute node and access it from your local web browser, or use one of our in-house wrapper tools, “minerva-jupyter-module-web.sh” or “minerva-jupyter-web.sh”, which do this with a single command.

With those tools, Jupyter notebook servers run on the Minerva compute nodes as LSF jobs with dedicated resources. You can request the needed resources for your Jupyter interactive work as you do in other LSF batch jobs. It is recommended that the Jupyter notebook is used only for code development and testing on smaller samples. For computationally intensive or long running tasks, the bulk computation should be performed in Python scripts submitted as non-interactive batch jobs, if possible.

Table 1. Summary comparison of the two wrappers

	minerva-jupyter-module-web.sh	minerva-jupyter-web.sh
Access to Minerva modules	Yes	No
Uses a Singularity image	No	Yes
GPU node support	Yes	Yes
Python version	python/3.7.3 by default; other Python versions and modules needed for your Jupyter Notebook can be loaded with the -mm option	The Python inside the Singularity image (shub://ISU-HPC/jupyter)
Intended for	Users who want access to Minerva modules	Users who want an isolated, clean environment based on a container image. You install and maintain your own Python packages; no module system is set up

Option 1: minerva-jupyter-module-web.sh

One simple command to get interactive web sessions in a Minerva LSF job (Available on login nodes only). You can check the script at /usr/local/bin/minerva-jupyter-module-web.sh.

Usage:
For example, to start a Jupyter notebook web session with python/3.7.3, run minerva-jupyter-module-web.sh on a login node. It submits the job with the default resource configuration and prints the URL for accessing the session. See the --help option for details on resource requests and installing packages.

$ minerva-jupyter-module-web.sh
[INFO] Project is not specified, or is acc_null, using 1st avail project.
[INFO] Project to use is acc_psychgen
[INFO] Parameters used are: 
[INFO] -n	4
[INFO] -M	3000
[INFO] -W	6:00
[INFO] -P	acc_psychgen
[INFO] -J	jupyter
[INFO] -q	premium
[INFO] -R	null
[INFO] -mm	null
[INFO] -env  null
[INFO] Submitting jupyter job...
Job <61550541> is submitted to queue <premium>.
[INFO] Wait and see below for web access when job starts.
<< output from stdout >>
Using local port 8888
Jupyter Notebook is started on compute node lc02g01, port 8888
<< output from stderr >>
Currently Loaded Modules:
  1) gcc/8.3.0   2) unixODBC/2.3.9   3) python/3.7.3
[I 16:22:11.473 NotebookApp] Serving notebooks from local directory: /hpc/users/gail01
[I 16:22:11.473 NotebookApp] Jupyter Notebook 6.1.4 is running at:
[I 16:22:11.473 NotebookApp] http://lc02g01:8888/?token=f9e6d8dea461b8d6837ff11bc1e3ed32f80e1b282c22b305
[I 16:22:11.473 NotebookApp]  or http://127.0.0.1:8888/?token=f9e6d8dea461b8d6837ff11bc1e3ed32f80e1b282c22b305
[I 16:22:11.473 NotebookApp] Use Control-C to stop this server and shut down all kernels (twice to skip confirmation).
[C 16:22:11.477 NotebookApp]    
    To access the notebook, open this file in a browser:
        file:///hpc/users/gail01/.local/share/jupyter/runtime/nbserver-219276-open.html
    Or copy and paste one of these URLs:
        http://lc02g01:8888/?token=f9e6d8dea461b8d6837ff11bc1e3ed32f80e1b282c22b305
     or http://127.0.0.1:8888/?token=f9e6d8dea461b8d6837ff11bc1e3ed32f80e1b282c22b305
[INFO] Token not ready, trying ...


[INFO] Copy the following link in your browser for the jupyter notebook web access.
[INFO] http://10.95.46.103:40541/?token=f9e6d8dea461b8d6837ff11bc1e3ed32f80e1b282c22b305



Once the job starts, the script prints a URL containing an access token (no password is needed), as in the last two [INFO] lines of the output above. Copy the URL and paste it into your browser to access the Jupyter Notebook web session. Note: You can load the Minerva modules needed for your Jupyter Notebook with the -mm option.
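If you capture the wrapper's output in a file, the access URL can also be picked out programmatically. The following is a rough sketch that assumes the URL format shown in the sample output above (an http address ending in ?token= followed by hexadecimal digits); the function name find_notebook_url is hypothetical and the wrapper itself provides no such helper.

```python
import re
from urllib.parse import parse_qs, urlsplit

# Matches URLs such as http://10.95.46.103:40541/?token=f9e6...
URL_PATTERN = re.compile(r"https?://\S+\?token=[0-9a-f]+")


def find_notebook_url(text):
    """Return (url, host, port, token) for the last token URL in `text`, or None."""
    matches = URL_PATTERN.findall(text)
    if not matches:
        return None
    # The sample output prints node-local URLs first and the forwarded URL last.
    url = matches[-1]
    parts = urlsplit(url)
    token = parse_qs(parts.query)["token"][0]
    return url, parts.hostname, parts.port, token
```

Taking the last match reflects the sample output, where the URL intended for your browser is printed after the node-local ones.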

The --help option prints the following message, which covers resource requests and package installation:

$ minerva-jupyter-module-web.sh --help
[INFO] This script is to submit a Python Jupyter Notebook web instance inside an
[INFO] LSF job on *one single host* for users.
[INFO] By default, this script uses Jupyter from python/3.7.3
[INFO] You can load other python version and other modules needed for your Jupter Notebook by -mm option
[INFO] 
[INFO] 09/01/2021
[INFO] Contact: hpchelp@hpc.mssm.edu
[INFO] 
[INFO] minerva-jupyter-module-web.sh  -n  -M  -W 
[INFO]                         -p  -J  -q  
[INFO]                         --mm 
[INFO] 
[INFO] -n | --ncpus		Number of CPU slots, will be allocated in one host using -R 'span[nhost=1]' default 4 if not specified.
[INFO] -M | --mem		Memory in Megabytes per CPU slot, used for resource request, default 3000 MB. -R 'rusage[mem=3000]'
[INFO] -W | --timelimit		Wall time for the job, format HH:MM. Default is 6:00/6 hours
[INFO] -P | --project		Minerva account for this job, default the first acc_ from your mybalance if not specified
[INFO] -J | --jobname		Specify the job name, default jupyter
[INFO] -q | --queue		Specify the queue name, default premium
[INFO] -R | --resource	Optional resource, eg himem, a100, v100
[INFO] -mm |--module		module your want to loaded, seperated by , for multiple modules 
[INFO] -env | --myenv	anaconda env you want to activate if anaconda is loaded 
[INFO] -h | --help		Help message
[INFO] 
[INFO] Files and directories:
[INFO] /hpc/users/gail01/minerva_jobs/jupyter_jobs			The directory where this script generates the job submission scripts. 
[INFO] 
[INFO] Job output and error files will be saved in the current working directory when you run this script. 
[INFO] If job is still running, use bpeek  to check the output

Behind the scenes, the wrapper tool simply runs an LSF command and sets up port forwarding, as shown below. You can also issue these commands yourself, step by step, for more control over the job and the port forwarding. Here are the details:

# Start an interactive session, for example:

$ bsub -P acc_xxx -q interactive -n 2 -R "span[hosts=1]" -R "rusage[mem=4000]" -W 3:00 -Is /bin/bash

# Then, on the allocated node (lc01c30 in this example), start Jupyter Notebook:

lc01c30 $ ml python
lc01c30 $ jupyter notebook --no-browser --port=8889

# On your local workstation, forward local port 8888 to port 8889 on the compute node:

$ ssh -t -t -L localhost:8888:localhost:8889 gail01@minerva.hpc.mssm.edu ssh -X lc01c30 -L localhost:8889:localhost:8889

# Open a browser on your local machine at http://localhost:8888
Note: You can change the port numbers 8888 and 8889 to others.
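The two-hop ssh command is easy to mistype, since the same port numbers and host names appear in several places. As a sketch only, the hypothetical helper below assembles the command from its parts; the function name and its defaults are assumptions taken from the example values above (user gail01, node lc01c30, ports 8888 and 8889).

```python
import shlex


def tunnel_command(user, compute_node, local_port=8888, remote_port=8889,
                   login_host="minerva.hpc.mssm.edu"):
    """Build the two-hop ssh port-forwarding command as an argument list."""
    return [
        "ssh", "-t", "-t",
        # First hop: local workstation port -> same-numbered remote port on the login node.
        "-L", "localhost:%d:localhost:%d" % (local_port, remote_port),
        "%s@%s" % (user, login_host),
        # Second hop: login node -> compute node running Jupyter.
        "ssh", "-X", compute_node,
        "-L", "localhost:%d:localhost:%d" % (remote_port, remote_port),
    ]


if __name__ == "__main__":
    print(" ".join(shlex.quote(arg) for arg in tunnel_command("gail01", "lc01c30")))
```

With the example values, this prints the same ssh command as the one used in the manual steps. Changing remote_port to match the --port value passed to jupyter notebook keeps both hops consistent.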

Option 2: minerva-jupyter-web.sh

minerva-jupyter-web.sh starts an on-the-fly Jupyter Notebook in a Minerva LSF job, using the Python and Jupyter Notebook installed inside a Singularity image. It is a containerized application intended for workflow reproducibility (for users who want an isolated, clean environment based on a container image), and additional packages are installed in $HOME/.local.

By default, it uses the Python within this Singularity image (shub://ISU-HPC/jupyter). You can use your own image with the -i option, but this may require a small modification of the Python path at line 335 and the following lines of /usr/local/bin/minerva-jupyter-web.sh. Contact hpchelp@hpc.mssm.edu if you have issues with this.
There is no module system setup, so you cannot access any of the central modules maintained on Minerva. You will need to install and maintain your own Python packages as follows:
- Open a terminal in the Jupyter web interface and run pip install --user package_name
- The packages are installed in your home directory under $HOME/.local. Restart the Jupyter notebook to pick up the changes

Usage
For example, to start a Jupyter notebook web session with a container image, run minerva-jupyter-web.sh on a login node. It submits the job with the default resource configuration and prints the URL for accessing the session. See the --help option for details on resource requests and installing packages.

 $ minerva-jupyter-web.sh 
[INFO] Image not specified, check if previously used
[INFO] No previously used image in /hpc/users/gail01/minerva_jobs/jupyter_jobs, pulling the default image shub://nickjer/singularity-jupyter
[INFO] Pulling image to /hpc/users/gail01/minerva_jobs/jupyter_jobs/jupyter_latest.sif
INFO:    Downloading shub image
328.0MiB / 328.0MiB [==============================================================================] 100 % 201.5 MiB/s 0s
[INFO] Project is not specified, or is acc_null, using 1st avail project.
[INFO] Project to use is acc_psychgen
[INFO] Parameters used are: 
[INFO] -n	4
[INFO] -M	3000
[INFO] -W	6:00
[INFO] -P	acc_psychgen
[INFO] -J	jupyter
[INFO] -q	premium
[INFO] -R	null
[INFO] -g	0
[INFO] -i	/hpc/users/gail01/minerva_jobs/jupyter_jobs/jupyter_latest.sif
[INFO] Submitting jupyter job...
Job <61550613> is submitted to queue <premium>.
[INFO] Wait and see below for web access when job starts.
<< output from stdout >>
Using local port 8888
Jupyter Notebook is started on compute node lc01a11, port 8888
 
 
<< output from stderr >>
[I 16:44:57.521 NotebookApp] JupyterLab beta preview extension loaded from /usr/local/lib/python3.6/site-packages/jupyterlab
[I 16:44:57.521 NotebookApp] JupyterLab application directory is /usr/local/share/jupyter/lab
[I 16:44:57.526 NotebookApp] Serving notebooks from local directory: /hpc/users/gail01
[I 16:44:57.526 NotebookApp] 0 active kernels
[I 16:44:57.527 NotebookApp] The Jupyter Notebook is running at:
[I 16:44:57.527 NotebookApp] http://lc01a11:8888/?token=d863e7b88b5ab7344846e182d29bf0ba97a5edc7b9b93de1
[I 16:44:57.527 NotebookApp] Use Control-C to stop this server and shut down all kernels (twice to skip confirmation).
[C 16:44:57.527 NotebookApp] 
    
    Copy/paste this URL into your browser when you connect for the first time,
    to login with a token:
        http://lc01a11:8888/?token=d863e7b88b5ab7344846e182d29bf0ba97a5edc7b9b93de1&token=d863e7b88b5ab7344846e182d29bf0ba97a5edc7b9b93de1
[INFO] Token not ready, trying ...
 
 
[INFO] Copy the following link in your browser for the jupyter notebook web access.
[INFO] http://10.95.46.103:40613/?token=d863e7b88b5ab7344846e182d29bf0ba97a5edc7b9b93de1

The --help option prints the following message, which covers resource requests and package installation:

$ minerva-jupyter-web.sh --help
[INFO] This script is to submit a Singularity containerized Jupyter Notebook web instance inside an
[INFO] LSF job on *one single host* for users.
[INFO] By default, this script uses this Singularity image (shub://ISU-HPC/jupyter)
[INFO] see https://singularity-hub.org/collections/1069/usage for the this image info.
[INFO] 
[INFO] Wei Guo
[INFO] 12/14/2020
[INFO] Contact: hpchelp@hpc.mssm.edu
[INFO] 
[INFO] minerva-jupyter-web.sh  -n  -M  -W 
[INFO]                         -p  -J  -q  
[INFO]                         --image 
[INFO] 
[INFO] -n | --ncpus		Number of CPU slots, will be allocated in one host using -R 'span[nhost=1]' default 4 if not specified.
[INFO] -M | --mem		Memory in Megabytes per CPU slot, used for resource request, default 3000 MB. -R 'rusage[mem=3000]'
[INFO] -W | --timelimit		Wall time for the job, format HH:MM. Default is 6:00/6 hours
[INFO] -P | --project		Minerva account for this job, default the first acc_ from your mybalance if not specified
[INFO] -J | --jobname		Specify the job name, default jupyter
[INFO] -q | --queue		Specify the queue name, default premium
[INFO] -R | --resource	Optional resource, eg himem, a100, v100
[INFO] -g | --ngpus		Optional. Specify the number of GPUs to use on a single node, default is 0
[INFO] -i | --image		Full path of the image file your specified other than the default. If not specified, this script will pull the default image in your /hpc/users/gail01/minerva_jobs/jupyter_jobs directory. 
[INFO] -h | --help		Help message
[INFO] 
[INFO] Files and directories:
[INFO] /hpc/users/gail01/minerva_jobs/jupyter_jobs			The directory where this script generates the job submission scripts. 
[INFO] /hpc/users/gail01/minerva_jobs/jupyter_jobs/jupyter_latest.sif			The default file of the image.
[INFO] 
[INFO] Job output and error files will be saved in the current working directory when you run this script. 
[INFO] If job is still running, use bpeek  to check the output

There is also a tool called minerva-jupyter-r-web.sh, which supports both Python 3 and R kernels in the Jupyter notebook. Run it with the --help option for help messages covering resource requests and installing packages.

What happens behind the scenes? This tool wraps the following tasks in one command:

- downloads a custom-built Singularity container image of Jupyter Notebook to your home directory,
- writes and submits an LSF job script to launch the Jupyter Notebook within the image,
- provides the URL to access the instance.
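The second task, writing an LSF job script, can be sketched as follows. The script that the wrapper actually generates is not shown in this document, so this is an assumption-laden illustration: the function name lsf_job_script and the output file names are invented, and the defaults simply mirror the defaults listed in the --help message (4 slots, 3000 MB per slot, 6:00 wall time, premium queue, job name jupyter).

```python
def lsf_job_script(command, project, ncpus=4, mem_mb=3000, walltime="6:00",
                   queue="premium", jobname="jupyter"):
    """Return the text of an LSF job script that runs `command` on a single host."""
    lines = [
        "#!/bin/bash",
        "#BSUB -P %s" % project,
        "#BSUB -q %s" % queue,
        "#BSUB -J %s" % jobname,
        "#BSUB -n %d" % ncpus,
        "#BSUB -W %s" % walltime,
        "#BSUB -R span[hosts=1]",            # keep all slots on one host
        "#BSUB -R rusage[mem=%d]" % mem_mb,  # memory per slot, in MB
        "#BSUB -o %J.stdout",                # %J is replaced by the job ID
        "#BSUB -e %J.stderr",
        "",
        command,
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(lsf_job_script("jupyter notebook --no-browser --port=8889", "acc_xxx"))
```

The generated text can be saved to a file and submitted with "bsub < file", which is the usual way to submit a script containing #BSUB directives.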

Submit Jupyter notebook as a batch job

The jupyter command, which is available from the Python installation (ml python), comes with a versatile subcommand, jupyter-nbconvert. With it you can convert your notebook to Python, HTML, or PDF, and execute your notebook in batch or on the command line. To list all the options:

jupyter-nbconvert --help

To run a notebook from the command line:

jupyter-nbconvert --to notebook --ExecutePreprocessor.timeout=-1 --execute myfile.ipynb

To run this in batch, wrap the command in a shell script and submit it using LSF. If you want the results to be written back into the notebook, use the --inplace option.
You may also just want to convert the notebook to a plain Python script:

jupyter nbconvert myfile.ipynb --to python
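When several notebooks need to be executed from a batch job, the nbconvert call can be driven from a small Python script instead of a shell script. The sketch below only assembles and runs the command line shown above; the helper names nbconvert_command and run_notebook are hypothetical, and it assumes jupyter-nbconvert is on the PATH (for example after ml python).

```python
import subprocess


def nbconvert_command(notebook, inplace=False, timeout=-1):
    """Build the jupyter-nbconvert command that executes `notebook`."""
    cmd = [
        "jupyter-nbconvert",
        "--to", "notebook",
        "--ExecutePreprocessor.timeout=%d" % timeout,  # -1 disables the per-cell timeout
        "--execute",
    ]
    if inplace:
        # Write the executed results back into the original notebook file.
        cmd.append("--inplace")
    cmd.append(notebook)
    return cmd


def run_notebook(notebook, inplace=False):
    """Execute the notebook and raise CalledProcessError if nbconvert fails."""
    subprocess.run(nbconvert_command(notebook, inplace=inplace), check=True)
```

For example, run_notebook("myfile.ipynb", inplace=True) executes the notebook and stores the outputs in the same file. Using check=True makes the batch job fail visibly when a notebook raises an error.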