We provide integrated development environments (IDEs) for the R and Python languages. These environments often make it easier to develop methods interactively on MARCC. This guide explains how to connect to either a Jupyter notebook server or an RStudio server. Both programs allow users to access MARCC compute and storage hardware from a web browser as long as they can start a dedicated SSH connection to the machine.

Users must start their Jupyter or RStudio sessions inside of a SLURM job so that these programs are executed on dedicated compute nodes. We forbid the use of these programs on the login node, which is a shared resource. There are many ways to start a server inside a SLURM job, but the easiest method is to request the job using the sbatch command. We will describe the Jupyter and RStudio methods separately below.

Jupyter Notebooks

Users can run a Jupyter notebook server with the jupyter_notebook_start command as long as this command runs inside a SLURM job.

To start a Jupyter notebook server, use one of the following commands.

sbatch jupyter_notebook_start
sbatch -c 4 -t 6:0:0 -p shared jupyter_notebook_start
sbatch -c 2 -t 2:0:0 -p debug jupyter_notebook_start
sbatch -c 12 -t 2:0:0 -p gpuk80 --gres=gpu:2 jupyter_notebook_start

The sbatch options shown above are explained in the SLURM guide and allow you to request a specific amount of time and number of processors. Note that the debug partition has the shortest wait time and the lowest time limit. The default sbatch command reserves 1 processor for 1 hour on the shared partition. If you require GPUs, you must use the --gres flag shown in the last example above.

To receive an email notification when the job starts and your server is ready, use the following format.

sbatch --mail-type=begin --mail-user=my_address@my_school.edu -c 2 -t 2:0:0 -p debug jupyter_notebook_start

After you run sbatch, you can check the status of the resulting SLURM job by running the sqme command. Your job should appear with the name “jupyter_notebook_start” and the state will be listed as either PD (pending) or R (running); the R state indicates that it has started. It is best to run only one notebook server at a time. When you are finished with the server, you can end the job with the scancel 12345 command, where the number corresponds to your SLURM job number.
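
For example, to check the queue and then cancel job 12345:

sqme
scancel 12345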

When the job starts, you will see an output file written by SLURM, e.g. slurm-12345.out, where the number corresponds to your SLURM job ID. This is the SLURM log file for the job, and it contains instructions for connecting to the server.
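
For example, assuming your job ID is 12345, you can print these instructions with the following command.

cat slurm-12345.out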

The log file will tell you to run a command with the following format:

ssh -N -L 8147:compute0676:8147 username@school.edu@login.marcc.jhu.edu

Run this command in a dedicated terminal. The specific compute node and port numbers will be unique to your session. It will ask for your two-factor authentication and password. You will not see any output after it accepts your credentials. Do not close the terminal, or you will lose your connection to the server.

The SLURM log will also provide a link with the following format:

http://localhost:8147/?token=abea8ff9a053b1c501a0e42885f407feacb8de00f493cc51

Open this link in a web browser to access your server. You can select a specific Python version from the dropdown menu when creating new notebooks. The token serves as your authentication mechanism; however, the SSH connection above is also required and uses two-factor authentication to protect access to your data.

Remember to cancel your job with the scancel command when you are finished with the server.

Python Modules

The Jupyter notebook server will inherit the software modules you have loaded before running sbatch. There may be more than one Python version available inside the server, depending on which software you load before starting the notebook server. If you wish to install custom packages, you can use Anaconda. To set up an Anaconda environment, use the following commands. Note that you must install “ipykernel” in order to use your environment in the notebook server.

module load python/3.6-anaconda
# create a new environment at a path of your choosing
conda create -p ./path/to/my_env_name
conda activate ./path/to/my_env_name
# ipykernel is required for the notebook server to use this environment
conda install ipykernel
# install any additional conda or pip packages you need
conda install ...
pip install ...

Note that the Jupyter notebook server can provide notebooks for any Anaconda environments installed with the method above. These environments can be customized to suit your particular workflow.
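
To confirm which Anaconda environments you have created, you can list them before submitting the job, for example:

conda env list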

RStudio

The open-source RStudio server provides a fully-featured IDE for R users. As with the Jupyter notebook server described above, users can access RStudio using the rstudio_server_start command as long as it runs inside a SLURM job.

Before you run RStudio, you must load a software module for R. We offer several versions. You can load the default version with the following command.

module load R
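
To see which other versions are available, you can list the R modules, for example:

module avail R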

To start the RStudio server, use one of the following commands.

sbatch rstudio_server_start
sbatch -c 4 -t 6:0:0 -p shared rstudio_server_start
sbatch -c 2 -t 2:0:0 -p debug rstudio_server_start

The sbatch options shown above are explained in the SLURM guide and allow you to request a specific amount of time and number of processors. Note that the debug partition has the shortest wait time and the lowest time limit. The default sbatch command reserves 1 processor for 1 hour on the shared partition.

To receive an email notification when the job starts and your server is ready, use the following format.

sbatch --mail-type=begin --mail-user=my_address@my_school.edu -c 2 -t 2:0:0 -p debug rstudio_server_start

After you run sbatch, you can check the status of the resulting SLURM job by running the sqme command. Your job should appear with the name “rstudio_server_start” and the state will be listed as either PD (pending) or R (running); the R state indicates that it has started. It is best to run only one RStudio server at a time. When you are finished with the server, you can end the job with the scancel 12345 command, where the number corresponds to your SLURM job number.

When the job starts, you will see an output file written by SLURM, e.g. slurm-12345.out, where the number corresponds to your SLURM job ID. This is the SLURM log file for the job, and it contains instructions for connecting to the server. You will also see a folder called rstudio-session-12345 which contains the session information and cache for the server. If you run the RStudio server frequently, you may wish to remove these folders after you have completed your work.
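
For example, once the job with ID 12345 has finished, you can remove its session folder with the following command.

rm -rf rstudio-session-12345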

The SLURM log file will tell you to run a command with the following format:

ssh -N -L 8147:compute0676:8147 username@school.edu@login.marcc.jhu.edu

Run this command in a dedicated terminal. The specific compute node and port numbers will be unique to your session. It will ask for your two-factor authentication and password. You will not see any output after it accepts your credentials. Do not close the terminal, or you will lose your connection to the server.

The SLURM log will also provide a link with the following format:

http://localhost:8147

Open this link in a web browser for access to your server. You will be required to enter your MARCC username and password. Remember to cancel your job with the scancel command when you are finished with the server.

R Packages

You can install custom packages inside the RStudio IDE or in a regular terminal session. We encourage users to install large packages from an interactive session started with the interact command, because it provides dedicated access to the debug partition and avoids the strict resource limits on our login nodes. Remember that your packages are associated with a specific version of R, which can be controlled with the module command before starting the server.
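
As a sketch of this workflow, assuming the interact command accepts the same -c, -t, and -p options as the sbatch examples above, you could request a debug session, load R, and install a package (ggplot2 is only an illustrative choice):

interact -p debug -c 2 -t 2:0:0
module load R
R
install.packages("ggplot2")

The last command runs inside the R prompt. Packages installed this way will be available in RStudio sessions that load the same R module.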