This page provides a guide for installing packages in R on Blue Crab. Most users accomplish this with install.packages, however the rich ecosystem of R packages requires some additional effort to install on HPC resources.

If you were referred to this site by a member of the MARCC support staff, please read through all of cases below to determine which strategy is best for your software. Correspondence with marcc-help@marcc.jhu.edu should include a complete description of your strategy.

Cases

The following list describes the methods for installing R packages on Blue Crab.

  1. Software Modules provide R packages directly, as long as you are using our new software modules.

Case 1: Software Modules

Many common packages have been added to our new software modules which are accessible via Lmod and the ml av command. You can switch to the new modules by using ml stack, however we recommend that all uses consult the guide when choosing this option.

This option is best for users who need the common packages installed in the ml stack software modules. We have built these packages to sidestep the typical install.packages method which is time-consuming and often depletes the free space in your home directories. Therefore, if you can find your software with this method, it will save you time and space.

By way of example, imagine that you wish to install Rcpp. Typically you would run install.packages('rcpp') however this requires more than a few minutes of time and creates many extra files. Instead, use the following method.

$ ml stack
$ ml av
$ ml r/3.6.1
$ ml r-rcpp
$ R
> library('rcpp')

This provides the Rcpp package with minimal effort. We over dozens of additional R packages, all of which are kindly provided by Spack, on our new software modules.

Case 2: Direct installation

The example above, in which we used the Rcpp package installed on our software modules system, is the best-case scenario because it provides software without a custom installation. If you cannot find your software on ml stack, then you can install it yourself with R directly, using install.packages. Before doing so, please consult the documentation for your target.

Many R packages require additional packages to be already installed on your system. The documentation almost always recommends that you install it yourself using e.g. sudo apt-get install. Since MARCC provides a shared HPC cluster, this is forbidden, hence you must consult the software modules to find the right packages.

For example, imagine that you require a package which depends on the rJava package. If you try to install this directly, you will receive an error message that says: “Unable to run a simple JNI program.” To overcome this message, you should consult our modules system with ml av and load the Java module first. The modules system therefore takes the place of a typical package manager. Your installation procedure would then proceed as follows.

$ ml R/3.6.1
$ ml java
$ R
> install.packages('rJava')

Issue: network security

We encourage all users to start an interactive session with interact when using this method. This command will start an interactive SLURM job on a compute node. Only the compute nodes can access cran to download the packages. This problem produces the following error.

Error: Failed to install 'unknown package' from URL:
  (converted from warning) Problem closing connection:  No space left on device

If you see this error, you must visit a compute node to complete the installation.

Case 3: Extra dependencies required

If you cannot install your software according to the two cases above, you will need to manage your extra dependencies in one of two ways, by either compiling the code from source, or by using Anaconda. Since both of these options require extra work, we strongly recommend checking for your software on our software tree first.

If you are able to compile your supporting packages, then you typically only need to add its path to your environment variables. For the remainder of this section we will explain how to install the pdftools package in R because it provides a useful example of an R package with extensive software requirements which are not available on our software tree. To overcome this problem, we will instal a completely independent copy of R and the supporting packages. We will follow the method for building Anaconda environments.

First, find a location on your ~/data directory to install a new environment. Do not use the Lustre filesystem (~/scratch and ~/work) for this. We will use an absolute path to build our software environment. To build the environment we will first create a “requirements” file called reqs.yaml with the following contents.

dependencies:
  - conda-forge::r-pdftools

This file uses the standard YAML syntax. Once you prepare this file and choose a location, you can build the environment. You may wish to use interact to build this on a compute node.

$ ml anaconda
$ conda env update --file ./path/to/reqs.yaml -p ./path/to/env
$ conda activate ./path/to/env
$ R 
> library('pdftools')

By requesting the r-pdftools package from [Anaconda cloud], we have also installed an independent R distribution which is entirely separate from the software modules. Whenever you wish to use this environment, you will have to run the following commands.

$ ml anaconda
$ conda activate ./path/to/env

Do not modify your ~/.bashrc and do not use source activate to access this environment. Instead you must always use conda activate along with the absolute path, as well as conda deactivate to leave the environment. See the guide to Anaconda environments for more details. In short, the file above takes the place of a conda install command which installs the pdftools packages from the conda-forge channel.

You are welcome to continue adding additional packages based on Python, R, or any package available on Anaconda cloud using the method described above. This method improves reproducibility because it allows you to reconstitute your environment on new hardware from your requirements file.