Hello MARCC Users,
The CUDA/9.0 modules are now available, along with updated drivers, for use on the GPU nodes. Load them with the module command:
module load cuda/9.0
Please also take a look at the systems page to review the Blue Crab hardware.
The next section shows how to use constraints (features) to access the old
bw-parallel partitions, which have now been consolidated into the
parallel partition. Here are some commands to list all available features:
scontrol show nodes | grep Feature | sort | uniq
sinfo -N --format "%n, %f" | less
|Feature|Description|
|---|---|
|ivybridge|Refers to 48-core Ivy Bridge nodes with large memory|
|haswell|Refers to 24-core Intel Haswell nodes|
|broadwell|Refers to 28-core Intel Broadwell nodes|
|k80|Refers to NVidia K80 GPU nodes (4 GPU 'units' per node)|
|p100 (test node)|Refers to NVidia P100 GPU nodes (2 GPU 'units' per node) - contact MARCC Help for testing|
|lrgmem|Alias for ivybridge, to make it easier to refer to large-memory nodes outside of the 'lrgmem' partition|
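For example, the lrgmem feature can be requested as a constraint in a job script. A minimal sketch follows; the partition name, walltime, and placeholder workload are illustrative assumptions, not a prescribed configuration:

```shell
#!/bin/bash
# Sketch: target the 48-core large-memory Ivy Bridge nodes by feature.
# Partition and walltime below are illustrative assumptions.
#SBATCH -p shared
#SBATCH --constraint=lrgmem
#SBATCH --time=1:00:00
hostname   # placeholder workload: report which node the job landed on
```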
SLURM Constraints for Parallel and GPU Partitions
A) Requesting Haswell
In order to work on Haswell nodes, please add the constraint to your batch file:
#SBATCH -p gpu # or -p parallel
#SBATCH --constraint=haswell
B) Requesting Broadwell
In order to work on Broadwell nodes, please add the constraint to your batch file:
#SBATCH -p gpu # or -p parallel
#SBATCH --constraint=broadwell
C) Requesting Both Types
In this case, your job can increase its pool of viable nodes. Drop the constraint, or ask for both types with:
#SBATCH -p gpu # or -p parallel
#SBATCH --constraint="haswell|broadwell" # or just ignore constraints altogether
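Putting case C together, a complete job script might look like the sketch below; the task count, walltime, and placeholder workload are illustrative assumptions:

```shell
#!/bin/bash
# Sketch: accept either CPU type (case C). Task count and walltime
# are illustrative assumptions.
#SBATCH -p parallel
#SBATCH --constraint="haswell|broadwell"
#SBATCH --ntasks-per-node=24   # 24 fits both 24-core Haswell and 28-core Broadwell
#SBATCH --time=1:00:00
hostname   # placeholder workload
```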
Now consider the following: a request for 28 cores per node implicitly targets Broadwell nodes. However, asking for 24 cores per node does not automatically imply Haswell, because that request also fits on Broadwell. Use cases A, B, or C to make your preference explicit.
A final note: you can continue to submit jobs without any constraints. However, applying constraints can be valuable for reproducing jobs exactly, or for understanding why performance differs from node to node. Some users may require specific hardware to compile software (e.g. AVX2 instructions on Broadwell).
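To confirm whether the node a job landed on actually supports AVX2 (as the Broadwell nodes do), you can inspect the CPU flag list. This check reads /proc/cpuinfo and is therefore Linux-specific:

```shell
# Report whether this node's CPU advertises the AVX2 instruction set
if grep -q -m1 avx2 /proc/cpuinfo 2>/dev/null; then
    echo "avx2 supported"
else
    echo "avx2 not supported"
fi
```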
In some cases, it may be informative to inspect the environment of a submitted job:
env | grep SLURM # insert this in your job script and take a look