GPU Driver Updates Completed for CUDA 9 & Updated Partitions

Hello MARCC Users,

CUDA/9.0 modules are now available, along with updated drivers that allow them to be used on the GPU nodes. Load them with the module command: module load cuda/9.0.
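
To confirm the new toolkit and driver, a quick interactive check on a GPU node might look like the sketch below (the one-GPU request is only an example; the exact --gres form depends on the site configuration):

srun -p gpu --gres=gpu:1 --pty bash    # example interactive session on a GPU node
module load cuda/9.0
nvcc --version                         # should report release 9.0
nvidia-smi                             # shows the updated driver and the node's GPUs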

Please take another look at the systems page to review the Blue Crab hardware.

The next section shows how to use constraints (features) to reach the nodes formerly in the old bw-gpu and bw-parallel partitions, which have now been consolidated into gpu and parallel, respectively.  Here are two commands for listing all available features:

scontrol show nodes | grep Feature | sort | uniq    # list the distinct feature sets defined across nodes
sinfo -N --format "%n, %f" | less                   # show each node name alongside its features
Feature      Description
ivybridge    48-core Intel Ivy Bridge nodes with large memory
haswell      24-core Intel Haswell nodes
broadwell    28-core Intel Broadwell nodes
k80          NVidia K80 GPU nodes (4 GPU 'units' per node)
p100         NVidia P100 GPU nodes (2 GPU 'units' per node); test node - contact MARCC Help for testing
lrgmem       Same nodes as ivybridge; provided to make it easier to refer to the large-memory nodes outside the 'lrgmem' partition
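
As an example of putting one of these features to use, a job that specifically wants the K80 GPU nodes could pair the constraint with a GPU request (the --gres syntax and count shown here are an assumption about the site configuration; check the systems page for the exact form used on Blue Crab):

#SBATCH -p gpu
#SBATCH --constraint=k80
#SBATCH --gres=gpu:1    # request one GPU 'unit' on a K80 node; adjust the count as needed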

SLURM Constraints for Parallel and GPU Partitions

A) Requesting Haswell

In order to work on Haswell nodes, please add the constraint to your batch file:

#SBATCH -p gpu # or -p parallel
#SBATCH --constraint=haswell
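
For context, a minimal complete batch script using this constraint might look like the sketch below (the job name, core count, walltime, and program name are placeholders, not site requirements):

#!/bin/bash
#SBATCH --job-name=haswell-example
#SBATCH -p parallel                  # or -p gpu, as above
#SBATCH --constraint=haswell
#SBATCH --ntasks-per-node=24         # Haswell nodes have 24 cores
#SBATCH --time=01:00:00
srun ./my_program                    # placeholder for your executable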

B) Requesting Broadwell

In order to work on Broadwell nodes, please add the constraint to your batch file:

#SBATCH -p gpu # or -p parallel
#SBATCH --constraint=broadwell
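
To see which nodes in a partition carry this feature before submitting, a quick check along these lines can help:

sinfo -p parallel -N --format "%n %f" | grep broadwell    # list parallel-partition nodes tagged broadwell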

C) Requesting Both Types

In this case, your job increases its availability to all viable nodes.  Drop the constraint, or ask for both types with:

#SBATCH -p gpu # or -p parallel
#SBATCH --constraint="haswell|broadwell"
# or just ignore constraints altogether
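
If you use the OR form (or drop constraints entirely), you can check afterwards which node type the scheduler actually gave you; for example (the job ID is a placeholder):

scontrol show job <jobid> | grep -iE "features|nodelist"    # shows the requested features and the assigned nodes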

Now, consider the following: a request for 28 cores per node implicitly targets Broadwell nodes. However, a request for 24 cores per node does not automatically imply Haswell, because that request also fits on Broadwell nodes. Use case A, B, or C above to make your preference explicit.
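
Put concretely, the two requests below behave differently even though neither names a feature:

# Implicitly targets Broadwell, since only those nodes have 28 cores
#SBATCH --ntasks-per-node=28

# Fits on both Haswell and Broadwell; add --constraint=haswell if you specifically want Haswell
#SBATCH --ntasks-per-node=24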

A final note: you can continue to submit jobs without using any constraints. However, applying constraints can be valuable for reproducing important jobs, or for understanding why performance differs from node to node. Some users may also require specific hardware when compiling software (e.g. AVX2 instructions for Broadwell).
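
One way to confirm that the node you landed on exposes an instruction set such as AVX2 is to check the CPU flags from inside the job:

grep -m1 -o avx2 /proc/cpuinfo    # prints 'avx2' if the node's CPUs support it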

In some cases, it may be informative to inspect more details about a submitted job:

env | grep SLURM # insert this in your job script and take a look
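
Among the variables this prints, SLURM_JOB_ID, SLURM_JOB_NODELIST, and SLURM_NTASKS are often the most useful to record; for example:

echo "Job $SLURM_JOB_ID ran on $SLURM_JOB_NODELIST with $SLURM_NTASKS tasks"    # log where and how the job ran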

