CUDA example

This example demonstrates how to compile and execute a CUDA program on one of the cluster's GPU nodes. It assumes that you already have access to the cluster and know how to log in.

0. Clone the guides repo with the examples

To make the demonstration easier, we have prepared the necessary code and scripts in a repo. You just need to run the code, and you can then explore it in more detail.

$ git clone https://github.com/ieeta-pt/HPC-guides.git
$ cd HPC-guides/examples/cuda

1. Prepare the environment

The first step is to set up the development environment, which in this case means loading the GCC compiler and the CUDA libraries.

$ module load gcc
$ module load cuda

Two versions of CUDA are currently installed (12.1 and 11.8). If you do not specify a version, the latest one is loaded.

Note that if you want to run CUDA 11.8, you also need to use GCC 11, because of CUDA's host-compiler compatibility requirements.
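As a minimal sketch of selecting the older toolchain explicitly: the module names `gcc/11` and `cuda/11.8` below are assumptions based on the versions mentioned above, so check `module avail` for the exact names used on the cluster.

```shell
# Assumed module names; the real ones may differ (check with `module avail`).
GCC_MODULE="gcc/11"
CUDA_MODULE="cuda/11.8"

if command -v module >/dev/null 2>&1; then
    # List the CUDA modules that are actually installed.
    module avail cuda
    # Load the matching compiler and CUDA versions together.
    module load "$GCC_MODULE" "$CUDA_MODULE"
    # Confirm which nvcc is now on the PATH.
    nvcc --version
else
    echo "The 'module' command is not available in this shell."
fi
```

If the load succeeds, `nvcc --version` should report release 11.8.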

2. Compile the CUDA program

To compile the CUDA program, simply use the NVCC compiler:

$ nvcc vector_addition.cu -o vector_addition
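The plain command above is enough for this example. As an optional sketch, you can also ask nvcc to optimise the host code and to target a specific GPU architecture. The value `sm_86` is an assumption that matches the NVIDIA RTX A2000 reported in the job output further down; other GPU nodes may need a different value.

```shell
SRC="vector_addition.cu"
OUT="vector_addition"
ARCH="sm_86"   # assumption: compute capability 8.6 (e.g. RTX A2000)

if command -v nvcc >/dev/null 2>&1 && [ -f "$SRC" ]; then
    # -O2 optimises host code; -arch selects the target GPU architecture.
    nvcc -O2 -arch="$ARCH" "$SRC" -o "$OUT"
else
    echo "nvcc or $SRC not found; load the cuda module and cd into the example directory first."
fi
```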

3. Submit the job

The launch_cuda.sh script contains the Slurm directives and commands needed to run the job and request a GPU. Submit it with sbatch:

$ sbatch launch_cuda.sh
Submitted batch job 93
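For orientation, a Slurm script of this kind might look roughly like the sketch below. This is not the actual content of launch_cuda.sh. The job name, partition, CPU count and output pattern are assumptions inferred from the output shown in this guide, and the real script in the repo may differ.

```shell
#!/bin/bash
# Hypothetical sketch of a GPU batch script; see launch_cuda.sh for the real one.
#SBATCH --job-name=Cuda          # assumed: gives output files named Cuda-<jobid>.out
#SBATCH --output=%x-%j.out       # %x = job name, %j = job ID
#SBATCH --partition=gpu          # assumed from "Partition: gpu" in the job output
#SBATCH --gres=gpu:1             # request one GPU
#SBATCH --cpus-per-task=2        # assumed from "CPUs per Node: 2" in the job output

# Load the same toolchain that was used to compile the program.
module load gcc
module load cuda

# Run the compiled binary on the allocated GPU node.
./vector_addition
```

Lines starting with `#SBATCH` are read by sbatch as job options, and the rest runs as a normal shell script on the allocated node.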

Check your directory for the output file and view its contents:

$ ll
total 808
drwxr-xr-x 2 tiagoalmeida students   4096 Jul  5 15:49 ./
drwxr-xr-x 3 tiagoalmeida students   4096 Jul  5 15:48 ../
-rw-r--r-- 1 tiagoalmeida students    248 Jul  5 15:49 Cuda-93.out
-rw-r--r-- 1 tiagoalmeida students    504 Jul  5 15:48 launch_cuda.sh
-rwxr-xr-x 1 tiagoalmeida students 803936 Jul  5 15:48 vector_addition*
-rw-r--r-- 1 tiagoalmeida students   2051 Jul  5 15:48 vector_addition.cu
$
$ cat Cuda-93.out 
Job Information for Job ID: 93 from tiagoalmeida
------------ ------------
Account: students
CPUs per Node: 2
GPU: NVIDIA RTX A2000
Partition: gpu
QOS: normal
Start Time: 2024-07-05 14:49:32 UTC
Running On Node: dl-srv-02
------------ ------------

---------------------------
__SUCCESS__
---------------------------
N                 = 1048576
Threads Per Block = 256
Blocks In Grid    = 4096
---------------------------
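
The last three values are consistent with the usual CUDA launch arithmetic, where the number of blocks is the vector length divided by the threads per block, rounded up. The sketch below reproduces that calculation in the shell. It assumes vector_addition.cu uses this ceiling division, which is worth checking in the source.

```shell
N=1048576               # vector length reported in the output
THREADS_PER_BLOCK=256   # threads per block reported in the output

# Integer ceiling division: enough blocks to cover all N elements.
BLOCKS=$(( (N + THREADS_PER_BLOCK - 1) / THREADS_PER_BLOCK ))

echo "Blocks In Grid    = $BLOCKS"
```

With these values the result is 4096, matching the job output.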