NVIDIA HPC SDK Setup
Overview
Omnia pre-deploys an NVIDIA HPC SDK setup script (/usr/local/bin/setup_nvhpc_sdk.sh)
to all Slurm nodes during provisioning. The setup follows a two-step manual workflow:
Run
--installon the compiler node to install via DNF and publish to shared NFS.Run the script (without arguments) on each compute node to mount from NFS.
Prerequisites
Slurm compiler node and compute nodes must be provisioned and running.
Shared NFS storage must be mounted and the path
/hpc_tools/nvidia_sdkmust be accessible on the compiler node before running the install step.NVIDIA package repositories are configured automatically during cloud-init. No manual repository setup is required.
Step 1 — Install on the Compiler Node
On the designated compiler/login node, run:
/usr/local/bin/setup_nvhpc_sdk.sh --install
This performs the following actions in sequence:
Installs the
nvhpcpackage via DNF from the pre-configured NVIDIA repository.Copies the installed SDK from
/opt/nvidia/hpc_sdkto the shared NFS path/hpc_tools/nvidia_sdk/nvhpc.Sets up a local bind mount:
/hpc_tools/nvidia_sdk/nvhpc→/opt/nvidia/nvhpc.Writes environment configuration to
/etc/profile.d/nvhpc.sh.
To force a reinstall when the SDK is already present on NFS:
/usr/local/bin/setup_nvhpc_sdk.sh --install --force
Note
If NVHPC is already present on NFS, the script skips the DNF install
and proceeds directly to the bind mount and environment setup, unless --force is specified.
Step 2 — Set Up on Compute Nodes
On each Slurm compute node, run:
/usr/local/bin/setup_nvhpc_sdk.sh
This performs the following actions:
Validates that the NVHPC SDK exists on NFS at
/hpc_tools/nvidia_sdk/nvhpc.Sets up a local bind mount:
/hpc_tools/nvidia_sdk/nvhpc→/opt/nvidia/nvhpc.Writes environment configuration to
/etc/profile.d/nvhpc.sh.
Note
Step 2 must be run after Step 1 is complete on the compiler node. If the SDK is not found on NFS, the script exits with an actionable error message.
Environment Variables Configured
After setup, the following variables are available in all login shells on both the compiler node and compute nodes:
Variable |
Value |
|---|---|
|
|
|
|
|
Auto-detected from the installed SDK version |
|
Prepended with compiler |
|
Appended with compiler |
|
Prepended with the nvhpc |
Verifying the Installation
After running the setup script, verify by sourcing the profile and checking the compiler:
source /etc/profile.d/nvhpc.sh
nvc --version
nvc++ --version
nvfortran --version
Logs
Setup output and errors are written to /var/log/nvhpc_sdk_setup.log on each node.
Check this file if the setup script fails:
cat /var/log/nvhpc_sdk_setup.log
Architecture Support
The script detects the node architecture automatically:
Architecture |
NFS Subdirectory |
|---|---|
|
|
|
|
No separate configuration is required for mixed-architecture clusters.
If you have any feedback about Omnia documentation, please reach out at omnia.readme@dell.com.