Input Parameters for the Cluster

The service_k8s_cluster depends on the inputs provided in the following files:

  • /opt/omnia/input/project_default/omnia_config.yml

  • /opt/omnia/input/project_default/security_config.yml

  • /opt/omnia/input/project_default/storage_config.yml

  • /opt/omnia/input/project_default/high_availability_config.yml

Caution

Do not remove, edit, or comment out any lines in the above-mentioned input files.

/opt/omnia/input/project_default/omnia_config.yml

Parameters for Kubernetes setup on the service Kubernetes cluster

Variables

Mandatory/Optional

Details

cluster_name

Mandatory

  • Type: String

  • Name of the cluster on which you want to deploy Kubernetes.

  • This input is case-sensitive. Do not add any special characters except _ (underscore) in the cluster name.

deployment

Mandatory

  • Type: Boolean

  • Indicates whether Kubernetes will be deployed.

  • Accepted values: true or false

k8s_cni

Mandatory

  • Type: String

  • Kubernetes SDN network.

  • Accepted values: calico

  • Default value: calico

pod_external_ip_range

Mandatory

  • Type: String

  • These addresses will be used by the load balancer to assign external IPs to Kubernetes services.

  • Ensure that the IP range provided is not assigned to any node in the cluster.

  • Sample value: 172.16.107.170-172.16.107.200

k8s_service_addresses

Optional

  • Type: String

  • Kubernetes internal network for services.

  • This network must be unused in your network infrastructure.

  • Default value: "10.233.0.0/18"

k8s_pod_network_cidr

Optional

  • Type: String

  • Kubernetes pod network CIDR for internal network. When used, it will assign IP addresses from this range to individual pods.

  • This network must be unused in your network infrastructure.

  • Default value: "10.233.64.0/18"

csi_powerscale_driver_secret_file_path

Optional

  • Type: File path

  • If you want to deploy the CSI driver for PowerScale on your service cluster, add the file path of the secrets.yaml file to this variable.

csi_powerscale_driver_values_file_path

Optional

  • Type: File path

  • If you want to deploy the CSI driver for PowerScale on your service cluster, add the file path of the values.yaml file to this variable.

nfs_storage_name

Mandatory

  • Type: String

  • Use the same name as one of the nfs_name values defined in storage_config.yml.

k8s_crio_storage_size

Mandatory

  • Type: String

  • Specifies the disk size allocated for CRI-O container storage.

Sample:

service_k8s_cluster:
   - cluster_name: service_cluster
     deployment: true
     k8s_cni: "calico"
     pod_external_ip_range: ""
     k8s_service_addresses: "10.233.0.0/18"
     k8s_pod_network_cidr: "10.233.64.0/18"
     nfs_storage_name: "nfs_k8s"
     csi_powerscale_driver_secret_file_path: ""
     csi_powerscale_driver_values_file_path: ""
     k8s_crio_storage_size: "20G"
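
The network-related inputs above can be sanity-checked before deployment. The following is a minimal sketch, not part of Omnia: it assumes that pod_external_ip_range uses the start-end format of the sample value, and the function names and node IPs are hypothetical. It reports nodes whose IP falls inside the load balancer range and CIDR blocks that overlap each other.

```python
import ipaddress


def parse_ip_range(text):
    """Split a 'start-end' string (the sample value's format) into two addresses."""
    start_text, sep, end_text = text.partition("-")
    if not sep:
        raise ValueError(f"expected 'start-end', got {text!r}")
    start = ipaddress.ip_address(start_text.strip())
    end = ipaddress.ip_address(end_text.strip())
    if start > end:
        raise ValueError(f"range start {start} is above range end {end}")
    return start, end


def nodes_inside_range(ip_range, node_ips):
    """Return the node IPs that fall inside the load balancer range."""
    start, end = parse_ip_range(ip_range)
    return [ip for ip in node_ips if start <= ipaddress.ip_address(ip) <= end]


def overlapping_networks(cidrs):
    """Return every pair of CIDR blocks that overlap each other."""
    nets = [ipaddress.ip_network(cidr) for cidr in cidrs]
    return [
        (str(a), str(b))
        for i, a in enumerate(nets)
        for b in nets[i + 1:]
        if a.overlaps(b)
    ]


# Hypothetical node IPs; replace them with the addresses of your own nodes.
node_ips = ["172.16.107.10", "172.16.107.180"]
print(nodes_inside_range("172.16.107.170-172.16.107.200", node_ips))
print(overlapping_networks(["10.233.0.0/18", "10.233.64.0/18"]))
```

With these inputs, the first call reports 172.16.107.180 as a conflict, and the second returns an empty list because the two default networks do not overlap. To check k8s_service_addresses and k8s_pod_network_cidr against networks already used in your infrastructure, add those networks to the list passed to overlapping_networks.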

Parameters for Slurm setup

Variables

Details

cluster_name

string

Mandatory

  • Indicates the name of the cluster.

nfs_storage_name

string

Mandatory

  • Indicates the name of the NFS storage to be used by this Slurm cluster.

  • This is defined in storage_config.yml as nfs_name.

config_sources

filepath or mapping

Optional

  • Indicates how the Slurm configuration values are provided to the cluster.

  • <conf name> : <filepath> or <mapping>

    • The conf files supported by Slurm are slurm, cgroup, gres, mpi, helpers, job_container, acct_gather, oci, and topology.

    • <filepath>: Supply the absolute path to a custom configuration file.

    • <mapping>: Supply the configuration values directly as a key–value map.

See the following samples. The first supplies configuration values as mappings, and the second supplies file paths:

slurm_cluster:
- cluster_name: slurm_cluster
  nfs_storage_name: nfs_slurm
  config_sources:
    slurm:
      SlurmctldTimeout: 60
      SlurmdTimeout: 150
    cgroup:
      CgroupPlugin: autodetect
      AllowedRAMSpace: 100

slurm_cluster:
- cluster_name: slurm_cluster
  nfs_storage_name: nfs_slurm
  config_sources:
   slurm: /path/to/custom_slurm.conf
   cgroup: /path/to/custom_cgroup.conf
   slurmdbd: /path/to/custom_slurmdbd.conf
   gres: /path/to/custom_gres.conf
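
The two forms accepted by config_sources can be told apart by value type. The following is a minimal sketch under that assumption; the function name is hypothetical and this is not how Omnia itself processes the file. A string must be an absolute path, a dictionary is treated as a key-value map, and anything else is rejected.

```python
import posixpath


def classify_config_sources(config_sources):
    """Label each config_sources entry as 'filepath' or 'mapping'."""
    kinds = {}
    for conf_name, source in config_sources.items():
        if isinstance(source, dict):
            # Values supplied directly as a key-value map.
            kinds[conf_name] = "mapping"
        elif isinstance(source, str) and posixpath.isabs(source):
            # Absolute path to a custom configuration file.
            kinds[conf_name] = "filepath"
        else:
            raise ValueError(
                f"{conf_name}: expected an absolute path or a key-value map, "
                f"got {source!r}"
            )
    return kinds


# Both forms can be mixed within one cluster entry.
print(classify_config_sources({
    "slurm": {"SlurmctldTimeout": 60, "SlurmdTimeout": 150},
    "cgroup": "/path/to/custom_cgroup.conf",
}))
```

The sketch only inspects the shape of each value. It does not check that the file exists or that the keys are valid for the given conf file.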

/opt/omnia/input/project_default/security_config.yml

Parameters for OpenLDAP configuration

Parameter

Details

ldap_connection_type

string

Required

  • For a TLS connection, ensure that port 389 is open.

  • Choices: TLS (default)
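
One way to confirm that the port is reachable is a plain TCP connection attempt. The following is a minimal sketch with a hypothetical function name. It only tests reachability from the machine where it runs, so a firewall between other nodes and the LDAP server can still block traffic.

```python
import socket


def is_port_open(host, port=389, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False
```

For example, is_port_open("ldap.example.com") returns True only if the host (a placeholder name here) accepts connections on port 389.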

/opt/omnia/input/project_default/storage_config.yml

Parameters for Storage

Variables

Details

nfs_client_params

List (dict)

Required

  • This is a list of dictionaries.

  • nfs_name: Provide the name of the NFS share that is referenced by the Slurm and service Kubernetes clusters.

  • server_ip: Provide the IP address or hostname of the NFS server that is accessible to all the diskful and diskless nodes.

  • server_share_path: Provide the full path of the shared directory on the NFS server.

  • client_share_path: Provide the full path on the client where the NFS server contents need to be mounted.

  • client_mount_options: Provide the mount options as a comma-separated value. Possible values are: nosuid, rw, sync, hard, intr

Sample:

nfs_client_params:
- server_ip: "172.16.107.168" # Provide the IP of the NFS server
  server_share_path: "/mnt/share/omnia" # Provide server share path of the NFS Server
  client_share_path: /share_omnia
  client_mount_options: "nosuid,rw,sync,hard,intr"
  nfs_name: nfs_slurm

- server_ip: "172.16.107.121" # Provide the IP of the NFS server
  server_share_path: "/mnt/share/omnia_k8s" # Provide server share path of the NFS Server
  client_share_path: /share_omnia_k8s
  client_mount_options: "nosuid,rw,sync,hard,intr"
  nfs_name: nfs_k8s
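
Because nfs_storage_name in omnia_config.yml must match an nfs_name defined here, a cross-check can catch typos early. The following is a minimal sketch with hypothetical function names. It assumes the files have already been loaded into Python lists and dictionaries, and it treats the mount options listed above as the complete allowed set, which may be stricter than what Omnia accepts.

```python
# Options named under client_mount_options above (assumed to be exhaustive).
ALLOWED_MOUNT_OPTIONS = {"nosuid", "rw", "sync", "hard", "intr"}


def unknown_mount_options(option_string):
    """Return options in a comma-separated string that are not in the listed set."""
    options = [opt.strip() for opt in option_string.split(",") if opt.strip()]
    return [opt for opt in options if opt not in ALLOWED_MOUNT_OPTIONS]


def unresolved_nfs_names(nfs_client_params, cluster_entries):
    """Return nfs_storage_name values that have no matching nfs_name entry."""
    defined = {entry["nfs_name"] for entry in nfs_client_params}
    return [
        cluster["nfs_storage_name"]
        for cluster in cluster_entries
        if cluster["nfs_storage_name"] not in defined
    ]


nfs_client_params = [
    {"nfs_name": "nfs_slurm", "client_mount_options": "nosuid,rw,sync,hard,intr"},
    {"nfs_name": "nfs_k8s", "client_mount_options": "nosuid,rw,sync,hard,intr"},
]
clusters = [
    {"cluster_name": "service_cluster", "nfs_storage_name": "nfs_k8s"},
    {"cluster_name": "slurm_cluster", "nfs_storage_name": "nfs_slurm"},
]
print(unresolved_nfs_names(nfs_client_params, clusters))
print([unknown_mount_options(p["client_mount_options"]) for p in nfs_client_params])
```

Both calls return only empty lists for the sample values, which means that every cluster refers to a defined NFS share and no unlisted mount option is used.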

powervault_config

dict

Optional

  • The PowerVault storage integration provides persistent external storage for critical Slurm components, ensuring high availability and data durability across node reboots or reimaging.

    • Slurm State Save Location (StateSaveLocation): Preserves Slurm controller state, job checkpoints, and scheduler data, enabling seamless recovery after controller restarts.

    • Slurm Database Storage (/var/lib/mysql): Houses the SlurmDBD (Slurm Database Daemon) MySQL/MariaDB data, maintaining job accounting records, user associations, and cluster usage history.

  • ip: A list of PowerVault iSCSI target IP addresses used for iSCSI target discovery.

  • port: Defines the TCP port for the iSCSI target service. This parameter is optional; if it is not provided, the default value is 3260.

  • iscsi_initiator: Specifies the InitiatorName used by the host when connecting to the iSCSI target.

  • volume_id: The unique WWN/serial identifier of the specific volume that should be used for persistent storage.

Note

  • PowerVault storage is not a replacement for the NFS storage used by the Slurm cluster.

  • The ip, iscsi_initiator, and volume_id values are not validated. Ensure that you provide accurate values.

Sample:

powervault_config:
    ip:
      - 172.1.2.3
    port: 3260
    iscsi_initiator: iqn.2025-01.com.dell:scontrol-node
    volume_id: 00c0ff4343f1f1f1001c8c4e6901000000
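
Because these values are not validated, a local pre-flight check can help catch obvious mistakes. The following is a minimal sketch with hypothetical names. It assumes the initiator name follows the IQN style used in the sample, and it checks only the format of each value, not whether the target or volume actually exists.

```python
import ipaddress
import re

DEFAULT_ISCSI_PORT = 3260
# Loose check for the "iqn.YYYY-MM.reversed.domain[:identifier]" naming style.
IQN_PATTERN = re.compile(r"^iqn\.\d{4}-\d{2}\.[a-z0-9][a-z0-9.-]*(:.+)?$")


def check_powervault_config(config):
    """Return a list of problems found in a powervault_config mapping."""
    problems = []

    ips = config.get("ip") or []
    if not isinstance(ips, list) or not ips:
        problems.append("ip must be a non-empty list")
    else:
        for ip in ips:
            try:
                ipaddress.ip_address(ip)
            except ValueError:
                problems.append(f"ip entry {ip!r} is not a valid IP address")

    # port is optional and falls back to 3260 when it is absent.
    port = config.get("port", DEFAULT_ISCSI_PORT)
    if not isinstance(port, int) or not 1 <= port <= 65535:
        problems.append(f"port {port!r} is not in 1-65535")

    if not IQN_PATTERN.match(str(config.get("iscsi_initiator", ""))):
        problems.append("iscsi_initiator does not look like an IQN")

    if not str(config.get("volume_id", "")).strip():
        problems.append("volume_id is empty")

    return problems


sample = {
    "ip": ["172.1.2.3"],
    "port": 3260,
    "iscsi_initiator": "iqn.2025-01.com.dell:scontrol-node",
    "volume_id": "00c0ff4343f1f1f1001c8c4e6901000000",
}
print(check_powervault_config(sample))
```

The sample values pass every check, so the function returns an empty list.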

/opt/omnia/input/project_default/high_availability_config.yml

See the following sample:

service_k8s_cluster_ha:
  - cluster_name: service_cluster
    enable_k8s_ha: true
    virtual_ip_address: "172.16.107.1"

Parameters for Service Cluster HA

Parameter

Details

cluster_name

  • Type: String

  • Captures the name of the service cluster on which HA will be set up.

  • Default value: service_cluster

enable_k8s_ha

  • Type: Boolean

  • Possible values: true

  • Default value: true

  • Indicates whether to enable HA for the Kubernetes (K8s) service node or not. Set to true to enable.

virtual_ip_address

  • Type: String

  • This is a mandatory and user-configurable parameter.

  • Captures the virtual IP address for the K8s service node HA setup. Ensure that the virtual_ip_address does not belong to the dynamic_range or static_range mentioned in the network_spec.yml.

  • Default value: 172.16.107.1
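
The restriction on virtual_ip_address can be checked with a short script. The following is a minimal sketch, not part of Omnia: it assumes that the ranges in network_spec.yml are written as start-end strings, and the function name and range values shown are hypothetical.

```python
import ipaddress


def vip_conflicts(virtual_ip, ranges):
    """Return the names of the ranges that contain the virtual IP."""
    vip = ipaddress.ip_address(virtual_ip)
    hits = []
    for name, text in ranges.items():
        start_text, _, end_text = text.partition("-")
        start = ipaddress.ip_address(start_text.strip())
        end = ipaddress.ip_address(end_text.strip())
        if start <= vip <= end:
            hits.append(name)
    return hits


# Hypothetical ranges; copy the real values from your network_spec.yml.
ranges = {
    "static_range": "172.16.107.201-172.16.107.250",
    "dynamic_range": "172.16.107.100-172.16.107.150",
}
print(vip_conflicts("172.16.107.1", ranges))
```

An empty list means that the virtual IP lies outside both ranges. With these example ranges, 172.16.107.210 would be reported as a conflict with static_range.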

Caution

Ensure that the external NFS is accessible by all the nodes intended to be booted and is reachable by the admin network.

If you have any feedback about Omnia documentation, please reach out at omnia.readme@dell.com.