Update the Input Parameters for Discovering the Nodes

Specify the required parameters in the following input files:

  • /opt/omnia/input/project_default/provision_config.yml

  • /opt/omnia/input/project_default/omnia_config_credentials.yml

  • /opt/omnia/input/project_default/software_config.json

  • /opt/omnia/input/project_default/storage_config.yml

  • /opt/omnia/input/project_default/omnia_config.yml

  • /opt/omnia/input/project_default/telemetry_config.yml

Caution

Do not remove or comment out any lines in the .yml files listed above.

provision_config.yml

Parameter

Details

pxe_mapping_file_path

string

Optional

  • Enter the path to the PXE mapping CSV file that contains the node details for provisioning.

  • This variable is required to discover nodes using a mapping file.

  • The .csv file must use the following headers, in this order: FUNCTIONAL_GROUP_NAME, GROUP_NAME, SERVICE_TAG, PARENT_SERVICE_TAG, HOSTNAME, ADMIN_MAC, ADMIN_IP, BMC_MAC, BMC_IP

  • A sample file is provided here: /omnia/examples/pxe_mapping_file.csv
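Before running discovery, it can help to confirm that a mapping file starts with the expected header row. The following is a minimal sketch rather than part of Omnia; the function name check_mapping_header and the node values in the sample row are hypothetical.

```python
import csv
import io

# Header order taken from the format described for pxe_mapping_file_path.
EXPECTED_HEADER = [
    "FUNCTIONAL_GROUP_NAME", "GROUP_NAME", "SERVICE_TAG", "PARENT_SERVICE_TAG",
    "HOSTNAME", "ADMIN_MAC", "ADMIN_IP", "BMC_MAC", "BMC_IP",
]


def check_mapping_header(csv_text):
    """Return True if the first row of csv_text matches the expected header."""
    reader = csv.reader(io.StringIO(csv_text), skipinitialspace=True)
    header = next(reader, [])
    return [column.strip() for column in header] == EXPECTED_HEADER


# Hypothetical two-line file; every node value below is made up for illustration.
sample = (
    "FUNCTIONAL_GROUP_NAME,GROUP_NAME,SERVICE_TAG,PARENT_SERVICE_TAG,"
    "HOSTNAME,ADMIN_MAC,ADMIN_IP,BMC_MAC,BMC_IP\n"
    "slurm_node,grp0,ABC1234,,node001,aa:bb:cc:dd:ee:01,172.16.107.10,"
    "aa:bb:cc:dd:ee:02,172.16.108.10\n"
)
print(check_mapping_header(sample))
```

To check a real file, read its contents with open(path).read() and pass the text to the function.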

timezone

string

Required

Timezone to be used during OS provisioning. Available timezones are provided here.

Choices:

  • GMT <- default

  • EST

  • CET

  • MST

  • CST6CDT

  • PST8PDT

language

string

Required

Language to be used during OS provisioning.

Default value: en_US.UTF-8

default_lease_time

integer

Required

Default lease time, in seconds, for IP addresses assigned by DHCP. Range: 21600-86400

Default value: 86400

Note

The /opt/omnia/input/project_default/omnia_config_credentials.yml file is encrypted on the first execution of the discovery.yml or local_repo.yml playbooks.

  • To view the encrypted parameters:

    ansible-vault view omnia_config_credentials.yml --vault-password-file .omnia_config_credentials_key
    
  • To edit the encrypted parameters:

    ansible-vault edit omnia_config_credentials.yml --vault-password-file .omnia_config_credentials_key
    
software_config.json

Parameter

Mandatory/Optional

Details

cluster_os_type

Mandatory

  • Type: string

  • Specify the operating system running on the OIM and the one to be provisioned on the compute nodes.

  • Accepted value: rhel

cluster_os_version

Mandatory

  • Type: string

  • The OS Version that will be provisioned on compute nodes.

  • Accepted value: 10.0

repo_config

Mandatory

  • Type: string

  • Omnia sets up a local Pulp repository on the OIM and downloads all the necessary packages or images for the cluster into this repository. The behavior of the Pulp container varies depending on the value of the repo_config parameter.

  • In case of always, packages are downloaded and cached on OIM during local_repo.yml execution; compute nodes get the packages from OIM. Caching these packages allows them to be reused in future operations without needing to download them again.

  • In case of partial, packages are not pre-downloaded or cached. OIM downloads from upstream URLs when needed.

  • Accepted values: always, partial

  • Default value: always

Note

The following packages will always be downloaded from the local Pulp repository, regardless of the value of repo_config:

  • ISOs

  • pip modules

  • manifests

  • tarballs

  • container images

softwares

Mandatory

  • Type: JSON list

  • A JSON list of the required software. Each entry gives the software name, an optional version, and the architecture type: aarch64, x86_64, or both.

  • The following software should be listed with a version: OpenLDAP, NFS, Slurm, service_k8s, utils, ucx, openmpi.

  • At least one software entry must be provided for local_repo.yml to execute correctly.

  • The software_config.json file already contains the basic software entries. To add more software stacks, add them to /opt/omnia/input/project_default/software_config.json.

  • For the list of all applicable softwares based on your <cluster_os_type>, see the templates at examples/template_<os>_software_config.json. For example, /omnia/examples/rhel_software_config.json

Note

The accepted names for software are taken from /opt/omnia/input/project_default/config/<architecture>/<cluster_os_type>/<cluster_os_version>.
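The constraints on software_config.json described above can be checked with a short script. This is a sketch under assumptions: it supposes that the file is a JSON object whose top-level keys match the parameter names in the table and that each softwares entry carries name, version, and arch fields. The helper check_software_config is hypothetical and is not an Omnia tool.

```python
import json

ALLOWED_REPO_CONFIG = {"always", "partial"}
ALLOWED_ARCH = {"aarch64", "x86_64"}


def check_software_config(config):
    """Return a list of problems found in a parsed software_config.json dict."""
    problems = []
    if config.get("repo_config") not in ALLOWED_REPO_CONFIG:
        problems.append("repo_config must be 'always' or 'partial'")
    softwares = config.get("softwares") or []
    if not softwares:
        problems.append("softwares must list at least one entry")
    for entry in softwares:
        name = entry.get("name", "<unnamed>")
        arch = entry.get("arch") or []
        if not arch or not set(arch) <= ALLOWED_ARCH:
            problems.append(f"{name}: arch must be aarch64, x86_64, or both")
    return problems


# Assumed file layout; the single entry is only an illustration.
sample = json.loads("""
{
  "cluster_os_type": "rhel",
  "cluster_os_version": "10.0",
  "repo_config": "always",
  "softwares": [
    {"name": "csi_driver_powerscale", "version": "v2.15.0", "arch": ["x86_64"]}
  ]
}
""")
print(check_software_config(sample))
```

An empty list means that none of the checked constraints were violated. The script does not verify software names against the config directory.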

storage_config.yml

Variables

Details

nfs_client_params

List (dict)

Required

  • This is a list of dictionaries.

  • nfs_name: Provide the name of the NFS share that Slurm and service Kubernetes refer to.

  • server_ip: Provide the IP address or hostname of the NFS server. The server must be accessible to all the diskful and diskless nodes.

  • server_share_path: Provide the full path of the shared directory on the NFS server.

  • client_share_path: Provide the full path where the NFS server contents need to be mounted.

  • client_mount_options: Provide the mount options as a comma-separated value. Possible values are: nosuid, rw, sync, hard, intr

Sample:

nfs_client_params:
- server_ip: "172.16.107.168" # Provide the IP of the NFS server
  server_share_path: "/mnt/share/omnia" # Provide server share path of the NFS Server
  client_share_path: /share_omnia
  client_mount_options: "nosuid,rw,sync,hard,intr"
  nfs_name: nfs_slurm

- server_ip: "172.16.107.121" # Provide the IP of the NFS server
  server_share_path: "/mnt/share/omnia_k8s" # Provide server share path of the NFS Server
  client_share_path: /share_omnia_k8s
  client_mount_options: "nosuid,rw,sync,hard,intr"
  nfs_name: nfs_k8s
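Because nfs_storage_name in omnia_config.yml must match an nfs_name defined here, a quick cross-check can catch typos early. The sketch below works on the sample values after they have been loaded into Python dictionaries. The helper check_nfs_params is hypothetical, and treating all five keys as required is an assumption.

```python
# Keys described for each nfs_client_params entry (assumed to all be required).
REQUIRED_KEYS = {
    "nfs_name", "server_ip", "server_share_path",
    "client_share_path", "client_mount_options",
}


def check_nfs_params(nfs_client_params, nfs_storage_name):
    """Return a list of problems in the NFS entries and the name cross-reference."""
    problems = []
    for index, entry in enumerate(nfs_client_params):
        missing = REQUIRED_KEYS - entry.keys()
        if missing:
            problems.append(f"entry {index}: missing {sorted(missing)}")
    names = [entry.get("nfs_name") for entry in nfs_client_params]
    if nfs_storage_name not in names:
        problems.append(
            f"nfs_storage_name '{nfs_storage_name}' matches no nfs_name"
        )
    return problems


# Values copied from the sample above.
nfs_client_params = [
    {
        "server_ip": "172.16.107.168",
        "server_share_path": "/mnt/share/omnia",
        "client_share_path": "/share_omnia",
        "client_mount_options": "nosuid,rw,sync,hard,intr",
        "nfs_name": "nfs_slurm",
    },
    {
        "server_ip": "172.16.107.121",
        "server_share_path": "/mnt/share/omnia_k8s",
        "client_share_path": "/share_omnia_k8s",
        "client_mount_options": "nosuid,rw,sync,hard,intr",
        "nfs_name": "nfs_k8s",
    },
]
print(check_nfs_params(nfs_client_params, "nfs_slurm"))
```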

Note

When PowerScale is configured as an NFS server, ensure that the following CSI-PowerScale driver entry is present in the software_config.json file:

{"name": "csi_driver_powerscale", "version": "v2.15.0", "arch": ["x86_64"]}

For more information on deploying the Dell CSI-PowerScale driver, see Deploy CSI drivers for Dell PowerScale Storage Solutions.

The following table lists the parameters that must be configured in omnia_config.yml for a Slurm cluster.

omnia_config.yml

Variables

Details

cluster_name

string

Mandatory

  • Indicates the name of the cluster.

nfs_storage_name

string

Mandatory

  • Indicates the name of the NFS storage to be used by this Slurm cluster.

  • This name is defined in storage_config.yml as nfs_name.

config_sources

dict

Optional

  • Indicates how the Slurm configuration values are provided to the cluster.

  • <conf name> : <dict>

    • <conf name>: The conf files supported by Slurm are slurm, cgroup, gres, mpi, helpers, job_container, acct_gather, oci, plugstack, and topology.

    • <dict>: Supply the configuration values directly as a key-value map.

Note

For Slurm, Nodeset and Partition configurations are not supported.

The following table lists the parameters that must be configured in omnia_config.yml for a service Kubernetes cluster.

omnia_config.yml

Variables

Mandatory/Optional

Details

cluster_name

Mandatory

  • Type: String

  • Name of the cluster on which you want to deploy Kubernetes.

  • This input is case-sensitive. Do not add any special characters except _ (underscore) in the cluster name.

deployment

Mandatory

  • Type: Boolean

  • Indicates if Kubernetes will be deployed or not.

  • Accepted values: true or false

k8s_cni

Mandatory

  • Type: String

  • Kubernetes SDN network.

  • Accepted value: calico

  • Default value: calico

pod_external_ip_range

Mandatory

  • Type: String

  • These addresses are used by the load balancer to assign external IPs to Kubernetes services.

  • Ensure that the IP range provided is not assigned to any node in the cluster.

  • Sample value: 172.16.107.170-172.16.107.200

k8s_service_addresses

Optional

  • Type: String

  • Kubernetes internal network for services.

  • This network must be unused in your network infrastructure.

  • Default value: "10.233.0.0/18"

k8s_pod_network_cidr

Optional

  • Type: String

  • Kubernetes pod network CIDR for internal network. When used, it will assign IP addresses from this range to individual pods.

  • This network must be unused in your network infrastructure.

  • Default value: "10.233.64.0/18"

csi_powerscale_driver_secret_file_path

Optional

  • Type: File path

  • If you want to deploy the CSI driver for PowerScale on your service cluster, add the file path of the secrets.yaml file to this variable.

csi_powerscale_driver_values_file_path

Optional

  • Type: File path

  • If you want to deploy the CSI driver for PowerScale on your service cluster, add the file path of the values.yaml file to this variable.

nfs_storage_name

Mandatory

  • Type: String

  • Use the same name as one of the nfs_name values defined in storage_config.yml.

k8s_crio_storage_size

Mandatory

  • Type: String

  • Specifies the disk size allocated for CRI-O container storage.
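The address constraints in this table can be checked with Python's standard ipaddress module. The sketch below is not part of Omnia and the helper names are hypothetical. The overlap test between k8s_service_addresses and k8s_pod_network_cidr is an extra sanity check that the table does not state explicitly.

```python
import ipaddress


def parse_ip_range(text):
    """Split 'start-end' into two IP address objects."""
    start, end = (ipaddress.ip_address(part.strip()) for part in text.split("-"))
    if start > end:
        raise ValueError(f"range start {start} is after range end {end}")
    return start, end


def check_k8s_networks(pod_external_ip_range, service_cidr, pod_cidr, node_ips):
    """Return a list of problems found in the Kubernetes address settings."""
    problems = []
    start, end = parse_ip_range(pod_external_ip_range)
    for node_ip in node_ips:
        if start <= ipaddress.ip_address(node_ip) <= end:
            problems.append(f"{node_ip} falls inside pod_external_ip_range")
    services = ipaddress.ip_network(service_cidr)
    pods = ipaddress.ip_network(pod_cidr)
    if services.overlaps(pods):
        problems.append("k8s_service_addresses overlaps k8s_pod_network_cidr")
    return problems


# Sample and default values from the table; the node IP list is illustrative.
print(check_k8s_networks(
    "172.16.107.170-172.16.107.200",
    "10.233.0.0/18",
    "10.233.64.0/18",
    ["172.16.107.254"],
))
```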

network_spec.yml

Network Name

Parameters for the network

Parameter details

admin_network

Note

This name cannot be modified. This is mandatory for discovery and provisioning of the cluster nodes.

oim_nic_name

string

Mandatory

The name of the NIC on which the administrative network is accessible to the OIM. Default value: eno1

netmask_bits

integer

Mandatory

The number of bits in the subnet mask (the prefix length), which divides the IP address space into subnets and determines the network's available hosts. Default value: 24

primary_oim_admin_ip

IP address

Mandatory

The admin IP address of the OIM server. Default value: 172.16.107.254

primary_oim_bmc_ip

IP address

Conditional mandatory

  • The iDRAC IP address of the OIM server.

  • Mandatory only if idrac_telemetry_support is set to true and telemetry data needs to be collected.

  • This field can be omitted if iDRAC telemetry is not required.

dynamic_range

IP address range

Mandatory

The dynamic range of IPs to be provisioned on target nodes. Default value: 172.16.107.201-172.16.107.250

dns

IP address

Optional

External DNS server IP addresses for the admin network.

telemetry_config.yml

Parameter

Mandatory/Optional

Details

idrac_telemetry_support

Mandatory

  • Type: Boolean

  • If you want iDRAC telemetry support on your service cluster, set this variable to true before executing telemetry.yml and discovery.yml playbooks.

  • Accepted values: true or false

  • Default value: true

Note

If idrac_telemetry_support is set to true, the mysqldb_user, mysqldb_password, and mysqldb_root_password parameters in the omnia_config_credentials.yml file become mandatory.

Note

If you want to deploy only Slurm clusters (slurm_custom), idrac_telemetry_support must be set to false.

idrac_telemetry_collection_type

Mandatory

  • Specify where to store iDRAC telemetry data.

  • Supported values:
    • victoria: Store in VictoriaMetrics only

    • kafka: Store in Kafka only

    • victoria,kafka: Store in both (recommended)

  • Default: victoria,kafka

victoria_configurations > deployment_mode

Mandatory

  • Supported values:
    • single-node: Simple deployment (1 pod, suitable for dev/test)

    • cluster: High-availability deployment (7 pods, recommended for production)

  • Default: cluster

  • Cluster Mode Benefits:
    • High availability (no single point of failure)

    • Horizontal scalability (scale components independently)

    • Better performance (4x ingestion, 2x query speed)

    • Production-ready architecture

  • Single-Node Benefits:
    • Simple setup (fewer resources)

    • Suitable for small deployments (<10 nodes)

    • Lower resource usage (~4Gi memory vs ~10Gi for cluster)

victoria_configurations > persistence_size

Conditional Mandatory

  • The amount of storage allocated for each VictoriaMetrics persistent volume.

  • Important: Total VictoriaMetrics storage depends on deployment mode:
    • Single-node mode: Total storage = persistence_size * 1 pod

    • Cluster mode: Total storage = persistence_size * 3 vmstorage pods

  • Example (cluster): 8Gi * 3 = 24Gi total VictoriaMetrics storage

  • Accepted values: Must be specified in the form of X[Ki|Mi|Gi|Ti|Pi|Ei]

  • Default value: 8Gi (results in 24Gi total storage for cluster mode)
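The storage arithmetic above can be reproduced with a few lines of Python. This is a minimal sketch that assumes X in the accepted format is a whole number; the helper names are hypothetical.

```python
import re

# Binary suffixes accepted by persistence_size, as powers of 1024.
UNIT_FACTORS = {
    "Ki": 1024, "Mi": 1024**2, "Gi": 1024**3,
    "Ti": 1024**4, "Pi": 1024**5, "Ei": 1024**6,
}

# Number of pods that each receive a persistent volume, per deployment mode.
STORAGE_PODS = {"single-node": 1, "cluster": 3}


def parse_size(text):
    """Convert a value such as '8Gi' to bytes."""
    match = re.fullmatch(r"(\d+)(Ki|Mi|Gi|Ti|Pi|Ei)", text)
    if match is None:
        raise ValueError(f"unsupported size: {text!r}")
    return int(match.group(1)) * UNIT_FACTORS[match.group(2)]


def total_victoria_storage(persistence_size, deployment_mode):
    """Return the total VictoriaMetrics storage in bytes."""
    return parse_size(persistence_size) * STORAGE_PODS[deployment_mode]


total = total_victoria_storage("8Gi", "cluster")
print(total // UNIT_FACTORS["Gi"], "Gi")  # prints: 24 Gi
```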

victoria_configurations > retention_period

Conditional Mandatory

  • Specify the number of hours to retain VictoriaMetrics data before it is deleted.

  • Default: 168 (7 days)

kafka_configurations > persistence_size

Conditional Mandatory

  • The amount of storage allocated for each Kafka persistent volume.

  • Important: Total Kafka storage = persistence_size * 6 pods (3 Kafka brokers and 3 Kafka controllers, each of which gets persistence_size storage)

  • Example: 8Gi * 6 = 48Gi total Kafka storage

  • Accepted values: Must be specified in the form of X[Ki|Mi|Gi|Ti|Pi|Ei]

  • Default value: 8Gi (results in 48Gi total storage)

  • The default 8Gi size is suitable for small clusters (typically fewer than 5 nodes). For larger clusters, increase persistence_size and adjust log_retention_hours and log_retention_bytes based on the expected data volume and cluster size.

Caution

Ensure that the Kafka broker settings persistence_size, log_retention_hours, and log_retention_bytes are configured based on your data retention requirements. If the persistent volume reaches its capacity before logs are deleted according to the log retention period configured, Kafka brokers may run out of disk space. For more details on managing Kafka log retention and cleanup policies, see Managing Kafka logs with delete and compact policies.
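A rough way to reason about the risk described in this caution is to bound the disk usage of one broker. The sketch below assumes standard upstream Kafka behavior: the size limit applies to each partition, and only closed segments are removed, so usage can exceed log_retention_bytes by about one segment. The helper names and the replicas_per_broker count are hypothetical and must be adapted to the actual topic layout.

```python
GIB = 1024**3


def worst_case_partition_bytes(log_retention_bytes, log_segment_bytes):
    """Rough upper bound on the disk used by one partition replica."""
    if log_retention_bytes < 0:
        return None  # -1 means no size limit; only time-based retention applies
    return log_retention_bytes + log_segment_bytes


def fits_on_broker(persistence_bytes, replicas_per_broker,
                   log_retention_bytes, log_segment_bytes):
    """Return True if the size-based bound fits in one broker volume."""
    per_partition = worst_case_partition_bytes(
        log_retention_bytes, log_segment_bytes
    )
    if per_partition is None:
        return False  # usage is unbounded by size, so no guarantee is possible
    return per_partition * replicas_per_broker <= persistence_bytes


# Documented defaults: 8Gi volume, unlimited retention bytes, 1 GB segments.
print(fits_on_broker(8 * GIB, 4, -1, 1073741824))        # False
# Hypothetical tuning: cap each partition at 1 GiB.
print(fits_on_broker(8 * GIB, 4, 1 * GIB, 1073741824))   # True
```

With the default log_retention_bytes of -1, only log_retention_hours limits growth, so the volume must be sized for the expected ingest rate over the full retention period.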

kafka_configurations > log_retention_hours

Conditional Mandatory

  • Specify the number of hours to retain Kafka logs before they are deleted.

  • Default: 168 (7 days)

kafka_configurations > log_retention_bytes

Conditional Mandatory

  • Specify the maximum size of Kafka logs (in bytes) before they are deleted.

  • Default: -1 (unlimited)

kafka_configurations > log_segment_bytes

Conditional Mandatory

  • Specify the maximum size of a single Kafka log segment file (in bytes). When a segment reaches this size, Kafka starts a new segment.

  • Default: 1073741824 (1 GB)

kafka_configurations > topic_partitions

Conditional Mandatory

  • Specify the partition counts for the following topics: idrac, ldms, and ome.

  • Default partition counts: idrac=1, ldms=2, ome=1

  • Example:

    topic_partitions:
      - name: "idrac"
        partitions: 1
      - name: "ldms"
        partitions: 2

LDMS port configurations > ldms_agg_port

Conditional Mandatory

  • Specify the aggregator port to be used on the service k8s cluster.

  • Valid range: 6001-6100

  • Default: 6001

LDMS port configurations > ldms_store_port

Conditional Mandatory

  • Specify the store daemon port to be used on the service k8s cluster.

  • The port can be the same as the LDMS aggregator port specified for ldms_agg_port.

  • Valid range: 6001-6100

  • Default: 6001

LDMS port configurations > ldms_sampler_port

Conditional Mandatory

  • Specify the sampler port to be used on the compute nodes.

  • Valid range: 10001-10100

  • Default: 10001
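The valid ranges for the three LDMS ports can be checked before running the playbooks. This is a minimal sketch; the helper check_ldms_ports is hypothetical and expects the port values as a plain dictionary.

```python
# Valid ranges taken from the LDMS port configuration rows above.
PORT_RANGES = {
    "ldms_agg_port": (6001, 6100),
    "ldms_store_port": (6001, 6100),
    "ldms_sampler_port": (10001, 10100),
}


def check_ldms_ports(ports):
    """Return a list of problems for ports outside their valid range."""
    problems = []
    for name, (low, high) in PORT_RANGES.items():
        value = ports.get(name)
        if not isinstance(value, int) or not low <= value <= high:
            problems.append(f"{name} must be an integer in {low}-{high}")
    return problems


# Default values from the table.
defaults = {
    "ldms_agg_port": 6001,
    "ldms_store_port": 6001,
    "ldms_sampler_port": 10001,
}
print(check_ldms_ports(defaults))
```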

ldms_sampler_configurations > meminfo

Mandatory

  • Collects memory usage statistics (free, used, buffers, cached, etc.).

  • plugin_name: meminfo

  • config_parameters: ""

  • activation_parameters: interval=1000000 (memory metrics are collected every 1 second)

  • The interval unit is microseconds.

ldms_sampler_configurations > procstat2

Mandatory

  • Collects process statistics (CPU, memory, I/O per process).

  • plugin_name: procstat2

  • config_parameters: ""

  • activation_parameters: interval=1000000

  • The interval unit is microseconds

ldms_sampler_configurations > vmstat

Mandatory

  • Collects virtual memory statistics (paging, swapping, memory pressure).

  • plugin_name: vmstat

  • config_parameters: ""

  • activation_parameters: interval=1000000

  • The interval unit is microseconds

ldms_sampler_configurations > loadavg

Mandatory

  • Collects system load average (1, 5, and 15 minute averages).

  • plugin_name: loadavg

  • config_parameters: ""

  • activation_parameters: interval=1000000

  • The interval unit is microseconds

ldms_sampler_configurations > procnetdev2

Mandatory

  • Collects network interface statistics (bytes, packets, errors, drops per interface).

  • The possible config parameters are:
    • ifaces=eth0,eth1: Specific interfaces to monitor

    • If not specified, all network interfaces will be monitored

  • plugin_name: procnetdev2

  • config_parameters: ""

  • activation_parameters: interval=1000000 offset=0

  • The interval unit is microseconds
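Because the sampler intervals are given in microseconds, converting them to seconds makes the values easier to review. The sketch below parses an activation_parameters string of the form shown in the table. The helper names are hypothetical, and the parser assumes that every value is an integer.

```python
def parse_activation_parameters(text):
    """Turn 'interval=1000000 offset=0' into {'interval': 1000000, 'offset': 0}."""
    values = {}
    for token in text.split():
        key, _, value = token.partition("=")
        values[key] = int(value)
    return values


def interval_seconds(text):
    """Return the sampling interval in seconds (the source unit is microseconds)."""
    return parse_activation_parameters(text)["interval"] / 1_000_000


print(interval_seconds("interval=1000000 offset=0"))  # prints: 1.0
```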

Caution

  • All network ranges and NIC IP addresses provided in /opt/omnia/input/project_default/network_spec.yml must be distinct and must not overlap.

  • Ensure that all the iDRACs are reachable from the OIM.

A sample of the /opt/omnia/input/project_default/network_spec.yml where nodes are discovered using a mapping file is provided below:

Networks:
- admin_network:
   oim_nic_name: "eno1"
   netmask_bits: "24"
   primary_oim_admin_ip: "172.16.107.254"
   primary_oim_bmc_ip: ""
   dynamic_range: "172.16.107.201-172.16.107.250"
   dns: []
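The admin_network values can be sanity-checked with Python's ipaddress module once the file has been loaded into a dictionary. This sketch is not an Omnia tool and the helper check_admin_network is hypothetical. The requirement that dynamic_range lies inside the admin subnet is an assumption; the overlap check follows the caution above.

```python
import ipaddress


def check_admin_network(admin):
    """Return a list of problems found in an admin_network dictionary."""
    problems = []
    oim_ip = ipaddress.ip_address(admin["primary_oim_admin_ip"])
    # strict=False lets a host address such as 172.16.107.254/24 define the subnet.
    subnet = ipaddress.ip_network(
        f"{oim_ip}/{admin['netmask_bits']}", strict=False
    )
    start_text, end_text = admin["dynamic_range"].split("-")
    start = ipaddress.ip_address(start_text.strip())
    end = ipaddress.ip_address(end_text.strip())
    if start > end:
        problems.append("dynamic_range start is after its end")
    if start not in subnet or end not in subnet:
        problems.append("dynamic_range is outside the admin subnet")
    if start <= oim_ip <= end:
        problems.append("primary_oim_admin_ip falls inside dynamic_range")
    return problems


# Values copied from the sample above.
admin_network = {
    "oim_nic_name": "eno1",
    "netmask_bits": "24",
    "primary_oim_admin_ip": "172.16.107.254",
    "primary_oim_bmc_ip": "",
    "dynamic_range": "172.16.107.201-172.16.107.250",
    "dns": [],
}
print(check_admin_network(admin_network))
```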

If you have any feedback about Omnia documentation, please reach out at omnia.readme@dell.com.