Update the Input Parameters to Discover and Provision the Nodes

Specify the required parameters in the following input files:

  • /opt/omnia/input/project_default/provision_config.yml

  • /opt/omnia/input/project_default/omnia_config_credentials.yml

  • /opt/omnia/input/project_default/software_config.json

  • /opt/omnia/input/project_default/storage_config.yml

  • /opt/omnia/input/project_default/omnia_config.yml

  • /opt/omnia/input/project_default/telemetry_config.yml

  • /opt/omnia/input/project_default/network_spec.yml

  • /opt/omnia/input/project_default/discovery_config.yml (for OME-based discovery)

Caution

Do not remove or comment out any lines in the .yml files listed above.

provision_config.yml

Parameter

Details

pxe_mapping_file_path

string

Optional

  • Path to the PXE mapping CSV file that contains the node details for provisioning.

  • This variable is required to discover nodes using a mapping file.

  • The CSV file must have the following headers, in this order: FUNCTIONAL_GROUP_NAME, GROUP_NAME, SERVICE_TAG, PARENT_SERVICE_TAG, HOSTNAME, ADMIN_MAC, ADMIN_IP, BMC_MAC, BMC_IP, IB_NIC_NAME, IB_IP

  • A sample file is available at /omnia/examples/pxe_mapping_file.csv
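You can check the header rule above before running discovery. The Python sketch below uses only the standard library and needs Python 3.9 or later. It parses a mapping file, rejects a header that differs from the documented column order, and rejects rows with the wrong number of columns. The function name check_pxe_mapping is hypothetical, and so is every value in the sample row (functional group, group, service tag, MAC and IP addresses). Omnia applies its own validation, which may be stricter.

```python
import csv
import io

# Column order as documented for the PXE mapping file.
EXPECTED_HEADER = [
    "FUNCTIONAL_GROUP_NAME", "GROUP_NAME", "SERVICE_TAG", "PARENT_SERVICE_TAG",
    "HOSTNAME", "ADMIN_MAC", "ADMIN_IP", "BMC_MAC", "BMC_IP", "IB_NIC_NAME", "IB_IP",
]


def check_pxe_mapping(text: str) -> list[dict[str, str]]:
    """Parse PXE mapping CSV text; fail if the header or a row's width is wrong."""
    reader = csv.reader(io.StringIO(text))
    header = [col.strip() for col in next(reader)]
    if header != EXPECTED_HEADER:
        raise ValueError(f"unexpected header: {header}")
    rows = []
    for lineno, row in enumerate(reader, start=2):
        if len(row) != len(EXPECTED_HEADER):
            raise ValueError(f"line {lineno}: expected {len(EXPECTED_HEADER)} columns, got {len(row)}")
        # Pair each value with its column name; empty fields stay "".
        rows.append(dict(zip(EXPECTED_HEADER, (v.strip() for v in row))))
    return rows


# Hypothetical sample; all values are illustrative only.
SAMPLE = """FUNCTIONAL_GROUP_NAME,GROUP_NAME,SERVICE_TAG,PARENT_SERVICE_TAG,HOSTNAME,ADMIN_MAC,ADMIN_IP,BMC_MAC,BMC_IP,IB_NIC_NAME,IB_IP
slurm_node_x86_64,grp1,ABCD123,,node001,aa:bb:cc:dd:ee:01,172.16.107.11,aa:bb:cc:dd:ee:02,172.17.107.11,ib0,192.168.107.11
"""

if __name__ == "__main__":
    for row in check_pxe_mapping(SAMPLE):
        print(row["HOSTNAME"], row["ADMIN_IP"])
```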

timezone

string

Required

Timezone to be used during OS provisioning. Available timezones are provided here.

Choices:

  • GMT <- default

  • EST

  • CET

  • MST

  • CST6CDT

  • PST8PDT

language

string

Required

Language to be used during OS provisioning.

Default value: en_US.UTF-8

default_lease_time

integer

Required

Default lease time, in seconds, for IP addresses assigned by DHCP. Range: 21600-86400 (6 to 24 hours)

Default value: 86400

Note

The /opt/omnia/input/project_default/omnia_config_credentials.yml file is encrypted on the first execution of the provision.yml or local_repo.yml playbooks.

  • To view the encrypted parameters:

    ansible-vault view omnia_config_credentials.yml --vault-password-file .omnia_config_credentials_key
    
  • To edit the encrypted parameters:

    ansible-vault edit omnia_config_credentials.yml --vault-password-file .omnia_config_credentials_key
    
  • If you decrypt the file, you must encrypt it again:

    ansible-vault encrypt omnia_config_credentials.yml --vault-password-file .omnia_config_credentials_key
    
software_config.json

Parameter

Mandatory/Optional

Details

cluster_os_type

Mandatory

  • Type: string

  • Specify the operating system running on the OIM and the one to be provisioned on the compute nodes.

  • Accepted value: rhel

cluster_os_version

Mandatory

  • Type: string

  • The OS version to be provisioned on the compute nodes.

  • Accepted value: 10.0

repo_config

Mandatory

  • Type: string

  • Omnia sets up a local Pulp repository on the OIM and downloads all the necessary packages or images for the cluster into this repository. The behavior of the Pulp container varies depending on the value of the repo_config parameter.

  • If set to always, packages are downloaded and cached on the OIM during local_repo.yml execution, and compute nodes get the packages from the OIM. Cached packages can be reused in future operations without being downloaded again.

  • If set to partial, packages are not pre-downloaded or cached. The OIM downloads them from the upstream URLs when needed.

  • Accepted values: always, partial

  • Default value: always

Note

The following packages will always be downloaded from the local Pulp repository, regardless of the value of repo_config:

  • ISOs

  • pip modules

  • manifests

  • tarballs

  • container images

softwares

Mandatory

  • Type: JSON list

  • A JSON list of the required software. Each entry has an optional version and an architecture type: aarch64, x86_64, or both.

  • The following software should be listed with a version: OpenLDAP, NFS, Slurm, service_k8s, utils, ucx, openmpi.

  • At least one software entry must be provided in the list for local_repo.yml to run correctly.

  • The software_config.json file contains the basic software by default. To install additional software stacks, update the additional_packages.json file available at /opt/omnia/input/project_default/config/<architecture>/rhel/10.0/ and add the following entry to the JSON list: {"name": "additional_packages", "arch": ["x86_64", "aarch64"]}.

  • To install debug packages on the cluster nodes, add the following entry to the JSON list: {"name": "admin_debug_packages", "arch": ["x86_64", "aarch64"]}.

  • For the list of all applicable software for your <cluster_os_type>, see the templates at examples/template_<os>_software_config.json (for example, /omnia/examples/rhel_software_config.json).
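The Python sketch below shows the list format by assembling a software_config.json document with json.dumps. The entry layout ({"name", "version", "arch"}) follows the csi_driver_powerscale entry shown later in this section. The helper software_entry and the "<version>" placeholder are hypothetical. The role lists (slurm_custom, service_k8s, additional_packages) are left out because their layout is not described here. Take real names and versions from the rhel template file.

```python
import json
from typing import Optional

# Architectures accepted in the "arch" list of a software entry.
ALLOWED_ARCH = {"x86_64", "aarch64"}


def software_entry(name: str, arch: list, version: Optional[str] = None) -> dict:
    """Build one element of the softwares list and validate its architectures."""
    if not arch or set(arch) - ALLOWED_ARCH:
        raise ValueError(f"{name}: invalid arch list {arch}")
    entry = {"name": name}
    if version is not None:  # version is optional for many entries
        entry["version"] = version
    entry["arch"] = list(arch)
    return entry


config = {
    "cluster_os_type": "rhel",
    "cluster_os_version": "10.0",
    "repo_config": "always",
    "softwares": [
        # "<version>" is a placeholder; copy the real version from the template.
        software_entry("openmpi", ["x86_64"], version="<version>"),
        software_entry("additional_packages", ["x86_64", "aarch64"]),
        software_entry("admin_debug_packages", ["x86_64", "aarch64"]),
    ],
}

# local_repo.yml needs at least one software entry.
if not config["softwares"]:
    raise ValueError("softwares must not be empty")

if __name__ == "__main__":
    print(json.dumps(config, indent=2))
```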

Note

The accepted names for software are taken from /opt/omnia/input/project_default/config/<architecture>/<cluster_os_type>/<cluster_os_version>.

slurm_custom

Mandatory

Specify the functional roles of the Slurm nodes, including login and login compiler nodes, on which the basic software packages must be deployed.

service_k8s

Mandatory

Specify the functional roles of the Service Kubernetes Cluster nodes on which basic software packages must be deployed.

additional_packages

Optional

Specify the functional roles of the cluster nodes on which additional software packages must be deployed.

storage_config.yml

Parameter

Details

mounts

List (dict)

Mandatory

Configures mount points compatible with the cloud-init mounts module. The source must be known at boot time (NFS paths, UUIDs, local devices). For sources discovered at runtime (iSCSI/multipath), use the powervault_config section instead.

Mandatory Fields:

  • name—Unique identifier for this mount entry. Pattern: [a-zA-Z0-9_-], length 1-64

  • source - Device or network path.
    • For NFS: server_ip:/export/path (e.g., 192.168.1.100:/export/share, nfs-server.example.com:/home)

    • For local devices: /dev/sdc, UUID=xxx, LABEL=xxx

    • For CIFS: //server/share

    • NFS paths must be resolvable at boot time (use an IP address or a DNS-resolvable hostname).

  • mount_point - Absolute path for the mount point.
    • Must be an absolute path starting with / (e.g., /home, /mnt/vast, /opt/data)

    • Avoid system directories (/etc, /sys, /proc, /boot, /root, /tmp)

    • Common patterns: /mnt/, /opt/, /home, /var/lib/*

    • The path must be unique across all mount entries.

Optional Fields:

  • fs_type—Filesystem type. Default: “auto”. Choices: auto, ext2, ext3, ext4, xfs, nfs, nfs4, cifs, tmpfs, cephfs, vfat, ntfs, none, fuse.s3fs. If specified, takes PRIORITY over mount_params profile

  • mnt_opts—Mount options string (e.g., defaults,noexec,nofail). If specified, takes PRIORITY over mount_params profile

  • dump_freq—Dump frequency. Default: “0”. Allowed values: “0”-“2”. If specified, takes PRIORITY over mount_params profile

  • fsck_pass—Fsck pass number. Default: “0”. Allowed values: “0”-“9”. If specified, takes PRIORITY over mount_params profile

  • mount_params—Name of a profile in mount_params section. Used ONLY for fields not explicitly specified in the mount entry

  • mount_on_oim - Boolean flag that mounts the storage on the OIM (Omnia Infrastructure Manager). Default: false. Before enabling it, ensure that the storage is network-accessible from the OIM.

  • permissions—Directory ownership and mode for the mount point (applied via chown + chmod after mount)
    • owner: User owner (name or numeric UID). Default: “root”

    • group: Group owner (name or numeric GID). Default: “root”

    • mode: Octal permission string (3-4 digits). Default: “0755”. Examples: “0755”, “1777”

Node-Specific Bind Mounts (paired parameters):

  • node_key - Optional per-node subdirectory isolation variable. Allowed values and behavior:

    • “local_hostname” - hostname of the node

    • “local_ipv4” - IPv4 address of the node

    • “instance_id” - instance ID of the node from cloud-init

    • Default: “local_hostname”

    • When set, node_mount_point is MANDATORY

    • Generates bind mounts: <mount_point>/<node_key_value>/<target> -> <target>

  • node_mount_point—List of bind mount target paths. Mandatory when node_key is set. Minimum 1 entry, values must be unique absolute paths. Pattern: <mount_point>/<node_key_value>/<target_stripped_slash> -> <target>
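The Python sketch below makes the bind-mount pattern concrete by deriving the source and target of each bind mount for one node. The function name is hypothetical, as are the sample values: mount point /mnt/vast, node key value node001, and targets /var/log and /scratch. The path layout follows the documented pattern. Omnia creates the actual mounts through its cloud-init configuration, not with this code.

```python
import posixpath


def node_bind_mounts(mount_point: str, node_key_value: str, targets: list) -> list:
    """Return (source, target) pairs following
    <mount_point>/<node_key_value>/<target_stripped_slash> -> <target>."""
    if not targets:
        raise ValueError("node_mount_point needs at least one entry")
    if len(set(targets)) != len(targets):
        raise ValueError("node_mount_point entries must be unique")
    pairs = []
    for target in targets:
        if not target.startswith("/"):
            raise ValueError(f"{target!r} is not an absolute path")
        # Strip the leading slash so the target nests under the node directory.
        source = posixpath.join(mount_point, node_key_value, target.lstrip("/"))
        pairs.append((source, target))
    return pairs


if __name__ == "__main__":
    for src, dst in node_bind_mounts("/mnt/vast", "node001", ["/var/log", "/scratch"]):
        print(f"{src} -> {dst}")
    # /mnt/vast/node001/var/log -> /var/log
    # /mnt/vast/node001/scratch -> /scratch
```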

Node Targeting (exactly ONE is required - mutually exclusive):

  • functional_group_prefix—List of functional group name prefixes. All nodes whose group name starts with any listed prefix receive this mount. Example: [“slurm”] matches slurm_control_node, slurm_node, etc. MUTUALLY EXCLUSIVE with groups

  • groups—List of GROUP_NAME values from pxe_mapping_file.csv. Only nodes assigned to the listed PXE groups receive this mount. Example: [“grp1”, “grp2”] targets only nodes in those groups. MUTUALLY EXCLUSIVE with functional_group_prefix
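The two targeting modes can be summarized as a small predicate. In the Python sketch below (Python 3.9 or later), mount_applies is a hypothetical helper. With functional_group_prefix, it matches the node's functional group name using str.startswith. With groups, it checks the node's GROUP_NAME from pxe_mapping_file.csv against the list. It rejects entries that set both modes or neither. The group names in the usage lines are illustrative.

```python
from typing import Optional


def mount_applies(functional_group: str, pxe_group: str,
                  functional_group_prefix: Optional[list[str]] = None,
                  groups: Optional[list[str]] = None) -> bool:
    """Return True if a mount entry targets a node.

    functional_group is the node's functional group name (for example
    slurm_node); pxe_group is its GROUP_NAME from pxe_mapping_file.csv.
    """
    # Exactly one targeting mode must be set.
    if (functional_group_prefix is None) == (groups is None):
        raise ValueError("set exactly one of functional_group_prefix or groups")
    if functional_group_prefix is not None:
        # str.startswith accepts a tuple and matches if any prefix matches.
        return functional_group.startswith(tuple(functional_group_prefix))
    return pxe_group in groups


if __name__ == "__main__":
    print(mount_applies("slurm_control_node", "grp1", functional_group_prefix=["slurm"]))  # True
    print(mount_applies("slurm_node", "grp3", groups=["grp1", "grp2"]))  # False
```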

mount_params

dict

Optional

Named default profiles for mount configurations. Profiles are referenced by name from mount entries via the mount_params field.

Mandatory Profile Fields:

  • fs_type—Default filesystem type. Choices: auto, ext2, ext3, ext4, xfs, nfs, nfs4, cifs, tmpfs, cephfs, vfat, ntfs, none, fuse.s3fs

  • mnt_opts—Default mount options string (comma-separated)

Optional Profile Fields:

  • dump_freq—Default dump frequency. Allowed values: “0”-“2”

  • fsck_pass—Default fsck pass number. Allowed values: “0”-“9”

Predefined Profiles:

  • nfs_default—Default NFS mount
    • fs_type: “nfs”

    • mnt_opts: “nosuid,rw,sync,hard”

    • dump_freq: “0”

    • fsck_pass: “0”

  • vast_rdma—VAST NFS with RDMA configuration over InfiniBand
    • fs_type: “nfs”

    • mnt_opts: “proto=rdma,nconnect=8,timeo=600,retrans=2,rsize=1048576,wsize=1048576,hard”

  • vast_tcp—VAST NFS storage with standard TCP configuration
    • fs_type: “nfs”

    • mnt_opts: “nosuid,rw,sync,hard”
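The precedence rules for mount entries and profiles work as a three-layer lookup. An explicit field in the mount entry wins. The referenced mount_params profile fills only the fields the entry leaves out. The documented per-field default comes last. Placing the profile ahead of the documented default is an assumption, drawn from the rule that the profile supplies fields not specified in the entry. The helper effective_mount_options and the sample entry are hypothetical.

```python
from typing import Optional

# Predefined nfs_default profile, copied from the table above.
MOUNT_PARAMS = {
    "nfs_default": {"fs_type": "nfs", "mnt_opts": "nosuid,rw,sync,hard",
                    "dump_freq": "0", "fsck_pass": "0"},
}

# Documented per-field defaults for mount entries (no default is listed for mnt_opts).
FIELD_DEFAULTS = {"fs_type": "auto", "dump_freq": "0", "fsck_pass": "0"}
FIELDS = ("fs_type", "mnt_opts", "dump_freq", "fsck_pass")


def effective_mount_options(entry: dict, profiles: Optional[dict] = None) -> dict:
    """Resolve mount fields: explicit entry value, then profile, then default.

    Fields with no value in any layer are left out of the result.
    """
    profiles = MOUNT_PARAMS if profiles is None else profiles
    name = entry.get("mount_params")
    if name is not None and name not in profiles:
        raise KeyError(f"unknown mount_params profile {name!r}")
    profile = profiles[name] if name is not None else {}
    resolved = {}
    for field in FIELDS:
        for layer in (entry, profile, FIELD_DEFAULTS):
            if field in layer:
                resolved[field] = layer[field]
                break
    return resolved


if __name__ == "__main__":
    entry = {"name": "home", "source": "192.168.1.100:/export/share",
             "mount_point": "/home", "mount_params": "nfs_default",
             "mnt_opts": "defaults,nofail"}  # explicit mnt_opts overrides the profile
    print(effective_mount_options(entry))
```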

powervault_config

List (dict)

Optional

PowerVault iSCSI storage configuration. These entries are processed entirely by a runcmd script (setup_iscsi_storage.sh). The device path (/dev/mapper/XXX) is known only after iSCSI login and a multipath scan, so PowerVault mounts cannot use the cloud-init mounts module. The runcmd script handles enabling iscsid, setting the initiator name, target discovery and login, multipathd, volume_id matching, partitioning, formatting, mounting, and bind mounts. NOTE: The groups field is NOT supported for powervault_config entries. Only functional_group_prefix is available for node targeting.

Mandatory Fields:

  • name—Unique identifier for this PowerVault entry. Pattern: [a-zA-Z0-9_-], length 1-64

  • ip—List of PowerVault controller IPv4 addresses for iSCSI discovery. Minimum 1 address, values must be unique

  • iscsi_initiator—iSCSI initiator IQN for the host. Pattern: iqn.<date>.<domain>:<identifier>

  • volume_id—Volume WWN/identifier for multipath device matching. Pattern: hex string [a-fA-F0-9]+

  • mount_point—Absolute path where the discovered device gets mounted

  • functional_group_prefix - List of OpenCHAMI functional group name prefixes. All nodes whose group name starts with any listed prefix receive this entry

Optional Fields:

  • port—TCP port for iSCSI target service. Default: 3260. Range: 1-65535

  • fs_type—Filesystem type. Default: “xfs”. Choices: xfs, ext4, ext3, ext2, nfs, nfs4, cifs, ntfs, auto. If specified, takes PRIORITY over mount_params profile

  • mnt_opts—Mount options string. If specified, takes PRIORITY over mount_params profile

  • dump_freq—Dump frequency. Default: “0”. Allowed values: “0”-“2”

  • fsck_pass—Fsck pass number. Default: “0”. Allowed values: “0”-“9”

  • mount_params—Named profile key from mount_params section

Node-Specific Bind Mounts (paired parameters):

  • node_key—Per-node subdirectory isolation variable. Choices: “local_hostname”, “local_ipv4”, “instance_id”. Default: “local_hostname”. When set, node_mount_point is MANDATORY. When node_key is specified, fs_type is forced to “none” and mnt_opts is forced to “bind” regardless of user input

  • node_mount_point—List of bind mount target paths. Mandatory when node_key is set. Pattern: <mount_point>/<node_key_value>/<target_stripped_slash> -> <target>

Permissions (optional sub-object, applied via chown + chmod after mount):

  • permissions.owner—User owner (name or UID). Default: “root”

  • permissions.group—Group owner (name or GID). Default: “root”

  • permissions.mode—Octal permission string (3-4 digits). Default: “0755”
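The Python sketch below shows a pre-flight check of a powervault_config entry against the rules above. The name, volume_id, port, ip, and groups rules come from this table. The iSCSI qualified name check is an assumption: it only tests the iqn.<yyyy-mm>.<domain>:<identifier> shape described in RFC 3720. validate_powervault and the sample values in the usage lines are hypothetical.

```python
import ipaddress
import re

NAME_RE = re.compile(r"[a-zA-Z0-9_-]{1,64}")
VOLUME_ID_RE = re.compile(r"[a-fA-F0-9]+")
# Loose shape check for iqn.<yyyy-mm>.<domain>:<identifier> (assumption).
IQN_RE = re.compile(r"iqn\.\d{4}-\d{2}\.[^:\s]+:\S+")
MANDATORY = ("name", "ip", "iscsi_initiator", "volume_id", "mount_point",
             "functional_group_prefix")


def validate_powervault(entry: dict) -> list:
    """Return the problems found in one powervault_config entry (empty if none)."""
    missing = [f"missing mandatory field {f}" for f in MANDATORY if f not in entry]
    if missing:
        return missing
    problems = []
    if not NAME_RE.fullmatch(entry["name"]):
        problems.append("name must match [a-zA-Z0-9_-] and be 1-64 characters")
    ips = entry["ip"]
    if not ips or len(set(ips)) != len(ips):
        problems.append("ip needs at least one address and no duplicates")
    for ip in ips:
        try:
            ipaddress.IPv4Address(ip)
        except ValueError:  # AddressValueError subclasses ValueError
            problems.append(f"{ip!r} is not an IPv4 address")
    if not IQN_RE.fullmatch(entry["iscsi_initiator"]):
        problems.append("iscsi_initiator does not look like iqn.<date>.<domain>:<identifier>")
    if not VOLUME_ID_RE.fullmatch(entry["volume_id"]):
        problems.append("volume_id must be a hex string")
    if not entry["mount_point"].startswith("/"):
        problems.append("mount_point must be an absolute path")
    if not 1 <= entry.get("port", 3260) <= 65535:
        problems.append("port must be in the range 1-65535")
    if "groups" in entry:
        problems.append("groups is not supported for powervault_config")
    return problems


if __name__ == "__main__":
    sample = {"name": "pv_data", "ip": ["10.10.1.10", "10.10.1.11"],
              "iscsi_initiator": "iqn.1994-05.com.redhat:node001",
              "volume_id": "600c0ff000abcdef", "mount_point": "/mnt/pv",
              "functional_group_prefix": ["slurm"]}
    print(validate_powervault(sample))  # []
```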

swap

List (dict)

Optional

Swap file configuration for cluster nodes. Swap files are created and enabled during node provisioning. NOTE: The groups field is NOT supported for swap entries. Only functional_group_prefix is available for node targeting.

Mandatory Fields:

  • filename—Path to the swap file to create. Pattern: /path/to/swapfile (absolute path)

  • size—Swap file size. Values: “auto”, a byte integer, or human-readable format (e.g., “2G”, “512M”)

  • functional_group_prefix - List of OpenCHAMI functional group name prefixes. All nodes whose group name starts with any listed prefix receive this swap

Optional Fields:

  • maxsize—Maximum swap size (used only when size is “auto”). Format: byte integer or human-readable (e.g., “4G”)

s3_configurations

dict

Optional

Configures the S3-compatible storage backend for OpenCHAMI image repository.

Mandatory Fields:

  • provider - Selects which S3-compatible storage service to use.
    • Choices: “powerscale” (Dell PowerScale as external S3 storage), “minio” (MinIO container deployed locally on the OIM)

    • Default: “powerscale”

Optional Fields:

  • endpoint_url - S3 endpoint URL.
    • Required when provider is “powerscale” (e.g., “https://10.43.1.11:9021”)

    • Leave empty (“”) when provider is “minio” (auto-configured to use the local MinIO)

    • Default: “” (empty)

Credentials:

  • s3_access_id - S3 access key ID.
    • For the “minio” provider: defaults to “admin” (not prompted)

    • For the “powerscale” provider: prompted as a conditionally mandatory value during the prepare_oim credential setup

  • s3_secret_key - S3 secret access key.
    • Prompted during the prepare_oim credential setup for both providers

Note

When PowerScale is configured as an NFS server, ensure that the following CSI-PowerScale driver entry is present in the software_config.json file:

{"name": "csi_driver_powerscale", "version": "v2.15.0", "arch": ["x86_64"]}

For more information on deploying the Dell CSI-PowerScale driver, see Deploy CSI drivers for Dell PowerScale Storage Solutions.

The following table lists the parameters that must be configured in omnia_config.yml for the Slurm cluster.

omnia_config.yml

Variables

Details

cluster_name

string

Mandatory

  • Indicates the name of the cluster.

nfs_storage_name

string

Mandatory

  • Name of the NFS storage to be used by this Slurm cluster.

  • This must match the name of an entry defined in storage_config.yml.

vast_storage_name

string

Optional

  • Storage name corresponding to the VAST storage to be used by the Slurm cluster.

  • This should exactly match the name of an entry in storage_config.yml.

  • The following directories will be mounted on the VAST storage:

    • /scratch

    • /tmp

    • /home

    • /apps

    • /projects

config_sources

filepath or mapping

Optional

  • Indicates how the slurm configuration values are provided to the cluster.

  • <conf name> : <filepath> or <mapping>

    • The supported Slurm configuration files are slurm, cgroup, gres, mpi, helpers, job_container, acct_gather, oci, and topology.

    • <filepath>: Supply the absolute path to a custom configuration file.

    • <mapping>: Supply the configuration values directly as a key–value map

skip_merge

boolean

Optional

  • Indicates whether a specific configuration file path under config_sources should be used without merging.

  • If skip_merge is set to true for a configuration source path, that configuration file is applied directly without merging with defaults or existing configurations.

The following table lists the parameters that must be configured in omnia_config.yml for the service Kubernetes cluster.

omnia_config.yml

Variables

Mandatory/Optional

Details

cluster_name

Mandatory

  • Type: String

  • Name of the cluster on which you want to deploy Kubernetes.

  • This input is case-sensitive. Do not add any special characters except _ (underscore) in the cluster name.

deployment

Mandatory

  • Type: Boolean

  • Indicates whether Kubernetes is deployed.

  • Accepted values: true or false

k8s_cni

Mandatory

  • Type: String

  • Container network interface (CNI) plugin used for Kubernetes pod networking.

  • Accepted values: calico

  • Default value: calico

pod_external_ip_range

Mandatory

  • Type: String

  • IP address range used by the load balancer to assign external IPs to Kubernetes services.

  • Ensure that the IP range provided is not assigned to any node in the cluster.

  • Ensure that the pod_external_ip_range defined in the omnia_config.yml file is reachable from the OpenManage Enterprise appliance and the SFM network.

  • Sample value: 172.16.107.170-172.16.107.200
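The range must not contain node addresses. The Python sketch below uses the standard ipaddress module to parse a start-end range and list the node IPs that fall inside it. It does not test reachability from OpenManage Enterprise or the SFM network, which requires a network-level check. The helper names and node addresses are hypothetical.

```python
import ipaddress


def parse_ip_range(value: str) -> tuple:
    """Parse 'start-end' into two IPv4 addresses and require start <= end."""
    start_s, sep, end_s = value.partition("-")
    if not sep:
        raise ValueError(f"{value!r} is not in start-end form")
    start = ipaddress.IPv4Address(start_s.strip())
    end = ipaddress.IPv4Address(end_s.strip())
    if start > end:
        raise ValueError(f"{value!r}: start address is after the end address")
    return start, end


def conflicting_node_ips(ip_range: str, node_ips: list) -> list:
    """Return the node IPs that fall inside the load balancer range."""
    start, end = parse_ip_range(ip_range)
    return [ip for ip in node_ips if start <= ipaddress.IPv4Address(ip) <= end]


if __name__ == "__main__":
    start, end = parse_ip_range("172.16.107.170-172.16.107.200")
    print(int(end) - int(start) + 1)  # 31 addresses in the sample range
    print(conflicting_node_ips("172.16.107.170-172.16.107.200",
                               ["172.16.107.11", "172.16.107.180"]))  # ['172.16.107.180']
```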

k8s_service_addresses

Optional

  • Type: String

  • Kubernetes internal network for services.

  • This network must be unused in your network infrastructure.

  • Default value: "10.233.0.0/18"

k8s_pod_network_cidr

Optional

  • Type: String

  • Kubernetes pod network CIDR for internal network. When used, it will assign IP addresses from this range to individual pods.

  • This network must be unused in your network infrastructure.

  • Default value: "10.233.64.0/18"
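To confirm that the service and pod networks do not collide with each other or with an existing network, the Python sketch below (Python 3.9 or later) compares CIDR blocks with ipaddress.IPv4Network.overlaps. With the documented defaults, 10.233.0.0/18 and 10.233.64.0/18 are adjacent and do not overlap. The site_network entry is a hypothetical existing network, included only to show a conflict.

```python
import ipaddress
from itertools import combinations


def overlapping_networks(named_cidrs: dict[str, str]) -> list[tuple[str, str]]:
    """Return every pair of names whose CIDR blocks overlap."""
    nets = {name: ipaddress.ip_network(cidr) for name, cidr in named_cidrs.items()}
    return [(a, b) for a, b in combinations(nets, 2) if nets[a].overlaps(nets[b])]


if __name__ == "__main__":
    print(overlapping_networks({
        "k8s_service_addresses": "10.233.0.0/18",
        "k8s_pod_network_cidr": "10.233.64.0/18",
        "site_network": "10.233.0.0/16",  # hypothetical existing network
    }))
    # [('k8s_service_addresses', 'site_network'), ('k8s_pod_network_cidr', 'site_network')]
```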

csi_powerscale_driver_secret_file_path

Optional

  • Type: File path

  • If you want to deploy the CSI driver for PowerScale on your service cluster, add the file path of the secrets.yaml file to this variable.

csi_powerscale_driver_values_file_path

Optional

  • Type: File path

  • If you want to deploy the CSI driver for PowerScale on your service cluster, add the file path of the values.yaml file to this variable.

nfs_storage_name

Mandatory

  • Type: String

  • Must match the name of an entry defined in storage_config.yml.

k8s_crio_storage_size

Mandatory

  • Type: String

  • Specifies the disk size allocated for CRI-O container storage.

etcd_on_local_disk

Optional

  • Type: Boolean

  • Determines whether ETCD is deployed on local disk or NFS storage.

  • Accepted values: true or false

  • Default value: false

  • When set to true, ETCD is deployed on a local disk on all master nodes. The system uses the BOSS card if one is available and falls back to SSD/SATA disks otherwise. The /var/lib/etcd directory is mounted on the selected local disk.

  • When set to false or omitted, ETCD storage is provisioned using NFS, and no local disk configuration is performed for ETCD.

  • Important: Migration from NFS to local disk is not supported during upgrades. This configuration is only applicable for fresh installations.

network_spec.yml

Network Name

Parameters for the network

Parameter details

admin_network

oim_nic_name

string

Mandatory

The name of the interface on the OIM server associated with the admin network. Default value: eno1

subnet

IP address

Mandatory

The subnet address for the admin network. Default value: 172.16.0.0

netmask_bits

integer

Mandatory

The number of bits in the subnet mask. Default value: 24

primary_oim_admin_ip

IP address

Mandatory

The admin IP address configured on the OIM server. Default value: 172.16.107.254

primary_oim_bmc_ip

IP address

Optional

The iDRAC IP address of the OIM server. Mandatory only if idrac_telemetry is set to true and telemetry data must be collected from the OIM server. It can be omitted if iDRAC telemetry for the OIM server is not required. Default value: ""

dynamic_range

IP address range

Mandatory

The range of dynamic IP addresses available on the admin network. Default value: 172.16.107.201-172.16.107.250

dns

array of IP addresses

Optional

The list of external DNS server IP addresses for the admin network. Default value: []

ntp_servers

array of NTP server objects

Optional

The list of NTP servers for the admin network. Each NTP server entry should include address (IP address or hostname) and type (server or pool). Default value: []

additional_subnets

array of subnet objects

Optional

Optional field for multi-RAC/multi-subnet PXE deployments. Each entry defines a separate subnet that the CoreDHCP server will manage via DHCP relay (giaddr-based routing). Requires coresmd v0.5+ with multi-subnet support. Leave empty array ([]) for single-subnet deployments. Default value: []

Each additional subnet entry contains the following parameters:

  • subnet - The network address of the additional subnet (e.g. “10.40.1.0”)

  • netmask_bits - The CIDR prefix length (e.g. “24”)

  • router - The gateway/router IP for this subnet (used as DHCP option 3)

  • dynamic_range - The DHCP IP pool range in “start_ip-end_ip” format. Must fall within the subnet.

See documentation for example configuration.
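Each additional_subnets entry must be internally consistent. The Python sketch below checks that the router and both ends of dynamic_range fall inside subnet/netmask_bits, and that the range runs from low to high. The subnet and prefix length reuse the examples above. The router and dynamic_range values are hypothetical. The dictionary mirrors the documented keys rather than any Omnia parser.

```python
import ipaddress


def check_additional_subnet(entry: dict) -> list:
    """Return the problems found in one additional_subnets entry (empty list if none)."""
    problems = []
    net = ipaddress.IPv4Network(f"{entry['subnet']}/{entry['netmask_bits']}")
    router = ipaddress.IPv4Address(entry["router"])
    if router not in net:
        problems.append(f"router {router} is outside {net}")
    start_s, sep, end_s = entry["dynamic_range"].partition("-")
    if not sep:
        return problems + ["dynamic_range must use start_ip-end_ip format"]
    start, end = ipaddress.IPv4Address(start_s), ipaddress.IPv4Address(end_s)
    if start > end:
        problems.append("dynamic_range start is after its end")
    if start not in net or end not in net:
        problems.append(f"dynamic_range {entry['dynamic_range']} is not inside {net}")
    return problems


if __name__ == "__main__":
    entry = {"subnet": "10.40.1.0", "netmask_bits": "24",
             "router": "10.40.1.1", "dynamic_range": "10.40.1.100-10.40.1.200"}
    print(check_additional_subnet(entry))  # []
```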

ib_network

subnet

IP address

Mandatory

The subnet of the IB network. Default value: 192.168.0.0

netmask_bits

integer

Mandatory

The number of bits in the subnet mask. This value must be the same as the admin_network netmask_bits. Default value: 24

dns

array of IP addresses

Optional

External DNS server IP addresses for the InfiniBand network. Default value: []

telemetry_config.yml

Parameter

Mandatory/Optional

Details

telemetry_sources > idrac > metrics_enabled

Mandatory

  • Type: Boolean

  • Enable or disable iDRAC metrics collection from Dell PowerEdge servers

  • Collected metrics: temperature, power, fan speed, storage health, CPU/memory errors

  • Data path:
    • iDRAC Receiver -> ActiveMQ -> KafkaPump -> Kafka ‘idrac’ topic

    • iDRAC Receiver -> ActiveMQ -> VictoriaPump -> vmagent -> victoria_metrics

  • Accepted values: true or false

  • Default value: true

Note

If iDRAC telemetry is enabled, mysqldb_user, mysqldb_password, and mysqldb_root_password parameters in the omnia_config_credentials.yml file become mandatory.

Note

If you want to deploy only Slurm clusters (slurm_custom), set metrics_enabled to false.

telemetry_sources > idrac > collection_targets

Mandatory

  • Collection targets define where iDRAC data is sent before Vector processing

  • Supported values: victoria_metrics, kafka

  • Multiple targets: Can specify both [victoria_metrics, kafka]

  • Default: [victoria_metrics, kafka]

idrac_telemetry_configurations > mysqldb_storage

Conditional Mandatory

  • MySQL database storage for iDRAC telemetry

  • Purpose: Storage configuration for iDRAC telemetry MySQL database

  • Accepted values: Must be specified in the form of X[Ki|Mi|Gi|Ti|Pi|Ei]

  • Default value: 1Gi

  • Required when: telemetry_sources > idrac > metrics_enabled is true

telemetry_sources > ldms > metrics_enabled

Mandatory

  • Type: Boolean

  • Enable or disable LDMS metrics collection from compute nodes

  • Collected metrics: CPU, memory, network, disk metrics

  • Data path: LDMS samplers → LDMS aggregator → store_avro_kafka → Kafka ‘ldms’ topic

  • Accepted values: true or false

  • Default value: true

telemetry_sources > ldms > collection_targets

Mandatory

  • LDMS only supports Kafka collection (no direct victoria_metrics path)

  • Vector-LDMS bridge consumes from Kafka and routes to victoria_metrics

  • Supported values: kafka

  • Default: [kafka]

telemetry_sources > dcgm > metrics_enabled

Optional

  • Type: Boolean

  • Enable or disable DCGM (NVIDIA Data Center GPU Manager) metrics collection

  • Collected metrics: GPU temperature, utilization, memory, ECC errors, power

  • Requires: NVIDIA GPU driver installed on compute nodes

  • Accepted values: true or false

  • Default value: true

telemetry_sources > powerscale > metrics_enabled

Optional

  • Type: Boolean

  • Enable or disable PowerScale metrics collection from Dell PowerScale (OneFS) storage

  • Collected metrics: Storage metrics from Dell PowerScale clusters

  • Requires: CSM Observability (Karavi) values file configured

  • Data path: CSM Metrics PowerScale → OTEL Collector → vmagent(shared) → victoria_metrics

  • Accepted values: true or false

  • Default value: true

telemetry_sources > powerscale > logs_enabled

Optional

  • Type: Boolean

  • Enable or disable PowerScale logs collection

  • Accepted values: true or false

  • Default value: true

telemetry_sources > powerscale > collection_targets

Conditional Mandatory

  • PowerScale uses the shared vmagent (vmagent(shared)) and does not use Kafka or Vector

  • Supported values: victoria_metrics, victoria_logs

  • Default: [victoria_metrics, victoria_logs]

telemetry_sources > ufm > metrics_enabled

Optional

  • Type: Boolean

  • Enable or disable UFM (NVIDIA UFM InfiniBand Fabric Manager) metrics collection

  • Collected metrics: IB port state, transmit/receive data, error counters, fabric topology

  • Requires: NVIDIA UFM appliance with Prometheus exporter enabled

  • Data path: UFM Prometheus Exporter → vmagent(shared) → victoria_metrics

  • Accepted values: true or false

  • Default value: false

telemetry_sources > ufm > logs_enabled

Optional

  • Type: Boolean

  • Enable or disable UFM syslog logs collection

  • Accepted values: true or false

  • Default value: false

telemetry_sources > ufm > collection_targets

Conditional Mandatory

  • UFM uses vmagent(shared) for metrics and VLAgent for logs

  • Supported values: victoria_metrics, victoria_logs

  • Default: [victoria_metrics, victoria_logs]

telemetry_sources > vast > metrics_enabled

Optional

  • Type: Boolean

  • Enable or disable VAST (Data Storage) metrics collection

  • Collected metrics: storage metrics exposed by the VAST Prometheus exporter

  • Requires: VAST appliance with Prometheus exporter enabled

  • Data path: Prometheus Exporter → vmagent(shared) → victoria_metrics

  • Accepted values: true or false

  • Default value: false

telemetry_sources > vast > logs_enabled

Optional

  • Type: Boolean

  • Enable or disable VAST syslog logs collection

  • Accepted values: true or false

  • Default value: false

telemetry_sources > vast > collection_targets

Conditional Mandatory

  • VAST uses vmagent(shared) for metrics and VLAgent for logs

  • Supported values: victoria_metrics, victoria_logs

  • Default: [victoria_metrics, victoria_logs]
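The per-source collection targets listed above can be checked together. The Python sketch below treats telemetry_sources as a nested dictionary that mirrors the telemetry_sources > <source> > collection_targets keys, and reports any target a source does not support. Sources without collection_targets, such as dcgm, are skipped. The function name and sample settings are hypothetical.

```python
# Supported collection_targets per source, as listed in this table.
SUPPORTED_TARGETS = {
    "idrac": {"victoria_metrics", "kafka"},
    "ldms": {"kafka"},
    "powerscale": {"victoria_metrics", "victoria_logs"},
    "ufm": {"victoria_metrics", "victoria_logs"},
    "vast": {"victoria_metrics", "victoria_logs"},
}


def check_collection_targets(telemetry_sources: dict) -> list:
    """Return one message per unknown source or unsupported collection target."""
    problems = []
    for source, settings in telemetry_sources.items():
        targets = settings.get("collection_targets")
        if targets is None:
            continue  # e.g. dcgm has no collection_targets key
        allowed = SUPPORTED_TARGETS.get(source)
        if allowed is None:
            problems.append(f"unknown source {source!r}")
            continue
        unsupported = sorted(set(targets) - allowed)
        if unsupported:
            problems.append(f"{source}: unsupported targets {unsupported}")
    return problems


if __name__ == "__main__":
    sample = {
        "idrac": {"metrics_enabled": True, "collection_targets": ["victoria_metrics", "kafka"]},
        "ldms": {"metrics_enabled": True, "collection_targets": ["victoria_metrics"]},
        "dcgm": {"metrics_enabled": True},
    }
    print(check_collection_targets(sample))  # ["ldms: unsupported targets ['victoria_metrics']"]
```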

telemetry_bridges > vector_ldms > metrics_enabled

Optional

  • Type: Boolean

  • Enable or disable Vector-LDMS bridge (Kafka-to-victoria_metrics bridge for LDMS metrics)

  • Purpose: Consume LDMS metrics from Kafka ‘ldms’ topic, transform NERSC schema to Prometheus format, and write to victoria_metrics

  • Data flow: Kafka ‘ldms’ topic → Vector-LDMS → vmagent-vector → victoria_metrics

  • Requires: telemetry_sources > ldms > metrics_enabled = true

  • Accepted values: true or false

  • Default value: true

telemetry_bridges > vector_ome > metrics_enabled

Optional

  • Type: Boolean

  • Enable or disable Vector-OME metrics routing (Kafka-to-Victoria bridge for OME metrics)

  • Data flow: Kafka ‘ome.*’ topics → Vector-OME → vmagent-vector (metrics)

  • Requires: OME configured to publish to Kafka

  • Accepted values: true or false

  • Default value: true

telemetry_bridges > vector_ome > logs_enabled

Optional

  • Type: Boolean

  • Enable or disable Vector-OME logs routing

  • Data flow: Kafka ‘ome.*’ topics → Vector-OME → vlagent-vector (logs)

  • Accepted values: true or false

  • Default value: true

telemetry_bridges > vector_ome > ome_identifier

Optional

  • Identifier used by Vector-OME for topic identification and routing

  • Internally used to match topics that start with this prefix (for example, the default ome matches topics with the pattern ^ome\..*$)

  • Type: String

  • minLength: 1

  • Default value: ome

  • Note: Change only if your OME Kafka topics use a different prefix
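The Python sketch below illustrates the topic matching with the re module. It assumes the pattern is built as ^<ome_identifier>\..*$, matching the example above. The identifier is escaped so that characters such as a dot are matched literally. The name ome_topic_pattern and the topic names are hypothetical.

```python
import re


def ome_topic_pattern(ome_identifier: str = "ome") -> re.Pattern:
    """Build the prefix pattern; re.escape makes the identifier match literally."""
    if not ome_identifier:
        raise ValueError("ome_identifier must be at least 1 character long")
    return re.compile(rf"^{re.escape(ome_identifier)}\..*$")


if __name__ == "__main__":
    pattern = ome_topic_pattern()
    for topic in ["ome.alerts", "ome.metrics", "idrac", "omega.events"]:
        print(topic, bool(pattern.match(topic)))
    # ome.alerts True, ome.metrics True, idrac False, omega.events False
```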

telemetry_sinks > victoria_metrics > persistence_size

Conditional Mandatory

  • Storage per vmstorage pod PVC

  • Important: Total VictoriaMetrics storage depends on deployment mode:
    • Single-node mode: Total storage = persistence_size * 1 pod

    • Cluster mode: Total storage = persistence_size * 3 vmstorage pods

  • Example (cluster): 8Gi * 3 = 24Gi total VictoriaMetrics storage

  • Accepted values: Must be specified in the form of X[Ki|Mi|Gi|Ti|Pi|Ei]

  • Default value: 8Gi (results in 24Gi total storage for cluster mode)
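The totals above come from multiplying the per-pod PVC size by the number of vmstorage pods. The Python sketch below converts a size in the X[Ki|Mi|Gi|Ti|Pi|Ei] form to bytes and reproduces the 24Gi cluster-mode total for the 8Gi default. It assumes X is an integer and that the suffix is always present. The function names are hypothetical.

```python
import re

# Power of 1024 for each binary suffix in X[Ki|Mi|Gi|Ti|Pi|Ei].
SUFFIX_POWER = {"Ki": 1, "Mi": 2, "Gi": 3, "Ti": 4, "Pi": 5, "Ei": 6}


def quantity_to_bytes(value: str) -> int:
    """Convert a quantity such as '8Gi' to bytes (integer X and a suffix required)."""
    match = re.fullmatch(r"(\d+)(Ki|Mi|Gi|Ti|Pi|Ei)", value.strip())
    if match is None:
        raise ValueError(f"{value!r} is not in X[Ki|Mi|Gi|Ti|Pi|Ei] form")
    return int(match.group(1)) * 1024 ** SUFFIX_POWER[match.group(2)]


def total_victoria_metrics_gib(persistence_size: str, cluster_mode: bool) -> float:
    """Total VictoriaMetrics storage: persistence_size times the vmstorage pod count."""
    pods = 3 if cluster_mode else 1
    return quantity_to_bytes(persistence_size) * pods / 1024 ** 3


if __name__ == "__main__":
    print(total_victoria_metrics_gib("8Gi", cluster_mode=True))   # 24.0
    print(total_victoria_metrics_gib("8Gi", cluster_mode=False))  # 8.0
```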

telemetry_sinks > victoria_metrics > retention_period

Conditional Mandatory

  • Metric retention period in hours

  • Default: 168 (7 days)

telemetry_sinks > victoria_metrics > additional_metric_remote_write_endpoints

Optional

  • Additional remote write endpoints for metrics (optional)

  • Purpose: Send metrics to external VictoriaMetrics instances in addition to Omnia-managed VictoriaMetrics

  • Format: List of endpoint objects with ‘url’ field (must start with http:// or https://)

  • TLS: Set ‘tls_insecure_skip_verify: true’ to skip TLS certificate verification

  • Default: [] (empty — only Omnia VictoriaMetrics receives metrics)

  • Example:

    - url: https://external-metrics-server:8480/insert/0/prometheus/api/v1/write
      tls_insecure_skip_verify: false

telemetry_sinks > victoria_logs > storage_size

Conditional Mandatory

  • Storage per vlstorage pod PVC

  • Total storage = storage_size × 3 vlstorage pods

  • Accepted values: Must be specified in the form of X[Ki|Mi|Gi|Ti|Pi|Ei]

  • Default value: 8Gi (results in 24Gi total storage)

  • Sizing formula for storage_size: (140 MB/day × retention_days × node_count) / 3 vlstorage pods

Warning

Storage under-provisioning can lead to data loss before the retention period is reached. Calculate storage requirements based on expected log volume and retention needs.
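To apply the sizing formula, first convert retention_period from hours to days. The Python sketch below computes the per-pod storage_size in MB and rounds it up to a whole Gi quantity. The 100-node cluster is a hypothetical input. Treating MB as 10^6 bytes when converting to Gi is an assumption.

```python
import math


def vlstorage_size_mb(retention_hours: int, node_count: int,
                      mb_per_node_per_day: float = 140.0, pods: int = 3) -> float:
    """Per-pod storage_size in MB: (140 MB/day x retention_days x node_count) / 3 pods."""
    retention_days = retention_hours / 24
    return mb_per_node_per_day * retention_days * node_count / pods


def to_gi_quantity(megabytes: float) -> str:
    """Round up to whole GiB, assuming MB means 10**6 bytes."""
    return f"{math.ceil(megabytes * 10**6 / 1024**3)}Gi"


if __name__ == "__main__":
    # Default retention_period of 168 hours (7 days), hypothetical 100-node cluster.
    per_pod = vlstorage_size_mb(retention_hours=168, node_count=100)
    print(round(per_pod, 1), to_gi_quantity(per_pod))  # 32666.7 31Gi
```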

telemetry_sinks > victoria_logs > retention_period

Conditional Mandatory

  • Log retention period in hours

  • Type: Integer (hours)

  • Accepted values: 24-8760 (1 day to 1 year)

  • Default: 168 (7 days)

  • Note: Retention is global and applies to all log streams uniformly. Deletion occurs asynchronously during background merge operations.

Note

VictoriaLogs does not return an error when log entries with timestamps outside the configured retention window are submitted. Log entries will be automatically removed from VictoriaLogs after the retention period.

telemetry_sinks > victoria_logs > additional_log_write_endpoints

Optional

  • Additional remote write endpoints for logs (optional)

  • Purpose: Send logs to external VictoriaLogs instances in addition to Omnia-managed VictoriaLogs

  • Format: List of endpoint objects with ‘url’ field (must start with http:// or https://)

  • TLS: Set ‘tls_insecure_skip_verify: true’ to skip TLS certificate verification

  • Default: [] (empty — only Omnia VictoriaLogs receives logs)

  • Example:

    - url: https://external-logs-server:9481/internal/insert
      tls_insecure_skip_verify: false

telemetry_sinks > kafka > persistence_size

Conditional Mandatory

  • Storage per Kafka pod PVC

  • Total = persistence_size × 6 pods (3 brokers + 3 controllers)

  • Accepted values: Must be specified in the form of X[Ki|Mi|Gi|Ti|Pi|Ei]

  • Default value: 8Gi (results in 48Gi total storage)

  • The default 8Gi size is suitable for small clusters (typically fewer than 5 nodes). For larger clusters, increase persistence_size and adjust log_retention_hours and log_retention_bytes based on the expected data volume and cluster size.

Caution

Ensure that the Kafka broker settings persistence_size, log_retention_hours, and log_retention_bytes are configured based on your data retention requirements. If the persistent volume reaches its capacity before logs are deleted according to the log retention period configured, Kafka brokers may run out of disk space. For more details on managing Kafka log retention and cleanup policies, see Managing Kafka logs with delete and compact policies.

telemetry_sinks > kafka > log_retention_hours

Conditional Mandatory

  • Log retention period in hours

  • Default: 168 (7 days)

telemetry_sinks > kafka > log_retention_bytes

Conditional Mandatory

  • Maximum size of Kafka logs (in bytes) before deletion

  • Default: -1 (unlimited)

telemetry_sinks > kafka > log_segment_bytes

Conditional Mandatory

  • Maximum size of Kafka log segments (in bytes)

  • Default: 1073741824 (1 GiB)

telemetry_sinks > kafka > topic_partitions

Conditional Mandatory

  • Topic partitions per source (object format, not array)

  • Format: {topic_name: partition_count}

  • Required when: Source has kafka in collection_targets

  • Allowed topics: idrac, ldms only

  • Default partition counts: idrac=1, ldms=2

  • Example: {idrac: 1, ldms: 2}
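The topic_partitions rules (an object rather than an array, with idrac and ldms as the only topics) can be expressed as a small check. This is a hedged sketch: validate_topic_partitions is hypothetical, and it assumes each partition count must be a positive integer, since a Kafka topic needs at least one partition.

```python
ALLOWED_TOPICS = {"idrac", "ldms"}
DEFAULT_TOPIC_PARTITIONS = {"idrac": 1, "ldms": 2}  # documented default counts


def validate_topic_partitions(topic_partitions) -> list:
    """Check that topic_partitions maps allowed topics to partition counts."""
    if not isinstance(topic_partitions, dict):
        return ["topic_partitions must be an object such as {idrac: 1, ldms: 2}, not an array"]
    errors = []
    for topic, count in topic_partitions.items():
        if topic not in ALLOWED_TOPICS:
            errors.append(f"unsupported topic {topic!r}; allowed topics are idrac and ldms")
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(count, bool) or not isinstance(count, int) or count < 1:
            errors.append(f"partition count for {topic!r} must be a positive integer")
    return errors


print(validate_topic_partitions(DEFAULT_TOPIC_PARTITIONS))  # prints []
```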

ldms_configurations > agg_port

Conditional Mandatory

  • LDMS aggregator port on the service K8s cluster

  • Valid range: 6001-6100

  • Default: 6001

ldms_configurations > store_port

Conditional Mandatory

  • LDMS store daemon port on the service K8s cluster

  • This port can be the same as the LDMS aggregator port (agg_port)

  • Valid range: 6001-6100

  • Default: 6001

ldms_configurations > sampler_port

Conditional Mandatory

  • LDMS sampler port on the compute nodes

  • Valid range: 10001-10100

  • Default: 10001
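The three LDMS port ranges above can be checked together. In this sketch (validate_ldms_ports is a hypothetical helper), store_port and agg_port are deliberately not required to differ, because the documentation allows them to share a port.

```python
# Valid ranges documented for each LDMS port parameter.
LDMS_PORT_RANGES = {
    "agg_port": (6001, 6100),
    "store_port": (6001, 6100),
    "sampler_port": (10001, 10100),
}


def validate_ldms_ports(ldms_configurations: dict) -> list:
    """Return range violations for the LDMS port parameters."""
    errors = []
    for name, (low, high) in LDMS_PORT_RANGES.items():
        port = ldms_configurations.get(name)
        if isinstance(port, bool) or not isinstance(port, int) or not low <= port <= high:
            errors.append(f"{name}={port!r} must be an integer from {low} to {high}")
    return errors


defaults = {"agg_port": 6001, "store_port": 6001, "sampler_port": 10001}
print(validate_ldms_ports(defaults))  # prints []
```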

ldms_configurations > sampler_plugins

Mandatory

  • Sampler plugins define which metrics to collect from compute nodes

  • Parameters:
    • plugin_name: Name of the LDMS sampler plugin

    • config_parameters: Plugin-specific configuration (as a single string)

    • activation_parameters: Collection schedule in MICROSECONDS

      Format: interval=<microseconds> offset=<microseconds>

      Example: interval=30000000 (30 seconds)

  • Default plugins:
    • meminfo: Memory usage statistics (free, used, buffers, cached)

    • procstat2: Process statistics (CPU, memory, I/O per process)

    • vmstat: Virtual memory statistics (paging, swapping, memory pressure)

    • loadavg: System load average (1, 5, and 15 minute averages)

    • procnetdev2: Network interface statistics (bytes, packets, errors, drops per interface)

  • Default activation_parameters: interval=30000000 (30 seconds) for all plugins; procnetdev2 additionally sets offset=0
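Because activation_parameters is expressed in microseconds, a value such as interval=30 means 30 microseconds, not 30 seconds. A hedged sketch of a parser that makes the unit explicit follows. The helper names are hypothetical, and the sketch assumes that interval (required) and offset (optional) are the only keys.

```python
MICROSECONDS_PER_SECOND = 1_000_000


def parse_activation_parameters(params: str) -> dict:
    """Parse 'interval=<microseconds> [offset=<microseconds>]' into integers."""
    values = {}
    for token in params.split():
        key, sep, raw = token.partition("=")
        # Assumption: only interval and offset appear in activation_parameters.
        if not sep or key not in ("interval", "offset"):
            raise ValueError(f"unexpected token {token!r}")
        values[key] = int(raw)
    if "interval" not in values:
        raise ValueError("activation_parameters must include interval")
    return values


def interval_seconds(params: str) -> float:
    """Collection interval in seconds, useful for spotting values entered in seconds by mistake."""
    return parse_activation_parameters(params)["interval"] / MICROSECONDS_PER_SECOND


print(interval_seconds("interval=30000000 offset=0"))  # prints 30.0
```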

powerscale_configurations > otel_collector_storage_size

Conditional Mandatory

  • PVC size for OTEL Collector metric batching and buffering

  • Accepted values: Must be specified in the form of X[Ki|Mi|Gi|Ti|Pi|Ei]

  • Default value: 5Gi

powerscale_configurations > csm_observability_values_file_path

Conditional Mandatory

  • Path to the CSM Observability Helm chart values.yaml file

Note

In the values.yaml file, set only karaviMetricsPowerscale -> enabled: true. Set the following parameters to false: karaviMetricsPowerflex -> enabled: false, karaviMetricsPowerstore -> enabled: false, karaviMetricsPowerscale -> authorization -> enabled: false, and karaviMetricsPowermax -> enabled: false.

Note

Set isiAuthType in the values.yaml file to match the current PowerScale authentication type. To check the current setting, run isi http settings view.

Note

For CSI PowerScale health metrics, enable controller > healthMonitor > enabled: true and node > healthMonitor > enabled: true in the CSI PowerScale values.yaml (https://raw.githubusercontent.com/dell/helm-charts/csi-isilon-2.15.0/charts/csi-isilon/values.yaml).

ufm_configuration > ufm_endpoint

Conditional Mandatory

  • UFM appliance IP address or hostname

  • Required when: telemetry_sources > ufm > metrics_enabled is true

  • Example: 172.20.44.180 or ufm.example.com

  • Default value: "" (empty string)

ufm_configuration > ufm_metrics_port

Optional

  • UFM Prometheus exporter port

  • Default value: 9001 (UFM default Prometheus port)

ufm_configuration > scrape_interval

Optional

  • Prometheus scrape interval for UFM metrics

  • Accepted values: Prometheus duration format (e.g., 15s, 30s, 1m)

  • Default value: 30s

ufm_configuration > scrape_timeout

Optional

  • Prometheus scrape timeout (must be <= scrape_interval)

  • Accepted values: Prometheus duration format (e.g., 10s, 15s)

  • Default value: 15s
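The scrape_timeout rule above, which also applies to vast_configuration, can be checked by converting both durations to seconds. This is a simplified sketch with hypothetical helper names. It accepts combined durations such as 1m30s but does not enforce every rule of the Prometheus duration grammar, such as unit ordering.

```python
import re

# Prometheus duration units expressed in seconds.
UNIT_SECONDS = {"ms": 0.001, "s": 1, "m": 60, "h": 3600, "d": 86400, "w": 604800, "y": 31536000}
# In the alternation, "ms" comes before "m" so that "5ms" is not read as minutes.
TOKEN = r"(\d+)(ms|s|m|h|d|w|y)"


def duration_seconds(value: str) -> float:
    """Convert a Prometheus duration such as '30s', '1m', or '1m30s' to seconds."""
    if not re.fullmatch(f"(?:{TOKEN})+", value):
        raise ValueError(f"{value!r} is not a Prometheus duration")
    return sum(int(number) * UNIT_SECONDS[unit] for number, unit in re.findall(TOKEN, value))


def validate_scrape_settings(scrape_interval: str, scrape_timeout: str) -> list:
    """Check that both durations parse and that the timeout does not exceed the interval."""
    try:
        interval = duration_seconds(scrape_interval)
        timeout = duration_seconds(scrape_timeout)
    except ValueError as exc:
        return [str(exc)]
    if timeout > interval:
        return ["scrape_timeout must be less than or equal to scrape_interval"]
    return []


print(validate_scrape_settings("30s", "15s"))  # prints []
```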

ufm_configuration > tls_mode

Optional

  • TLS mode for connecting to UFM Prometheus endpoint

  • Accepted values: self_signed, ca_signed

  • self_signed: Skip TLS verification (insecure_skip_verify=true)

  • ca_signed: Use CA certificate for TLS verification

  • Default value: self_signed

ufm_configuration > ufm_ca_cert_path

Optional

  • Path to CA certificate file for UFM TLS verification

  • Required when: tls_mode is ca_signed

  • Must be a valid PEM-format certificate file

  • Default value: "" (empty; not used when tls_mode is self_signed)

ufm_configuration > auth_mode

Optional

  • Authentication mode for UFM Prometheus endpoint

  • Accepted values: basic, none

  • basic: Use ufm_username/ufm_password from omnia_config_credentials.yml

  • none: No authentication (UFM endpoint is open)

  • Default value: basic
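The tls_mode, CA certificate path, and auth_mode settings depend on one another, and vast_configuration uses the same pattern. A hedged sketch of a combined check follows. validate_tls_and_auth is hypothetical: cfg is assumed to be the already-loaded configuration section, and credentials is assumed to be the already-decrypted content of omnia_config_credentials.yml. The PEM check only looks for the standard certificate header.

```python
import os

PEM_CERT_MARKER = "-----BEGIN CERTIFICATE-----"


def validate_tls_and_auth(cfg: dict, credentials: dict, prefix: str = "ufm") -> list:
    """Check tls_mode/auth_mode combinations; prefix is 'ufm' or 'vast'."""
    errors = []
    tls_mode = cfg.get("tls_mode", "self_signed")  # documented default
    auth_mode = cfg.get("auth_mode", "basic")      # documented default

    if tls_mode not in ("self_signed", "ca_signed"):
        errors.append(f"tls_mode must be self_signed or ca_signed, not {tls_mode!r}")
    elif tls_mode == "ca_signed":
        cert_path = cfg.get(f"{prefix}_ca_cert_path", "")
        if not cert_path:
            errors.append(f"{prefix}_ca_cert_path is required when tls_mode is ca_signed")
        elif not os.path.isfile(cert_path):
            errors.append(f"{prefix}_ca_cert_path {cert_path!r} is not a file")
        else:
            # Heuristic PEM check: look for the standard certificate header.
            with open(cert_path, encoding="ascii", errors="ignore") as handle:
                if PEM_CERT_MARKER not in handle.read():
                    errors.append(f"{cert_path!r} does not look like a PEM certificate")

    if auth_mode not in ("basic", "none"):
        errors.append(f"auth_mode must be basic or none, not {auth_mode!r}")
    elif auth_mode == "basic":
        for key in (f"{prefix}_username", f"{prefix}_password"):
            if not credentials.get(key):
                errors.append(f"{key} must be set in omnia_config_credentials.yml when auth_mode is basic")
    return errors


print(validate_tls_and_auth({}, {"ufm_username": "admin", "ufm_password": "secret"}))  # prints []
```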

vast_configuration > vast_endpoint

Conditional Mandatory

  • VAST appliance IP address or hostname

  • Required when: telemetry_sources > vast > metrics_enabled is true

  • Example: 172.20.44.180 or vast.example.com

  • Default value: "" (empty string)

vast_configuration > vast_metrics_port

Optional

  • VAST Prometheus exporter port

  • Default value: 9001 (VAST default Prometheus port)

vast_configuration > scrape_interval

Optional

  • Prometheus scrape interval for VAST metrics

  • Accepted values: Prometheus duration format (e.g., 15s, 30s, 1m)

  • Default value: 30s

vast_configuration > scrape_timeout

Optional

  • Prometheus scrape timeout (must be <= scrape_interval)

  • Accepted values: Prometheus duration format (e.g., 10s, 15s)

  • Default value: 15s

vast_configuration > tls_mode

Optional

  • TLS mode for connecting to VAST Prometheus endpoint

  • Accepted values: self_signed, ca_signed

  • self_signed: Skip TLS verification (insecure_skip_verify=true)

  • ca_signed: Use CA certificate for TLS verification

  • Default value: self_signed

vast_configuration > vast_ca_cert_path

Optional

  • Path to CA certificate file for VAST TLS verification

  • Required when: tls_mode is ca_signed

  • Must be a valid PEM-format certificate file

  • Default value: "" (empty; not used when tls_mode is self_signed)

vast_configuration > auth_mode

Optional

  • Authentication mode for VAST Prometheus endpoint

  • Accepted values: basic, none

  • basic: Use vast_username/vast_password from omnia_config_credentials.yml

  • none: No authentication (VAST endpoint is open)

  • Default value: basic

discovery_config.yml

Parameter

Mandatory/Optional

Details

enable_bmc_discovery

Optional

  • Type: Boolean

  • Set to true to enable BMC discovery through OME. When set to false, OME credentials are not prompted for during prepare_oim.

  • Accepted values: true or false

  • Default value: false

ome_ip

Conditional Mandatory

  • Type: String

  • IP address of the Dell OpenManage Enterprise (OME) instance used for server discovery and inventory collection.

  • Required when: enable_bmc_discovery is set to true.

  • Example: "192.168.1.100"

  • Default value: "" (empty string)
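The dependency between enable_bmc_discovery and ome_ip can be sketched as a small check. validate_discovery_config is hypothetical and takes the file content as an already-loaded Python dict.

```python
import ipaddress


def validate_discovery_config(cfg: dict) -> list:
    """Apply the enable_bmc_discovery / ome_ip rules from discovery_config.yml."""
    enabled = cfg.get("enable_bmc_discovery", False)  # documented default: false
    if not isinstance(enabled, bool):
        return ["enable_bmc_discovery must be true or false"]
    if not enabled:
        return []  # ome_ip is only required when BMC discovery is enabled
    ome_ip = cfg.get("ome_ip", "")
    if not ome_ip:
        return ["ome_ip is required when enable_bmc_discovery is true"]
    try:
        ipaddress.ip_address(ome_ip)
    except ValueError:
        return [f"ome_ip {ome_ip!r} is not a valid IP address"]
    return []


print(validate_discovery_config({"enable_bmc_discovery": True, "ome_ip": "192.168.1.100"}))  # prints []
```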

Note

The discovery_config.yml file is required for OME-based BMC discovery. It contains the OpenManage Enterprise connection details and discovery parameters.

Caution

  • All network ranges and NIC IP addresses provided in /opt/omnia/input/project_default/network_spec.yml should be distinct and should not overlap.

  • Ensure that all the iDRACs are reachable from the OIM.

  • For OME-based discovery, ensure that OME can access the BMC/iDRAC interfaces of all target servers.

A sample of the /opt/omnia/input/project_default/network_spec.yml is provided below. This configuration is used for both OME-based discovery and mapping file discovery:

Networks:
- admin_network:
   oim_nic_name: "eno1"
   netmask_bits: "24"
   primary_oim_admin_ip: "172.16.107.254"
   primary_oim_bmc_ip: ""
   dynamic_range: "172.16.107.201-172.16.107.250"
   dns: []
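Part of the no-overlap rule in the Caution above can be checked for the admin network. This is a minimal sketch. check_admin_network is hypothetical and takes the admin_network entry as a Python dict that mirrors the sample, because parsing YAML would require a third-party library such as PyYAML. The sketch also assumes that dynamic_range must lie inside the subnet defined by primary_oim_admin_ip and netmask_bits. It does not check other networks or other NIC IPs.

```python
import ipaddress


def check_admin_network(admin: dict) -> list:
    """Check that dynamic_range is well formed, inside the admin subnet,
    and does not contain the OIM admin IP."""
    errors = []
    oim_ip = ipaddress.ip_address(admin["primary_oim_admin_ip"])
    # Assumption: the admin subnet is derived from the OIM admin IP and netmask_bits.
    subnet = ipaddress.ip_network(f"{oim_ip}/{admin['netmask_bits']}", strict=False)
    start_text, end_text = admin["dynamic_range"].split("-")
    start = ipaddress.ip_address(start_text.strip())
    end = ipaddress.ip_address(end_text.strip())
    if start > end:
        errors.append("dynamic_range start address is greater than its end address")
    if start not in subnet or end not in subnet:
        errors.append(f"dynamic_range is not inside the admin subnet {subnet}")
    if start <= oim_ip <= end:
        errors.append("primary_oim_admin_ip overlaps dynamic_range")
    return errors


admin_network = {
    "oim_nic_name": "eno1",
    "netmask_bits": "24",
    "primary_oim_admin_ip": "172.16.107.254",
    "primary_oim_bmc_ip": "",
    "dynamic_range": "172.16.107.201-172.16.107.250",
    "dns": [],
}
print(check_admin_network(admin_network))  # prints []
```

With the sample values, the OIM admin IP (172.16.107.254) sits outside the dynamic range (172.16.107.201 to 172.16.107.250), so no overlap is reported.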

If you have any feedback about Omnia documentation, please reach out at omnia.readme@dell.com.