Configure Deployment Required for iDRAC Telemetry Service
To deploy the iDRAC telemetry service on the service cluster, do the following:
Prerequisites
Redfish must be enabled in iDRAC.
If an internet connection is required on the service Kube node, configure it after the node is booted.
iDRAC firmware must be updated to the latest version.
Datacenter license must be installed on the nodes.
Ensure that the correct node service tags are displayed on the iDRAC interface. Otherwise, telemetry data cannot be collected by the idrac_telemetry_receiver container.
For telemetry collection on the service cluster, all BMC (iDRAC) IPs must be reachable from the service cluster nodes.
Ensure that the discovery.yml playbook has been executed successfully with both service_kube_control_plane_x86_64 and service_kube_node_x86_64 in the mapping file, and that the bmc_group_data.csv file has been generated.
Before running the telemetry.yml playbook for the service cluster, ensure that all the service K8s compute nodes are booted, reachable, and configured in the service K8s cluster.
Steps
In the mapping file, ensure that the service tag of the service kube node is specified as the parent for the Slurm nodes.
Configure the omnia_config.yml file. Each omnia_config.yml variable is listed below with its Mandatory/Optional status and details:
cluster_name (Mandatory)
Type: String
Name of the cluster on which you want to deploy Kubernetes.
This input is case-sensitive. Do not use any special characters in the cluster name except _ (underscore).
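A quick pre-check of the cluster_name rule can catch typos before any playbook runs. The sketch below assumes that "no special characters except _" means ASCII letters, digits, and underscores only; is_valid_cluster_name is a hypothetical helper, not part of Omnia.

```python
import re

# Assumption: "no special characters except _" is read as
# "ASCII letters, digits, and underscores only".
CLUSTER_NAME_PATTERN = re.compile(r"[A-Za-z0-9_]+")


def is_valid_cluster_name(name: str) -> bool:
    """Return True if the name uses only letters, digits, and underscores."""
    return CLUSTER_NAME_PATTERN.fullmatch(name) is not None


if __name__ == "__main__":
    for candidate in ("omnia_cluster", "Omnia_Cluster", "omnia-cluster"):
        print(candidate, is_valid_cluster_name(candidate))
```

Because the input is case-sensitive, omnia_cluster and Omnia_Cluster both pass the check but name two different clusters.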
deployment (Mandatory)
Type: Boolean
Indicates whether Kubernetes will be deployed.
Accepted values: true or false
k8s_cni (Mandatory)
Type: String
Kubernetes SDN network (CNI plugin).
Accepted values: calico
Default value: calico
pod_external_ip_range (Mandatory)
Type: String
These addresses are used by the load balancer to assign external IPs to Kubernetes services.
Ensure that the IP range provided is not assigned to any node in the cluster.
Sample value: 172.16.107.170-172.16.107.200
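To confirm that pod_external_ip_range does not collide with node addresses, the range can be parsed and each node IP tested against it. This is a minimal sketch using Python's standard ipaddress module; the helper names and the node IPs in the usage example are hypothetical.

```python
import ipaddress


def parse_ip_range(ip_range: str) -> tuple[ipaddress.IPv4Address, ipaddress.IPv4Address]:
    """Split a 'start-end' string into two IPv4 addresses and check their order."""
    start_text, end_text = ip_range.split("-")
    start = ipaddress.IPv4Address(start_text.strip())
    end = ipaddress.IPv4Address(end_text.strip())
    if start > end:
        raise ValueError(f"range start {start} is after range end {end}")
    return start, end


def overlapping_node_ips(ip_range: str, node_ips: list[str]) -> list[str]:
    """Return the node IPs that fall inside the load balancer range."""
    start, end = parse_ip_range(ip_range)
    return [ip for ip in node_ips if start <= ipaddress.IPv4Address(ip) <= end]


if __name__ == "__main__":
    # Hypothetical node IPs, used only for illustration.
    nodes = ["172.16.107.10", "172.16.107.175"]
    print(overlapping_node_ips("172.16.107.170-172.16.107.200", nodes))  # -> ['172.16.107.175']
```

A non-empty result means the range must be narrowed or moved before deployment. The sample range provides 31 addresses (.170 through .200 inclusive).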
k8s_service_addresses (Optional)
Type: String
Kubernetes internal network for services.
This network must be unused in your network infrastructure.
Default value: "10.233.0.0/18"
k8s_pod_network_cidr (Optional)
Type: String
Kubernetes pod network CIDR for the internal network. When set, IP addresses from this range are assigned to individual pods.
This network must be unused in your network infrastructure.
Default value: "10.233.64.0/18"
csi_powerscale_driver_secret_file_path (Optional)
Type: File path
If you want to deploy the CSI driver for PowerScale on your service cluster, set this variable to the file path of the secrets.yaml file.
csi_powerscale_driver_values_file_path (Optional)
Type: File path
If you want to deploy the CSI driver for PowerScale on your service cluster, set this variable to the file path of the values.yaml file.
nfs_storage_name (Mandatory)
Type: String
Use the same name as one of the nfs_name values defined in storage_config.yml.
k8s_crio_storage_size (Mandatory)
Type: String
Specifies the disk size allocated for CRI-O container storage.
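Both k8s_service_addresses and k8s_pod_network_cidr must be unused in your infrastructure, and they also must not overlap each other. The sketch below checks both conditions with Python's ipaddress module; check_k8s_networks is a hypothetical helper, and the list of existing infrastructure networks is something you would supply yourself.

```python
import ipaddress


def check_k8s_networks(service_cidr: str, pod_cidr: str, infra_cidrs: list[str]) -> list[str]:
    """Return a list of overlap problems; an empty list means no conflict was found."""
    service = ipaddress.ip_network(service_cidr)
    pod = ipaddress.ip_network(pod_cidr)
    problems = []
    if service.overlaps(pod):
        problems.append(f"service network {service} overlaps pod network {pod}")
    # Compare both Kubernetes networks against every network already in use.
    for infra_text in infra_cidrs:
        infra = ipaddress.ip_network(infra_text)
        for label, net in (("service", service), ("pod", pod)):
            if net.overlaps(infra):
                problems.append(f"{label} network {net} overlaps existing network {infra}")
    return problems


if __name__ == "__main__":
    # Defaults from this table plus a hypothetical admin network.
    print(check_k8s_networks("10.233.0.0/18", "10.233.64.0/18", ["172.16.107.0/24"]))  # -> []
```

The defaults do not collide: 10.233.0.0/18 ends at 10.233.63.255 and 10.233.64.0/18 starts immediately after it.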
Ensure that telemetry_config.yml contains the entries for iDRAC telemetry support, VictoriaMetrics, and Kafka that match your requirements. Each telemetry_config.yml parameter is listed below with its Mandatory/Optional status and details:
idrac_telemetry_support (Mandatory)
Type: Boolean
If you want iDRAC telemetry support on your service cluster, set this variable to true before executing the telemetry.yml and discovery.yml playbooks.
Accepted values: true or false
Default value: true
Note
If idrac_telemetry_support is set to true, the mysqldb_user, mysqldb_password, and mysqldb_root_password parameters in the omnia_config_credentials.yml file become mandatory.
Note
If you want to deploy only Slurm clusters (slurm_custom), idrac_telemetry_support must be set to false.
idrac_telemetry_collection_type (Mandatory)
Specify where to store iDRAC telemetry data.
Supported values:
victoria: Store in VictoriaMetrics only
kafka: Store in Kafka only
victoria,kafka: Store in both (recommended)
Default: victoria,kafka
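The collection type is a comma-separated list of storage targets. As a minimal sketch, assuming the values are lowercase and separated by commas as in the default, a value can be split and checked against the two supported targets; parse_collection_type is a hypothetical helper.

```python
SUPPORTED_TARGETS = {"victoria", "kafka"}


def parse_collection_type(value: str) -> set[str]:
    """Split a value such as "victoria,kafka" into a set of storage targets."""
    targets = {part.strip() for part in value.split(",") if part.strip()}
    if not targets:
        raise ValueError("idrac_telemetry_collection_type must not be empty")
    unknown = targets - SUPPORTED_TARGETS
    if unknown:
        raise ValueError(f"unsupported storage target(s): {sorted(unknown)}")
    return targets


if __name__ == "__main__":
    print(sorted(parse_collection_type("victoria,kafka")))  # -> ['kafka', 'victoria']
```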
victoria_configurations > deployment_mode (Mandatory)
Supported values:
single-node: Simple deployment (1 pod, suitable for development and testing)
cluster: High-availability deployment (7 pods, recommended for production)
Default: cluster
Cluster mode benefits:
High availability (no single point of failure)
Horizontal scalability (components can be scaled independently)
Better performance (about 4x ingestion and 2x query speed compared with single-node mode)
Production-ready architecture
Single-node benefits:
Simple setup (fewer resources)
Suitable for small deployments (fewer than 10 nodes)
Lower resource usage (~4Gi memory versus ~10Gi for cluster mode)
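The guidance above can be condensed into a simple decision rule. This sketch encodes only what the table states (cluster mode for production, single-node for small non-production setups under 10 nodes); suggest_deployment_mode and MODE_PROFILE are hypothetical names, and the memory figures are the approximate values quoted above.

```python
# Approximate figures quoted in the table above.
MODE_PROFILE = {
    "single-node": {"pods": 1, "approx_memory_gi": 4},
    "cluster": {"pods": 7, "approx_memory_gi": 10},
}


def suggest_deployment_mode(node_count: int, production: bool) -> str:
    """Pick single-node only for small, non-production setups; otherwise cluster."""
    if production or node_count >= 10:
        return "cluster"
    return "single-node"


if __name__ == "__main__":
    mode = suggest_deployment_mode(node_count=6, production=False)
    print(mode, MODE_PROFILE[mode])
```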
victoria_configurations > persistence_size (Conditional Mandatory)
The amount of storage allocated for each VictoriaMetrics persistent volume.
Important: Total VictoriaMetrics storage depends on the deployment mode:
Single-node mode: Total storage = persistence_size * 1 pod
Cluster mode: Total storage = persistence_size * 3 vmstorage pods
Example (cluster mode): 8Gi * 3 = 24Gi total VictoriaMetrics storage
Accepted values: Must be specified in the form X[Ki|Mi|Gi|Ti|Pi|Ei]
Default value: 8Gi (results in 24Gi total storage in cluster mode)
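The total-storage rule above is a multiplication by the number of storage pods. The sketch below reproduces it, assuming that X is a whole number (for example 8Gi, not 1.5Gi); total_victoria_storage is a hypothetical helper.

```python
import re

# Assumption: X is a whole number, for example "8Gi".
SIZE_PATTERN = re.compile(r"(\d+)(Ki|Mi|Gi|Ti|Pi|Ei)")
VMSTORAGE_PODS = {"single-node": 1, "cluster": 3}


def total_victoria_storage(persistence_size: str, deployment_mode: str) -> str:
    """Multiply the per-volume size by the number of storage pods for the mode."""
    match = SIZE_PATTERN.fullmatch(persistence_size)
    if match is None:
        raise ValueError(f"expected X[Ki|Mi|Gi|Ti|Pi|Ei], got {persistence_size!r}")
    amount, unit = int(match.group(1)), match.group(2)
    return f"{amount * VMSTORAGE_PODS[deployment_mode]}{unit}"


if __name__ == "__main__":
    print(total_victoria_storage("8Gi", "cluster"))      # -> 24Gi
    print(total_victoria_storage("8Gi", "single-node"))  # -> 8Gi
```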
victoria_configurations > retention_period (Conditional Mandatory)
Specify the number of hours to retain VictoriaMetrics data before it is deleted.
Default: 168 (7 days)
kafka_configurations > persistence_size (Conditional Mandatory)
The amount of storage allocated for each Kafka persistent volume.
Important: Total Kafka storage = persistence_size * 6 pods:
3 Kafka brokers (each gets persistence_size storage)
3 Kafka controllers (each gets persistence_size storage)
Example: 8Gi * 6 = 48Gi total Kafka storage
Accepted values: Must be specified in the form X[Ki|Mi|Gi|Ti|Pi|Ei]
Default value: 8Gi (results in 48Gi total storage)
The default 8Gi size is suitable for small clusters (typically fewer than 5 nodes). For larger clusters, increase persistence_size and adjust log_retention_hours and log_retention_bytes based on the expected data volume and cluster size.
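Because every broker and every controller receives its own volume of persistence_size, the total splits evenly between the two roles. This sketch shows that breakdown under the same whole-number assumption for X; kafka_storage_breakdown is a hypothetical helper.

```python
import re

# Assumption: X is a whole number, for example "8Gi".
SIZE_PATTERN = re.compile(r"(\d+)(Ki|Mi|Gi|Ti|Pi|Ei)")
KAFKA_BROKERS = 3
KAFKA_CONTROLLERS = 3


def kafka_storage_breakdown(persistence_size: str) -> dict[str, str]:
    """Return the storage used by brokers, by controllers, and in total."""
    match = SIZE_PATTERN.fullmatch(persistence_size)
    if match is None:
        raise ValueError(f"expected X[Ki|Mi|Gi|Ti|Pi|Ei], got {persistence_size!r}")
    amount, unit = int(match.group(1)), match.group(2)
    return {
        "brokers": f"{amount * KAFKA_BROKERS}{unit}",
        "controllers": f"{amount * KAFKA_CONTROLLERS}{unit}",
        "total": f"{amount * (KAFKA_BROKERS + KAFKA_CONTROLLERS)}{unit}",
    }


if __name__ == "__main__":
    print(kafka_storage_breakdown("8Gi"))
    # -> {'brokers': '24Gi', 'controllers': '24Gi', 'total': '48Gi'}
```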
Caution
Ensure that the Kafka broker settings persistence_size, log_retention_hours, and log_retention_bytes are configured based on your data retention requirements. If the persistent volume reaches capacity before logs are deleted according to the configured retention period, Kafka brokers may run out of disk space. For more details on managing Kafka log retention and cleanup policies, see Managing Kafka logs with delete and compact policies.
kafka_configurations > log_retention_hours (Conditional Mandatory)
Specify the number of hours to retain Kafka logs before they are deleted.
Default: 168 (7 days)
kafka_configurations > log_retention_bytes (Conditional Mandatory)
Specify the maximum size of Kafka logs (in bytes) before they are deleted.
Default: -1 (unlimited)
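The Caution above asks you to balance persistence_size against the two retention limits. The rough check below estimates how much data a broker keeps: whichever limit is reached first caps retention, and -1 means no size limit. It assumes a hypothetical, steady ingest rate per broker and, as in Apache Kafka, that the size limit applies per partition; it ignores replication overhead and the extra active segment Kafka keeps per partition, so treat the result as a lower bound.

```python
MIB = 1024 ** 2
GIB = 1024 ** 3


def retention_fits(
    ingest_bytes_per_hour: int,
    log_retention_hours: int,
    log_retention_bytes: int,
    partitions_on_broker: int,
    broker_volume_bytes: int,
) -> bool:
    """Rough check that data kept under the retention limits fits on one broker volume."""
    # Data retained if only the time limit applied.
    retained_by_time = ingest_bytes_per_hour * log_retention_hours
    if log_retention_bytes == -1:
        # -1 means unlimited size, so only the time limit caps retention.
        retained = retained_by_time
    else:
        # Assumption: the size limit applies per partition, as in Apache Kafka.
        retained = min(retained_by_time, log_retention_bytes * partitions_on_broker)
    return retained <= broker_volume_bytes


if __name__ == "__main__":
    # Hypothetical workload: 50 MiB of telemetry per hour landing on one broker.
    print(retention_fits(50 * MIB, 168, -1, 4, 8 * GIB))   # -> False
    print(retention_fits(50 * MIB, 168, GIB, 4, 8 * GIB))  # -> True
```

With the defaults (168 hours, unlimited size, 8Gi), 50 MiB per hour already needs about 8400 MiB, slightly more than the 8192 MiB volume, which is the disk-full situation the Caution describes.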
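kafka_configurations > log_segment_bytes (Conditional Mandatory)
Specify the maximum size (in bytes) of a single Kafka log segment file. When a segment reaches this size, Kafka closes it and starts a new one; retention deletes data one closed segment at a time.
Default: 1073741824 (1 GiB)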
kafka_configurations > topic_partitions (Conditional Mandatory)
Specify the partition counts for the following topics:
idrac
ldms
ome
Default partition counts: idrac=1, ldms=2, ome=1
Example:
topic_partitions:
  - name: "idrac"
    partitions: 1
  - name: "ldms"
    partitions: 2
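Once the YAML is loaded, topic_partitions is a list of name/partitions mappings. This sketch validates such a list, assuming that only the three documented topics are allowed, each may appear once, and partition counts must be positive integers; validate_topic_partitions is a hypothetical helper.

```python
KNOWN_TOPICS = {"idrac", "ldms", "ome"}


def validate_topic_partitions(entries: list[dict]) -> dict[str, int]:
    """Check topic_partitions entries and return a {topic: partitions} map."""
    result: dict[str, int] = {}
    for entry in entries:
        name = entry.get("name")
        partitions = entry.get("partitions")
        if name not in KNOWN_TOPICS:
            raise ValueError(f"unknown topic {name!r}")
        if name in result:
            raise ValueError(f"topic {name!r} is listed more than once")
        # bool is a subclass of int, so reject it explicitly.
        if isinstance(partitions, bool) or not isinstance(partitions, int) or partitions < 1:
            raise ValueError(f"topic {name!r} needs a positive integer partition count")
        result[name] = partitions
    return result


if __name__ == "__main__":
    # The example entries from the table, written as parsed YAML data.
    example = [
        {"name": "idrac", "partitions": 1},
        {"name": "ldms", "partitions": 2},
    ]
    print(validate_topic_partitions(example))  # -> {'idrac': 1, 'ldms': 2}
```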
LDMS port configurations > ldms_agg_port (Conditional Mandatory)
Specify the aggregator port to be used on the service k8s cluster.
Valid range: 6001-6100
Default: 6001
LDMS port configurations > ldms_store_port (Conditional Mandatory)
Specify the store daemon port to be used on the service k8s cluster.
The port can be the same as the LDMS aggregator port specified for ldms_agg_port.
Valid range: 6001-6100
Default: 6001
LDMS port configurations > ldms_sampler_port (Conditional Mandatory)
Specify the sampler port to be used on the compute nodes.
Valid range: 10001-10100
Default: 10001
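The three LDMS ports have fixed valid ranges, and mixing them up is an easy mistake (for example, using a 60xx value for the sampler port). The sketch below checks a set of port values against the ranges in this table; check_ldms_ports is a hypothetical helper, and it deliberately allows ldms_store_port to equal ldms_agg_port.

```python
# Valid ranges from the table; range() excludes its stop value.
PORT_RANGES = {
    "ldms_agg_port": range(6001, 6101),
    "ldms_store_port": range(6001, 6101),
    "ldms_sampler_port": range(10001, 10101),
}


def check_ldms_ports(ports: dict[str, int]) -> list[str]:
    """Return a message for each missing or out-of-range LDMS port."""
    problems = []
    for key, allowed in PORT_RANGES.items():
        value = ports.get(key)
        if value is None:
            problems.append(f"{key} is not set")
        elif value not in allowed:
            problems.append(f"{key}={value} is outside {allowed.start}-{allowed.stop - 1}")
    return problems


if __name__ == "__main__":
    print(check_ldms_ports({"ldms_agg_port": 6001, "ldms_store_port": 6001, "ldms_sampler_port": 6001}))
```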
ldms_sampler_configurations > meminfo (Mandatory)
Collects memory usage statistics (free, used, buffers, cached, and so on).
plugin_name: meminfo
config_parameters: ""
activation_parameters: interval=1000000
The interval unit is microseconds, so interval=1000000 means memory metrics are collected every 1 second.
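Since the interval is given in microseconds, it is easy to be off by a factor of 1000. This sketch converts seconds into an activation_parameters string; activation_parameters here is a hypothetical helper, and it assumes that offset, when used, is in the same microsecond unit as interval.

```python
from typing import Optional

MICROSECONDS_PER_SECOND = 1_000_000


def activation_parameters(interval_seconds: float, offset_seconds: Optional[float] = None) -> str:
    """Build an activation_parameters string from values given in seconds."""
    interval_us = round(interval_seconds * MICROSECONDS_PER_SECOND)
    text = f"interval={interval_us}"
    if offset_seconds is not None:
        # Assumption: offset uses the same microsecond unit as interval.
        offset_us = round(offset_seconds * MICROSECONDS_PER_SECOND)
        text += f" offset={offset_us}"
    return text


if __name__ == "__main__":
    print(activation_parameters(1))     # -> interval=1000000
    print(activation_parameters(5))     # -> interval=5000000
    print(activation_parameters(1, 0))  # -> interval=1000000 offset=0
```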
ldms_sampler_configurations > procstat2 (Mandatory)
Collects process statistics (CPU, memory, and I/O per process).
plugin_name: procstat2
config_parameters: ""
activation_parameters: interval=1000000
The interval unit is microseconds.
ldms_sampler_configurations > vmstat (Mandatory)
Collects virtual memory statistics (paging, swapping, and memory pressure).
plugin_name: vmstat
config_parameters: ""
activation_parameters: interval=1000000
The interval unit is microseconds.
ldms_sampler_configurations > loadavg (Mandatory)
Collects the system load average (1-, 5-, and 15-minute averages).
plugin_name: loadavg
config_parameters: ""
activation_parameters: interval=1000000
The interval unit is microseconds.
ldms_sampler_configurations > procnetdev2 (Mandatory)
Collects network interface statistics (bytes, packets, errors, and drops per interface).
Possible config parameters:
ifaces=eth0,eth1: Specific interfaces to monitor. If not specified, all network interfaces are monitored.
plugin_name: procnetdev2
config_parameters: ""
activation_parameters: interval=1000000 offset=0
The interval unit is microseconds.
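The procnetdev2 entry differs from the other samplers only in its optional ifaces config parameter. This sketch assembles the three fields for either case (all interfaces, or a named subset); procnetdev2_entry is a hypothetical helper, and the interface names are only examples.

```python
from typing import Optional, Sequence


def procnetdev2_entry(
    interfaces: Optional[Sequence[str]] = None,
    interval_us: int = 1_000_000,
    offset_us: int = 0,
) -> dict[str, str]:
    """Assemble the three fields of a procnetdev2 sampler entry."""
    # An empty config_parameters string means every interface is monitored.
    config = ("ifaces=" + ",".join(interfaces)) if interfaces else ""
    return {
        "plugin_name": "procnetdev2",
        "config_parameters": config,
        "activation_parameters": f"interval={interval_us} offset={offset_us}",
    }


if __name__ == "__main__":
    print(procnetdev2_entry())
    print(procnetdev2_entry(["eth0", "eth1"]))
```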
If you have any feedback about Omnia documentation, please reach out at omnia.readme@dell.com.