Add or Remove Slurm Nodes to the Cluster
Omnia supports addition and removal of Slurm compute nodes from an existing cluster.
Add Slurm Node to the Cluster
To add a new Slurm node to the cluster, follow these steps:
Update the PXE mapping file with entries for the new Slurm nodes. Assign each new node to the appropriate functional group (slurm_node_x86_64).
Note
While updating the mapping file, ensure that the existing nodes are not removed from the mapping file.
Note
Adding a new slurm_control_node is not supported.
Run the discovery.yml playbook to discover the new nodes. For more information, see Discover the Cluster Nodes.
PXE boot the newly added nodes.
To enable telemetry collection using the iDRAC telemetry service, run the telemetry.yml playbook. For steps to initiate telemetry collection, see Step 15: Initialize and Verify Telemetry.
Note
You do not need to run the telemetry.yml playbook if the service Kubernetes cluster nodes are configured to collect telemetry data using only LDMS. By default, LDMS begins collecting data after the discovery.yml playbook is executed.
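The mapping-file update in the first step can be scripted. The following is a minimal Python sketch, not part of Omnia, that appends new node rows to a CSV mapping file while leaving every existing row in place, and that refuses rows for a new slurm_control_node. It assumes the mapping file is a CSV with a header row. The column names HOSTNAME and FUNCTIONAL_GROUP_NAME, the prefix check on the functional group name, and the helper names add_slurm_nodes and playbooks_to_run are assumptions made for illustration; check them against the header of your own mapping file.

```python
import csv
from pathlib import Path

# Assumed column names; verify them against your own PXE mapping file header.
KEY_COLUMN = "HOSTNAME"
GROUP_COLUMN = "FUNCTIONAL_GROUP_NAME"


def add_slurm_nodes(mapping_path, new_rows):
    """Append new node rows to the mapping file, keeping every existing row.

    Returns the host names that were actually added.
    """
    path = Path(mapping_path)
    with path.open(newline="") as handle:
        reader = csv.DictReader(handle)
        fieldnames = reader.fieldnames
        existing = list(reader)

    known = {row[KEY_COLUMN] for row in existing}
    added = []
    for row in new_rows:
        if row[KEY_COLUMN] in known:
            continue  # already present; existing entries are never rewritten
        # Assumption: control-node groups share the slurm_control_node prefix.
        if row[GROUP_COLUMN].startswith("slurm_control_node"):
            raise ValueError("adding a new slurm_control_node is not supported")
        added.append(row)
        known.add(row[KEY_COLUMN])

    # Rewrite the file with the old rows first, then the new ones.
    with path.open("w", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(existing + added)
    return [row[KEY_COLUMN] for row in added]


def playbooks_to_run(idrac_telemetry):
    """discovery.yml always; telemetry.yml only for iDRAC telemetry collection."""
    return ["discovery.yml"] + (["telemetry.yml"] if idrac_telemetry else [])
```

Each entry in new_rows is a dictionary keyed by the column names in the file header. After the file is updated, playbooks_to_run(False) lists only discovery.yml, which matches the LDMS-only case described in the note above.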
Remove Slurm Node from the Cluster
To remove a Slurm node from the cluster, follow these steps:
Update the PXE mapping file. Remove or reassign nodes that should no longer be part of the Slurm cluster.
Run the discovery.yml playbook.
To stop telemetry collection using the iDRAC telemetry service from the removed nodes, run the telemetry.yml playbook.
Note
You do not need to run the telemetry.yml playbook to stop telemetry collection using LDMS from the removed nodes. By default, LDMS stops collecting data after the discovery.yml playbook is executed.
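The mapping-file edit in the removal procedure can be sketched the same way. The following Python example, which is illustrative and not part of Omnia, drops the rows for the named hosts from a CSV mapping file and leaves all other rows untouched. It assumes the mapping file is a CSV with a header row. The HOSTNAME column name and the helper name remove_slurm_nodes are assumptions, so adjust them to match your own file.

```python
import csv
from pathlib import Path

# Assumed column name; verify it against your own PXE mapping file header.
KEY_COLUMN = "HOSTNAME"


def remove_slurm_nodes(mapping_path, hostnames):
    """Drop the rows for the given host names and return the names removed.

    Raises KeyError, without modifying the file, if a host name is not found.
    """
    path = Path(mapping_path)
    to_remove = set(hostnames)
    with path.open(newline="") as handle:
        reader = csv.DictReader(handle)
        fieldnames = reader.fieldnames
        rows = list(reader)

    removed = [row[KEY_COLUMN] for row in rows if row[KEY_COLUMN] in to_remove]
    missing = to_remove - set(removed)
    if missing:
        # Fail before writing so a typo cannot silently change the file.
        raise KeyError(f"not in mapping file: {sorted(missing)}")

    kept = [row for row in rows if row[KEY_COLUMN] not in to_remove]
    with path.open("w", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(kept)
    return removed
```

Failing on an unknown host name before writing is a deliberate choice here: it keeps a mistyped name from producing a mapping file that looks updated but still contains the node you meant to remove.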
If you have any feedback about Omnia documentation, please reach out at omnia.readme@dell.com.