Step 2: Create groups and assign functional roles to the nodes

In Omnia, nodes are organized by two attributes: groups and functional groups. A group describes where a node physically sits, and a functional group describes what the node does. Combining the two lets you manage large clusters by physical location and by role.

  • A group is based on the physical characteristics of the nodes. It refers to nodes that are located in the same place or have similar hardware. For example, nodes in the same rack or SU (Scalable Unit) might be grouped together and then assigned a functional group such as Service Kube Node or Slurm Control Node. Groups help with the physical organization and management of nodes.

  • A functional group defines what a node does in the system. It is a way to categorize nodes based on their functionality. Functional groups help group nodes that perform similar tasks, making it easier to manage and assign resources. For example, a node could belong to a functional group such as:

    • Service Kube Node

    • Login Node

    • Login Compiler Node

    • Slurm Control Node

    • Slurm Node

Both functional groups and groups must be configured in the functional_groups_config.yml input file. This file defines how nodes are organized in Omnia, including their functional roles and group assignments.

Create Groups

Nodes that are located in the same place or have similar hardware can be grouped together. To do so, update the functional_groups_config.yml input file in the /opt/omnia/input/project_default directory. This file holds all the necessary attributes for the nodes, based on their role within the cluster. Each group has the following attributes:

Group attributes

Attribute

Mandatory/Conditional mandatory/Optional

Description

Group Name - grpN

Mandatory

  • User-defined name of the group.

  • Range for N is 0-99.

Example: grp0, grp1, and grp2.

Location of the node - location_id

Mandatory

  • Scalable unit and rack number range.

  • Range for <n> is 0-99.

  • Format: SU-<n>.RACK-<n>

Example: SU-1.RACK-1

Note

This attribute is case-sensitive. Use uppercase characters only.

Parent of the node - parent

Conditional mandatory

  • The service tag(s) of the active service node(s) associated with this group.

  • This field is mandatory for any group of nodes associated with the slurm_node_x86_64 or slurm_node_aarch64 functional group.

  • The value must be the service tag of the parent node.

Example: ABCD12
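
As an illustration of the attributes above, a groups section might look like the following sketch. The service tag ABCD12 is the placeholder from the table, and the rack numbers are illustrative assumptions rather than values from a real cluster. The parent is written as a single quoted string, following the sample at the end of this page.

```yaml
groups:
  grp0:                        # group name: grpN, where N is 0-99
    location_id: SU-1.RACK-1   # uppercase only, format SU-<n>.RACK-<n>
    parent: ""                 # left empty for a group that has no parent

  grp1:                        # intended for Slurm worker nodes
    location_id: SU-1.RACK-2
    parent: "ABCD12"           # service tag of the parent service node (placeholder)
```

Here grp1 carries a parent because it is meant to be assigned to a slurm_node functional group, for which the attribute is mandatory.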

Create Functional Groups

Nodes with similar functional roles or functionalities can be grouped together. The following table lists the functional groups available in Omnia.

Note

  • At least one functional group is mandatory. Do not change the names of the functional groups.

  • Each group name must be unique across all functional groups in the functional_groups_config.yml file.

  • Functional group names are case-sensitive.

  • Omnia supports HA functionality for the service_cluster.

  • To set up a service cluster, the service_kube_node functional group must be defined in /opt/omnia/input/project_default/functional_groups_config.yml.
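
The notes above can be illustrated with a short functional_groups section. This is a sketch that reuses the functional group names and cluster names from the examples on this page. The group names grp0 and grp3 are illustrative and are assumed to be defined under groups in the same file.

```yaml
functional_groups:
  # Needed when a service cluster is set up
  - name: "service_kube_node_x86_64"   # name must be used exactly as listed (case-sensitive)
    cluster_name: "service_k8s_cluster"
    group:
      - grp3                           # appears under this functional group only

  - name: "slurm_control_node_x86_64"
    cluster_name: "slurm_cluster"
    group:
      - grp0                           # a different group; no group is reused
```

Each group name appears under exactly one functional group, which satisfies the uniqueness rule.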

Types of Functional Groups

Functional Group Name

Layer

Details

Slurm control plane - slurm_control_node_x86_64

Management

  • Groups of Slurm head nodes are assigned to the slurm_control_node functional group.

  • This functional group is used to configure nodes as the Slurm head. The nodes included in this functional group will have the necessary tools and configurations to run the Slurm head.

  • Nodes in this functional group can be used to run the Slurm head.

Example:

functional_groups:
  - name: "slurm_control_node_x86_64"
    cluster_name: "slurm_cluster"
    group:
      - grp0       

Slurm worker node - slurm_node_x86_64

Compute

  • This functional group is used to configure nodes as Slurm workers on the x86_64 architecture. The nodes included in this functional group will have the necessary tools and configurations to run Slurm workloads.

  • Nodes in this functional group can be used as Slurm worker nodes for x86_64 clusters.

Example:

functional_groups:
  - name: "slurm_node_x86_64"
    cluster_name: "slurm_cluster"
    group:
      - grp1

Slurm worker node - slurm_node_aarch64

Compute

  • This functional group is used to configure nodes as Slurm workers on the aarch64 architecture. The nodes included in this functional group will have the necessary tools and configurations to run Slurm workloads.

  • Nodes in this functional group can be used as Slurm worker nodes for aarch64 clusters.

Example:

functional_groups:
  - name: "slurm_node_aarch64"
    cluster_name: "slurm_cluster"
    group:
      - grp2

Service Cluster Kubernetes worker node - service_kube_node_x86_64

Management

  • Groups of Service Cluster Kubernetes worker nodes are assigned to the service_kube_node functional group.

  • This functional group is used to configure the Kubernetes worker nodes on the service cluster.

  • The nodes included in this functional group will have the necessary tools and configurations to run as Kubernetes workers on the service cluster.

Example:

functional_groups:
  - name: "service_kube_node_x86_64"
    cluster_name: "service_k8s_cluster"
    group:
      - grp3

Login node - login_node_x86_64

Management

  • This functional group is used to configure nodes for user logins on the x86_64 architecture. The nodes included in this functional group will have the necessary tools and configurations to support user login activities.

  • Nodes in this functional group can be used to handle user login sessions on x86_64 systems.

Example:

functional_groups:
  - name: "login_node_x86_64"
    cluster_name: "slurm_cluster"
    group:
      - grp4

Login node - login_node_aarch64

Management

  • This functional group is used to configure nodes for user logins on the aarch64 architecture. The nodes included in this functional group will have the necessary tools and configurations to support user login activities.

  • Nodes in this functional group can be used to handle user login sessions on aarch64 systems.

Example:

functional_groups:
  - name: "login_node_aarch64"
    cluster_name: "slurm_cluster"
    group:
      - grp5

Login and Compiler node - login_compiler_node_x86_64

Management

  • This functional group is used to configure nodes for compilation on the x86_64 architecture. The nodes included in this functional group will have the necessary tools and configurations to perform compilation.

  • Nodes in this functional group can be used to compile code on x86_64 systems.

Example:

functional_groups:
  - name: "login_compiler_node_x86_64"
    cluster_name: "slurm_cluster"
    group:
      - grp6

Login and Compiler node - login_compiler_node_aarch64

Management

  • This functional group is used to configure nodes for compilation on the aarch64 architecture. The nodes included in this functional group will have the necessary tools and configurations to perform compilation.

  • Nodes in this functional group can be used to compile code on aarch64 systems.

Example:

functional_groups:
  - name: "login_compiler_node_aarch64"
    cluster_name: "slurm_cluster"
    group:
      - grp7

Sample

Here is a sample (using a mapping file) for your reference:

groups:
  grp0:
    location_id: SU-1.RACK-1
    parent: ""

  grp1:
    location_id: SU-1.RACK-1
    parent: ""

functional_groups:
  - name: "slurm_control_node_x86_64"
    cluster_name: "slurm_cluster"
    group:
      - grp0
