Keyboard shortcuts

Press or to navigate between chapters

Press S or / to search in the book

Press ? to show this help

Press Esc to hide this help

Quick-start for Metal3

This guide has been tested on Ubuntu server 24.04. It should be seen as an example rather than the absolute truth about how to deploy and use Metal3. We will cover two environments and two scenarios. The environments are

  1. a baremetal lab with actual physical servers and baseboard management controllers (BMCs), and
  2. a virtualized baremetal lab with virtual machines and sushy-tools acting as BMC.

In both of these, we will show how to use Bare Metal Operator and Ironic to manage the servers through a Kubernetes API, as well as how to turn the servers into Kubernetes clusters managed through Cluster API. These are the two scenarios.

In a nut-shell, this is what we will do:

  1. (Optional) Setup virtualized lab environment
  2. Setup a disk image server
  3. Setup a management cluster
  4. Create BareMetalHosts to represent the servers
  5. (Scenario 1) Provision the BareMetalHosts
  6. (Scenario 2) Deploy Cluster API and turn the BareMetalHosts into a Kubernetes cluster

Prerequisites

You will need the following tools installed.

  • docker (or podman)
  • kind or minikube (management cluster, not needed if you already have a “real” cluster that you want to use)
  • clusterctl
  • kubectl
  • htpasswd
  • virsh and virt-install for the virtualized setup

Baremetal lab configuration

The baremetal lab has two servers that we will call bml-01 and bml-02, as well as a management computer where we will set up Metal3. The servers are equipped with iLO 4 BMCs. These BMCs are connected to an “out of band” network (192.168.1.0/24) and they have the following IP addresses.

  • bml-01: 192.168.1.28
  • bml-02: 192.168.1.14

There is a separate network for the servers (192.168.0.0/24). The management computer is connected to both of these networks with IP addresses 192.168.1.7 and 192.168.0.150 respectively.

Finally, we will need the MAC addresses of the servers to keep track of which is which.

  • bml-01: 9C:63:C0:AC:10:42
  • bml-02: 80:c1:6e:7a:5a:a8

Virtualized configuration

If you do not have the hardware or perhaps just want to test things out without committing to a full baremetal lab, you may simulate it with virtual machines. In this section we will show how to create a virtual machine and use sushy-tools as a baseboard management controller for it.

The configuration is a bit simpler than in the baremetal lab because we don’t have a separate out of band network here. In the end we will have the BMC available as

  • bml-vm-01: 192.168.222.1:8000/redfish/v1/Systems/bmh-vm-01

and the MAC address:

  • bml-vm-01: 00:60:2f:31:81:01

Start by defining a libvirt network:

<network>
  <name>baremetal-e2e</name>
  <forward mode='nat'>
    <nat>
      <port start='1024' end='65535'/>
    </nat>
  </forward>
  <bridge name='metal3'/>
  <ip address='192.168.222.1' netmask='255.255.255.0'>
    <dhcp>
      <range start='192.168.222.3' end='192.168.222.99'/>
      <!-- Reserve IP for convenience -->
      <host mac='00:60:2f:31:81:01' name='bmh-vm-01' ip='192.168.222.101'/>
      <bootp file='http://192.168.222.2:6180/boot.ipxe'/>
    </dhcp>
  </ip>
</network>

Save this as net.xml.

Metal3 relies on baseboard management controllers to manage the baremetal servers, so we need something similar for our virtual machines. This comes in the form of sushy-tools.

We need to create a configuration file for sushy-tools:

# Listen on the local IP address 192.168.222.1
SUSHY_EMULATOR_LISTEN_IP = u'192.168.222.1'

# Bind to TCP port 8000
SUSHY_EMULATOR_LISTEN_PORT = 8000

# Serve this SSL certificate to the clients
SUSHY_EMULATOR_SSL_CERT = None

# If SSL certificate is being served, this is its RSA private key
SUSHY_EMULATOR_SSL_KEY = None

# The OpenStack cloud ID to use. This option enables OpenStack driver.
SUSHY_EMULATOR_OS_CLOUD = None
# The libvirt URI to use. This option enables libvirt driver.
SUSHY_EMULATOR_LIBVIRT_URI = u'qemu:///system'

# Instruct the libvirt driver to ignore any instructions to
# set the boot device. Allowing the UEFI firmware to instead
# rely on the EFI Boot Manager
# Note: This sets the legacy boot element to dev="fd"
# and relies on the floppy not existing, it likely won't work
# if your VM has a floppy drive.
SUSHY_EMULATOR_IGNORE_BOOT_DEVICE = False

# The map of firmware loaders dependent on the boot mode and
# system architecture. Ideally the x86_64 loader will be capable
# of secure boot or not based on the chosen nvram.
SUSHY_EMULATOR_BOOT_LOADER_MAP = {
    u'UEFI': {
        u'x86_64': u'/usr/share/OVMF/OVMF_CODE.secboot.fd'
    },
    u'Legacy': {
        u'x86_64': None
    }
}

Finally, we start up the virtual baremetal lab and create VMs to simulate the servers. Feel free to adjust things as you see fit, but make sure to note the MAC address. That will be needed later. You can choose how many VMs to create. At least one is needed, although more could be nice for scenario 2, to have more than one node in the cluster.

#!/usr/bin/env bash

# Define and start the baremetal-e2e network
virsh -c qemu:///system net-define net.xml
virsh -c qemu:///system net-start baremetal-e2e

# We need to create veth pair to connect the baremetal-e2e net (defined above)
# and the docker network used by kind. This is to allow controllers in
# the kind cluster to communicate with the VMs and vice versa.
# For example, Ironic needs to communicate with IPA.
# These options are the same as what kind creates by default,
# except that we hard code the IPv6 subnet and specify a bridge name.
#
# NOTE! If you used kind before, you already have this network but
# without the fixed bridge name. Please remove it first in that case!
# docker network rm kind
docker network create -d=bridge \
    -o com.docker.network.bridge.enable_ip_masquerade=true \
    -o com.docker.network.driver.mtu=1500 \
    -o com.docker.network.bridge.name="kind" \
    --ipv6 --subnet "fc00:f853:ccd:e793::/64" \
    kind

# Next create the veth pair
sudo ip link add metalend type veth peer name kindend
sudo ip link set metalend master metal3
sudo ip link set kindend master kind
sudo ip link set metalend up
sudo ip link set kindend up

# Then we need to set routing rules as well
sudo iptables -I FORWARD -i kind -o metal3 -j ACCEPT
sudo iptables -I FORWARD -i metal3 -o kind -j ACCEPT

# Start the sushy-emulator container that acts as BMC
docker run --name sushy-tools --rm --network host -d \
  -v /var/run/libvirt:/var/run/libvirt \
  -v "$(pwd)/sushy-emulator.conf:/etc/sushy/sushy-emulator.conf" \
  -e SUSHY_EMULATOR_CONFIG=/etc/sushy/sushy-emulator.conf \
  quay.io/metal3-io/sushy-tools:latest sushy-emulator

# Generate a VM definition xml file and then define the VM
# use --ram=8192 for Scenario 2
virt-install \
  --connect qemu:///system \
  --name bmh-vm-01 \
  --description "Virtualized BareMetalHost" \
  --osinfo=ubuntu-lts-latest \
  --ram=4096 \
  --vcpus=2 \
  --disk size=25 \
  --boot uefi,hd,network \
  --import \
  --network network=baremetal-e2e,mac="00:60:2f:31:81:01" \
  --noautoconsole \
  --print-xml > bmh-vm-01.xml
virsh define bmh-vm-01.xml
rm bmh-vm-01.xml

Common setup

This section is common for both the baremetal configuration and the virtualized environment. Specific configuration will always differ between environments though. We will go through how to configure and deploy Ironic and Baremetal Operator.

Image server

In order to do anything useful, we will need a server for hosting disk images that can be used to provision the servers. In this guide, we will use an nginx container for this. We also download some images that will be used later.

#!/usr/bin/env bash

mkdir disk-images

pushd disk-images || exit
wget https://cloud-images.ubuntu.com/jammy/current/jammy-server-cloudimg-amd64.img
wget https://cloud-images.ubuntu.com/jammy/current/SHA256SUMS
sha256sum --ignore-missing -c SHA256SUMS
wget https://artifactory.nordix.org/artifactory/metal3/images/k8s_v1.34.1/CENTOS_10_NODE_IMAGE_K8S_v1.34.1.qcow2
sha256sum CENTOS_10_NODE_IMAGE_K8S_v1.34.1.qcow2
# Convert to raw.
# This helps lower memory requirements, since the raw image can be streamed to disk
# instead of first loaded to memory by IPA for conversion.
qemu-img convert -f qcow2 -O raw CENTOS_10_NODE_IMAGE_K8S_v1.34.1.qcow2 CENTOS_10_NODE_IMAGE_K8S_v1.34.1.raw
# Local cache of IPA
wget https://tarballs.opendev.org/openstack/ironic-python-agent/dib/ipa-centos9-master.tar.gz
popd || exit

docker run --name image-server --rm -d -p 80:8080 \
  -v "$(pwd)/disk-images:/usr/share/nginx/html" nginxinc/nginx-unprivileged

DHCP server

The BareMetalHosts must be able to call back to Ironic when going through the inspection phase. This means that they must have IP addresses in a network where they can reach Ironic. Any DHCP server can be used for this. We use the Ironic container image that includes dnsmasq. It is deployed automatically together with Ironic.

Management cluster

If you already have a Kubernetes cluster that you want to use, go ahead and use that. Please ensure that it is connected to the relevant networks so that Ironic can reach the BMCs and so that the BareMetalHosts can reach Ironic.

If you do not have a cluster already, you can create one using kind. Please note that this is absolutely not intended for production environments.

We will use the following configuration file for kind, save it as kind.yaml:

apiVersion: kind.x-k8s.io/v1alpha4
kind: Cluster
nodes:
- role: control-plane
  # Open ports for Ironic
  extraPortMappings:
  # Ironic httpd
  - containerPort: 6180
    hostPort: 6180
    listenAddress: "0.0.0.0"
    protocol: TCP
  # Ironic API
  - containerPort: 6385
    hostPort: 6385
    listenAddress: "0.0.0.0"
    protocol: TCP

As you can see, it has a few ports forwarded from the host. This is to make Ironic reachable when it is running inside the kind cluster.

We will also need to install cert-manager and Ironic Standalone Operator. Finally, we deploy Ironic and Bare Metal Operator.

#!/usr/bin/env bash

kind create cluster --config kind.yaml

# (Optional) Initialize CAPM3. This is only needed for scenario 2, but it also installs
# cert-manager, which is needed for pretty much everything else.
# If you skip this, make sure you install cert-manager separately!
clusterctl init --infrastructure=metal3 --ipam=metal3

kubectl apply -f https://github.com/metal3-io/ironic-standalone-operator/releases/latest/download/install.yaml
kubectl -n ironic-standalone-operator-system wait --for=condition=Available deploy/ironic-standalone-operator-controller-manager

# Now we can deploy Ironic and BMO
kubectl create ns baremetal-operator-system
# Apply Ironic with retry logic (up to 5 attempts with 10 second delays).
# The IrSO webhook is not guaranteed to be ready when the IrSO deployment is,
# so some retries may be needed.
MAX_RETRIES=5
RETRY_DELAY=10
RETRY_COUNT=0
echo "Applying Ironic configuration..."
while [[ "${RETRY_COUNT}" -lt "${MAX_RETRIES}" ]]; do
  if kubectl apply -k ironic; then
    echo "Successfully applied Ironic configuration"
    break
  else
    RETRY_COUNT=$((RETRY_COUNT + 1))
    echo "Failed to apply Ironic configuration. Retrying in ${RETRY_DELAY} seconds... (Attempt ${RETRY_COUNT}/${MAX_RETRIES})"
    sleep ${RETRY_DELAY}
  fi
done
if [[ "${RETRY_COUNT}" -eq "${MAX_RETRIES}" ]]; then
  echo "ERROR: Failed to apply Ironic configuration after ${MAX_RETRIES} attempts. Exiting."
  exit 1
fi
kubectl apply -k bmo

We use the following manifest to deploy Ironic. Feel free to adjust as needed for your environment.

# kustomization.yaml
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
namespace: baremetal-operator-system
resources:
- ironic.yaml
- certificate.yaml

# ironic.yaml
apiVersion: ironic.metal3.io/v1alpha1
kind: Ironic
metadata:
  name: ironic
  namespace: baremetal-operator-system
spec:
  networking:
    dhcp:
      rangeBegin: "192.168.222.100"
      rangeEnd: "192.168.222.200"
      networkCIDR: "192.168.222.0/24"
    interface: "eth0"
    ipAddress: "192.168.222.2"
    ipAddressManager: "keepalived"
  tls:
    certificateName: ironic-cert

# certificate.yaml
apiVersion: cert-manager.io/v1
kind: Issuer
metadata:
  name: selfsigned-issuer
  namespace: baremetal-operator-system
spec:
  selfSigned: {}
---
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: ironic-cacert
  namespace: baremetal-operator-system
spec:
  commonName: ironic-ca
  isCA: true
  issuerRef:
    kind: Issuer
    name: selfsigned-issuer
  secretName: ironic-cacert
---
apiVersion: cert-manager.io/v1
kind: Issuer
metadata:
  name: ca-issuer
  namespace: baremetal-operator-system
spec:
  ca:
    secretName: ironic-cacert
---
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: ironic-cert
  namespace: baremetal-operator-system
spec:
  ipAddresses:
  - 192.168.222.2
  dnsNames:
  - ironic.baremetal-operator-system.svc
  issuerRef:
    kind: Issuer
    name: ca-issuer
  secretName: ironic-cert

For the Bare Metal Operator, we use a kustomization that looks like this:

apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
namespace: baremetal-operator-system
# This is the kustomization that we build on. You can download it and change
# the URL to a relative path if you do not want to access it over the network.
# Note that the ref=main specifies the version to use.
# We use main here simply because the integration with IrSO is not included in a release yet.
resources:
- https://github.com/metal3-io/baremetal-operator/config/use-irso?ref=main

Create BareMetalHosts

Now that we have Bare Metal Operator deployed, let’s put it to use by creating BareMetalHosts (BMHs) to represent our servers. You will need the protocol and IPs of the BMCs, as well as credentials for accessing them, and the servers MAC addresses.

Create one secret for each BareMetalHost, containing the credentials for accessing its BMC. No credentials are needed in the virtualized setup but you still need to create the secret with some values. Here is an example:

apiVersion: v1
kind: Secret
metadata:
  name: bml-01
type: Opaque
stringData:
  username: replaceme
  password: replaceme

Then continue by creating the BareMetalHost manifest. You can put it in the same file as the secret if you want. Just remember to separate the two resources with one line containing ---.

Here is an example of a BareMetalHost referencing the secret above with MAC address and BMC address matching our bml-01 server (see supported hardware for information on BMC addressing).

apiVersion: metal3.io/v1alpha1
kind: BareMetalHost
metadata:
  name: bml-01
spec:
  online: true
  bootMACAddress: 9C:63:C0:AC:10:42
  bootMode: UEFI
  bmc:
    address: idrac-virtualmedia://192.168.1.28
    credentialsName: bml-01
    disableCertificateVerification: true

Here is the same for the virtualized BareMetalHost:

apiVersion: metal3.io/v1alpha1
kind: BareMetalHost
metadata:
  name: bml-vm-01
spec:
  online: true
  bootMACAddress: 00:60:2f:31:81:01
  bootMode: UEFI
  hardwareProfile: libvirt
  bmc:
    address: redfish-virtualmedia+http://192.168.222.1:8000/redfish/v1/Systems/bmh-vm-01
    credentialsName: bml-01

Apply these in the cluster with kubectl apply -f path/to/file.

You should now be able to see the BareMetalHost go through registering and inspecting phases before it finally becomes available. Check with kubectl get bmh. The output should look similar to this:

NAME      STATE         CONSUMER   ONLINE   ERROR   AGE
bml-01    available                true             26m

(Scenario 1) Provision BareMetalHosts

If you want to manage the BareMetalHosts directly, keep reading. If you would rather use Cluster API to make Kubernetes clusters out of them, skip to the next section.

Edit the BareMetalHost to add details of what image you want to provision it with. For example:

apiVersion: metal3.io/v1alpha1
kind: BareMetalHost
metadata:
  name: bml-vm-01
spec:
  online: true
  bootMACAddress: 00:60:2f:31:81:01
  bootMode: UEFI
  hardwareProfile: libvirt
  bmc:
    address: redfish-virtualmedia+http://192.168.222.1:8000/redfish/v1/Systems/bmh-vm-01
    credentialsName: bml-01
  image:
    checksumType: sha256
    checksum: http://192.168.222.1/SHA256SUMS
    format: qcow2
    url: http://192.168.222.1/jammy-server-cloudimg-amd64.img

Note that the URL for the disk image is not using the out of band network. Image provisioning works so that the Ironic Python Agent is first booted on the machine. From there (i.e. not in the out of band network) it downloads the disk image and writes it to disk. If the machine has several disks, and you want to specify which one to use, set rootDeviceHints (otherwise, /dev/sda is used by default).

The manifest above is enough to provision the BareMetalHost, but unless you have everything you need already baked in the disk image, you will most likely want to add some user-data and network-data. We will show here how to configure authorized ssh keys using user-data (see instance customization for more details).

First, we create a file (user-data.yaml) with the user-data:

#cloud-config
users:
- name: user
  ssh_authorized_keys:
  - ssh-ed25519 ABCD... user@example.com

Then create a secret from it.

kubectl create secret generic user-data --from-file=value=user-data.yaml --from-literal=format=cloud-config

Add the following to the BareMetalHost manifest to make it use the user-data:

spec:
  ...
  userData:
    name: user-data
    namespace: default

Apply the changes with kubectl apply -f path/to/file. You should now see the BareMetalHost go into provisioning and eventually become provisioned.

NAME      STATE         CONSUMER   ONLINE   ERROR   AGE
bml-01    provisioned              true             2h

You can now check the logs of the DHCP server to see what IP the BareMetalHost got (docker logs dnsmasq) and try to ssh to it.

(Scenario 2) Metal3 and Cluster API

If you want to turn the BareMetalHosts into Kubernetes clusters, you should consider using Cluster API and the infrastructure provider for Metal3. In this section we will show how to do it.

Initialize the Cluster API core components and the infrastructure provider for Metal3 (if you didn’t already do it):

clusterctl init --infrastructure metal3 --ipam=metal3

Now we need to set some environment variables that will be used to render the manifests from the cluster template. Most of them are related to the disk image that we downloaded above.

Note: There are many ways to configure and expose the API endpoint of the cluster. You need to decide how to do it. It will not “just work”. Here are some options:

  1. Configure a specific IP for the control-plane server through the DHCP server. This is doesn’t require anything extra but it is also very limited. You will not be able to upgrade the cluster for example.
  2. Set up a load balancer separately and use that as API endpoint.
  3. Use keepalived or kube-vip or similar to assign a VIP to one of the control-plane nodes.
# Baremetal lab image variables
# export IMAGE_URL="http://192.168.0.150/CENTOS_10_NODE_IMAGE_K8S_v1.34.1.qcow2"
# export IMAGE_CHECKSUM="afa7e95ee6fb92b952ab85bae4d01033651e690cf04a626c668041d7b94ddd4a"
# export IMAGE_FORMAT="qcow2"
# Virtualized setup variables
export IMAGE_URL="http://192.168.222.1/CENTOS_10_NODE_IMAGE_K8S_v1.34.1.raw"
export IMAGE_CHECKSUM="20537529c0588e1c3d1929981207ef6fac73df7b2500b84f462d09badcc670ea"
export IMAGE_FORMAT="raw"
# Common variables
export IMAGE_CHECKSUM_TYPE="sha256"
export KUBERNETES_VERSION="v1.34.1"
# Make sure this does not conflict with other networks
export POD_CIDR='["192.168.10.0/24"]'
# These can be used to add user-data
export CTLPLANE_KUBEADM_EXTRA_CONFIG="
    preKubeadmCommands:
    - systemctl enable --now crio
    users:
    - name: user
      sshAuthorizedKeys:
      - ssh-ed25519 ABCD... user@example.com"
export WORKERS_KUBEADM_EXTRA_CONFIG="
      preKubeadmCommands:
      - systemctl enable --now crio
      users:
      - name: user
        sshAuthorizedKeys:
        - ssh-ed25519 ABCD... user@example.com"
# NOTE! You must ensure that this is forwarded or assigned somehow to the
# server(s) that is selected for the control-plane.
# We reserved this address in the net.xml as a basic way to get a fixed IP.
export CLUSTER_APIENDPOINT_HOST="192.168.222.101"
export CLUSTER_APIENDPOINT_PORT="6443"

With the variables in place, we can render the manifests and apply:

clusterctl generate cluster my-cluster --control-plane-machine-count 1 --worker-machine-count 0 | kubectl apply -f -

You should see BareMetalHosts be provisioned as they are “consumed” by the Metal3Machines:

NAME      STATE         CONSUMER                        ONLINE   ERROR   AGE
bml-02    provisioned   my-cluster-controlplane-8z46n   true             68m

If all goes well and the API endpoint is correctly configured, you should eventually get a working cluster. Note that it will not become fully healthy until a CNI is deployed.

Deploy Calico as CNI:

clusterctl get kubeconfig my-cluster > kubeconfig.yaml
kubectl --kubeconfig=kubeconfig.yaml apply --server-side -f https://raw.githubusercontent.com/projectcalico/calico/v3.31.0/manifests/calico.yaml

Check cluster health with clusterctl describe cluster my-cluster:

NAME                                                REPLICAS  AVAILABLE  READY  UP TO DATE  STATUS           REASON            SINCE  MESSAGE
Cluster/my-cluster                                  1/1       1          1      1           Available: True  Available         48s
├─ClusterInfrastructure - Metal3Cluster/my-cluster                                          Ready: True      NoReasonReported  32m
└─ControlPlane - KubeadmControlPlane/my-cluster     1/1       1          1      1
  └─Machine/my-cluster-2zc9x                        1         1          1      1           Ready: True      Ready             48s

Cleanup

If you created a cluster using Cluster API, delete that first:

kubectl delete cluster my-cluster

Delete all BareMetalHosts with kubectl delete bmh <name>. This ensures that the servers are cleaned and powered off.

Delete the management cluster.

kind delete cluster

Stop image server. It is automatically removed when stopped.

docker stop image-server

You may also want to delete the disk images:

rm -r disk-images

If you did the virtualized setup you will also need to cleanup the sushy-tools container and the VM(s).

#!/usr/bin/env bash

docker rm -f sushy-tools

virsh -c qemu:///system destroy --domain "bmh-vm-01"
virsh -c qemu:///system undefine --domain "bmh-vm-01" --remove-all-storage --nvram

# Clear network
virsh -c qemu:///system net-destroy baremetal-e2e
virsh -c qemu:///system net-undefine baremetal-e2e

sudo iptables -D FORWARD -i kind -o metal3 -j ACCEPT
sudo iptables -D FORWARD -i metal3 -o kind -j ACCEPT

sudo ip link delete metalend type veth