Quick-start for Metal3
This guide has been tested on Ubuntu server 24.04. It should be seen as an example rather than the absolute truth about how to deploy and use Metal3. We will cover two environments and two scenarios. The environments are
- a baremetal lab with actual physical servers and baseboard management controllers (BMCs), and
- a virtualized baremetal lab with virtual machines and sushy-tools acting as BMC.
In both environments, we will walk through two scenarios: (1) using Bare Metal Operator and Ironic to manage the servers through a Kubernetes API, and (2) turning the servers into Kubernetes clusters managed through Cluster API.
In a nutshell, this is what we will do:
- (Optional) Set up a virtualized lab environment
- Set up a disk image server
- Set up a management cluster
- Create BareMetalHosts to represent the servers
- (Scenario 1) Provision the BareMetalHosts
- (Scenario 2) Deploy Cluster API and turn the BareMetalHosts into a Kubernetes cluster
Prerequisites
You will need the following tools installed.
- docker (or podman)
- kind or minikube (management cluster, not needed if you already have a “real” cluster that you want to use)
- clusterctl
- kubectl
- htpasswd
- virsh and virt-install for the virtualized setup
Baremetal lab configuration
The baremetal lab has two servers that we will call bml-01 and bml-02, as well
as a management computer where we will set up Metal3. The servers are equipped
with iLO 4 BMCs. These BMCs are connected to an “out of band” network
(192.168.1.0/24) and they have the following IP addresses.
- bml-01: 192.168.1.28
- bml-02: 192.168.1.14
There is a separate network for the servers (192.168.0.0/24). The management
computer is connected to both of these networks with IP addresses 192.168.1.7
and 192.168.0.150 respectively.
Finally, we will need the MAC addresses of the servers to keep track of which is which.
- bml-01: 9C:63:C0:AC:10:42
- bml-02: 80:c1:6e:7a:5a:a8
Virtualized configuration
If you do not have the hardware or perhaps just want to test things out without committing to a full baremetal lab, you may simulate it with virtual machines. In this section we will show how to create a virtual machine and use sushy-tools as a baseboard management controller for it.
The configuration is a bit simpler than in the baremetal lab because we don’t have a separate out of band network here. In the end we will have the BMC available as
- bml-vm-01: 192.168.222.1:8000/redfish/v1/Systems/bmh-vm-01
and the MAC address:
- bml-vm-01: 00:60:2f:31:81:01
Start by defining a libvirt network:
<network>
<name>baremetal-e2e</name>
<forward mode='nat'>
<nat>
<port start='1024' end='65535'/>
</nat>
</forward>
<bridge name='metal3'/>
<ip address='192.168.222.1' netmask='255.255.255.0'>
<dhcp>
<range start='192.168.222.3' end='192.168.222.99'/>
<!-- Reserve IP for convenience -->
<host mac='00:60:2f:31:81:01' name='bmh-vm-01' ip='192.168.222.101'/>
<bootp file='http://192.168.222.2:6180/boot.ipxe'/>
</dhcp>
</ip>
</network>
Save this as net.xml.
Metal3 relies on baseboard management controllers to manage the baremetal servers, so we need something similar for our virtual machines. This comes in the form of sushy-tools.
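We need to create a configuration file for sushy-tools. Save it as sushy-emulator.conf in the directory where you will run the setup script, since the script mounts the file from the current working directory: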
# Listen on the local IP address 192.168.222.1
SUSHY_EMULATOR_LISTEN_IP = u'192.168.222.1'
# Bind to TCP port 8000
SUSHY_EMULATOR_LISTEN_PORT = 8000
# Serve this SSL certificate to the clients
SUSHY_EMULATOR_SSL_CERT = None
# If SSL certificate is being served, this is its RSA private key
SUSHY_EMULATOR_SSL_KEY = None
# The OpenStack cloud ID to use. This option enables OpenStack driver.
SUSHY_EMULATOR_OS_CLOUD = None
# The libvirt URI to use. This option enables libvirt driver.
SUSHY_EMULATOR_LIBVIRT_URI = u'qemu:///system'
# Instruct the libvirt driver to ignore any instructions to
# set the boot device. Allowing the UEFI firmware to instead
# rely on the EFI Boot Manager
# Note: This sets the legacy boot element to dev="fd"
# and relies on the floppy not existing, it likely won't work
# if your VM has a floppy drive.
SUSHY_EMULATOR_IGNORE_BOOT_DEVICE = False
# The map of firmware loaders dependent on the boot mode and
# system architecture. Ideally the x86_64 loader will be capable
# of secure boot or not based on the chosen nvram.
SUSHY_EMULATOR_BOOT_LOADER_MAP = {
u'UEFI': {
u'x86_64': u'/usr/share/OVMF/OVMF_CODE.secboot.fd'
},
u'Legacy': {
u'x86_64': None
}
}
Finally, we start up the virtual baremetal lab and create VMs to simulate the servers. Feel free to adjust things as you see fit, but make sure to note the MAC address. That will be needed later. You can choose how many VMs to create. At least one is needed, although more could be nice for scenario 2, to have more than one node in the cluster.
#!/usr/bin/env bash
# Define and start the baremetal-e2e network
virsh -c qemu:///system net-define net.xml
virsh -c qemu:///system net-start baremetal-e2e
# We need to create a veth pair to connect the baremetal-e2e net (defined above)
# and the docker network used by kind. This is to allow controllers in
# the kind cluster to communicate with the VMs and vice versa.
# For example, Ironic needs to communicate with IPA.
# These options are the same as what kind creates by default,
# except that we hard code the IPv6 subnet and specify a bridge name.
#
# NOTE! If you used kind before, you already have this network but
# without the fixed bridge name. Please remove it first in that case!
# docker network rm kind
docker network create -d=bridge \
-o com.docker.network.bridge.enable_ip_masquerade=true \
-o com.docker.network.driver.mtu=1500 \
-o com.docker.network.bridge.name="kind" \
--ipv6 --subnet "fc00:f853:ccd:e793::/64" \
kind
# Next create the veth pair
sudo ip link add metalend type veth peer name kindend
sudo ip link set metalend master metal3
sudo ip link set kindend master kind
sudo ip link set metalend up
sudo ip link set kindend up
# Then we need to set routing rules as well
sudo iptables -I FORWARD -i kind -o metal3 -j ACCEPT
sudo iptables -I FORWARD -i metal3 -o kind -j ACCEPT
# Start the sushy-emulator container that acts as BMC
docker run --name sushy-tools --rm --network host -d \
-v /var/run/libvirt:/var/run/libvirt \
-v "$(pwd)/sushy-emulator.conf:/etc/sushy/sushy-emulator.conf" \
-e SUSHY_EMULATOR_CONFIG=/etc/sushy/sushy-emulator.conf \
quay.io/metal3-io/sushy-tools:latest sushy-emulator
# Generate a VM definition xml file and then define the VM
# use --ram=8192 for Scenario 2
virt-install \
--connect qemu:///system \
--name bmh-vm-01 \
--description "Virtualized BareMetalHost" \
--osinfo=ubuntu-lts-latest \
--ram=4096 \
--vcpus=2 \
--disk size=25 \
--boot uefi,hd,network \
--import \
--network network=baremetal-e2e,mac="00:60:2f:31:81:01" \
--noautoconsole \
--print-xml > bmh-vm-01.xml
virsh define bmh-vm-01.xml
rm bmh-vm-01.xml
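Before moving on, it can be useful to confirm that the emulated BMC answers at the Redfish address listed earlier. The following is a minimal sketch, not part of the original setup: the helper names redfish_system_url and bmc_system_reachable are hypothetical, and it assumes curl is installed on the host.

```shell
#!/usr/bin/env bash
# Hypothetical helper: build the Redfish URL of one emulated system.
# $1 = listen IP, $2 = listen port (both from sushy-emulator.conf), $3 = VM name
redfish_system_url() {
  printf 'http://%s:%s/redfish/v1/Systems/%s\n' "$1" "$2" "$3"
}

# Hypothetical helper: succeed only if the URL answers with an HTTP success code.
# --fail makes curl return non-zero on HTTP errors such as 404.
bmc_system_reachable() {
  curl --silent --fail --output /dev/null "$1"
}

# Usage (run on the host once the sushy-tools container and the VM exist):
#   url="$(redfish_system_url 192.168.222.1 8000 bmh-vm-01)"
#   bmc_system_reachable "$url" && echo "BMC answers at $url"
```

If the check fails, the sushy-tools container logs (docker logs sushy-tools) are a reasonable first place to look.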
Common setup
This section is common for both the baremetal configuration and the virtualized environment. Specific configuration will always differ between environments though. We will go through how to configure and deploy Ironic and Bare Metal Operator.
Image server
In order to do anything useful, we will need a server for hosting disk images that can be used to provision the servers. In this guide, we will use an nginx container for this. We also download some images that will be used later.
#!/usr/bin/env bash
mkdir disk-images
pushd disk-images || exit
wget https://cloud-images.ubuntu.com/jammy/current/jammy-server-cloudimg-amd64.img
wget https://cloud-images.ubuntu.com/jammy/current/SHA256SUMS
sha256sum --ignore-missing -c SHA256SUMS
wget https://artifactory.nordix.org/artifactory/metal3/images/k8s_v1.34.1/CENTOS_10_NODE_IMAGE_K8S_v1.34.1.qcow2
sha256sum CENTOS_10_NODE_IMAGE_K8S_v1.34.1.qcow2
# Convert to raw.
# This helps lower memory requirements, since the raw image can be streamed to disk
# instead of first loaded to memory by IPA for conversion.
qemu-img convert -f qcow2 -O raw CENTOS_10_NODE_IMAGE_K8S_v1.34.1.qcow2 CENTOS_10_NODE_IMAGE_K8S_v1.34.1.raw
# Local cache of IPA
wget https://tarballs.opendev.org/openstack/ironic-python-agent/dib/ipa-centos9-master.tar.gz
popd || exit
docker run --name image-server --rm -d -p 80:8080 \
-v "$(pwd)/disk-images:/usr/share/nginx/html" nginxinc/nginx-unprivileged
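The script above prints the checksum of the qcow2 image, but Scenario 2 needs the bare digest of whichever file you actually serve (for example the converted raw image). As a sketch under that assumption, the hypothetical helpers below extract just the digest and check that the image server returns a given file. They assume sha256sum, awk and curl are available.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print only the SHA-256 digest of a file.
# sha256sum prints "<digest>  <filename>", so keep the first field.
image_checksum() {
  sha256sum "$1" | awk '{print $1}'
}

# Hypothetical helper: succeed if the image server returns the file.
# $1 = host (and optional port) of the image server, $2 = file name
image_served() {
  curl --silent --fail --head --output /dev/null "http://$1/$2"
}

# Usage (file names as downloaded above):
#   image_checksum disk-images/CENTOS_10_NODE_IMAGE_K8S_v1.34.1.raw
#   image_served localhost SHA256SUMS && echo "image server is up"
```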
DHCP server
The BareMetalHosts must be able to call back to Ironic when going through the inspection phase. This means that they must have IP addresses in a network where they can reach Ironic. Any DHCP server can be used for this. We use the Ironic container image that includes dnsmasq. It is deployed automatically together with Ironic.
Management cluster
If you already have a Kubernetes cluster that you want to use, go ahead and use that. Please ensure that it is connected to the relevant networks so that Ironic can reach the BMCs and so that the BareMetalHosts can reach Ironic.
If you do not have a cluster already, you can create one using kind. Please note that this is absolutely not intended for production environments.
We will use the following configuration file for kind. Save it as kind.yaml:
apiVersion: kind.x-k8s.io/v1alpha4
kind: Cluster
nodes:
- role: control-plane
  # Open ports for Ironic
  extraPortMappings:
  # Ironic httpd
  - containerPort: 6180
    hostPort: 6180
    listenAddress: "0.0.0.0"
    protocol: TCP
  # Ironic API
  - containerPort: 6385
    hostPort: 6385
    listenAddress: "0.0.0.0"
    protocol: TCP
As you can see, it has a few ports forwarded from the host. This is to make Ironic reachable when it is running inside the kind cluster.
We will also need to install cert-manager and Ironic Standalone Operator. Finally, we deploy Ironic and Bare Metal Operator.
#!/usr/bin/env bash
kind create cluster --config kind.yaml
# (Optional) Initialize CAPM3. This is only needed for scenario 2, but it also installs
# cert-manager, which is needed for pretty much everything else.
# If you skip this, make sure you install cert-manager separately!
clusterctl init --infrastructure=metal3 --ipam=metal3
kubectl apply -f https://github.com/metal3-io/ironic-standalone-operator/releases/latest/download/install.yaml
kubectl -n ironic-standalone-operator-system wait --for=condition=Available deploy/ironic-standalone-operator-controller-manager
# Now we can deploy Ironic and BMO
kubectl create ns baremetal-operator-system
# Apply Ironic with retry logic (up to 5 attempts with 10 second delays).
# The IrSO webhook is not guaranteed to be ready when the IrSO deployment is,
# so some retries may be needed.
MAX_RETRIES=5
RETRY_DELAY=10
RETRY_COUNT=0
echo "Applying Ironic configuration..."
while [[ "${RETRY_COUNT}" -lt "${MAX_RETRIES}" ]]; do
if kubectl apply -k ironic; then
echo "Successfully applied Ironic configuration"
break
else
RETRY_COUNT=$((RETRY_COUNT + 1))
echo "Failed to apply Ironic configuration. Retrying in ${RETRY_DELAY} seconds... (Attempt ${RETRY_COUNT}/${MAX_RETRIES})"
sleep ${RETRY_DELAY}
fi
done
if [[ "${RETRY_COUNT}" -eq "${MAX_RETRIES}" ]]; then
echo "ERROR: Failed to apply Ironic configuration after ${MAX_RETRIES} attempts. Exiting."
exit 1
fi
kubectl apply -k bmo
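The retry loop in the script above can also be written as a small reusable function, which helps if other steps turn out to need retries as well. This is a sketch of the same idea, not the project's own tooling. The function name retry and its argument order are choices made here for illustration.

```shell
#!/usr/bin/env bash
# retry MAX DELAY CMD [ARGS...]
# Run CMD until it succeeds, at most MAX times, sleeping DELAY seconds
# between attempts. Returns 0 on success, 1 if every attempt failed.
retry() {
  retry_max="$1"
  retry_delay="$2"
  shift 2
  retry_count=0
  while [ "$retry_count" -lt "$retry_max" ]; do
    if "$@"; then
      return 0
    fi
    retry_count=$((retry_count + 1))
    echo "Attempt ${retry_count}/${retry_max} failed: $*" >&2
    # Only sleep when another attempt will follow
    [ "$retry_count" -lt "$retry_max" ] && sleep "$retry_delay"
  done
  return 1
}

# Usage, equivalent to the loop above:
#   retry 5 10 kubectl apply -k ironic || exit 1
```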
We use the following manifest to deploy Ironic. Feel free to adjust as needed for your environment.
# kustomization.yaml
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
namespace: baremetal-operator-system
resources:
- ironic.yaml
- certificate.yaml
# ironic.yaml
apiVersion: ironic.metal3.io/v1alpha1
kind: Ironic
metadata:
  name: ironic
  namespace: baremetal-operator-system
spec:
  networking:
    dhcp:
      rangeBegin: "192.168.222.100"
      rangeEnd: "192.168.222.200"
      networkCIDR: "192.168.222.0/24"
    interface: "eth0"
    ipAddress: "192.168.222.2"
    ipAddressManager: "keepalived"
  tls:
    certificateName: ironic-cert
# certificate.yaml
apiVersion: cert-manager.io/v1
kind: Issuer
metadata:
  name: selfsigned-issuer
  namespace: baremetal-operator-system
spec:
  selfSigned: {}
---
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: ironic-cacert
  namespace: baremetal-operator-system
spec:
  commonName: ironic-ca
  isCA: true
  issuerRef:
    kind: Issuer
    name: selfsigned-issuer
  secretName: ironic-cacert
---
apiVersion: cert-manager.io/v1
kind: Issuer
metadata:
  name: ca-issuer
  namespace: baremetal-operator-system
spec:
  ca:
    secretName: ironic-cacert
---
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: ironic-cert
  namespace: baremetal-operator-system
spec:
  ipAddresses:
  - 192.168.222.2
  dnsNames:
  - ironic.baremetal-operator-system.svc
  issuerRef:
    kind: Issuer
    name: ca-issuer
  secretName: ironic-cert
For the Bare Metal Operator, we use a kustomization that looks like this:
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
namespace: baremetal-operator-system
# This is the kustomization that we build on. You can download it and change
# the URL to a relative path if you do not want to access it over the network.
# Note that the ref=main specifies the version to use.
# We use main here simply because the integration with IrSO is not included in a release yet.
resources:
- https://github.com/metal3-io/baremetal-operator/config/use-irso?ref=main
Create BareMetalHosts
Now that we have Bare Metal Operator deployed, let’s put it to use by creating BareMetalHosts (BMHs) to represent our servers. You will need the protocol and IPs of the BMCs, as well as credentials for accessing them, and the servers' MAC addresses.
Create one secret for each BareMetalHost, containing the credentials for accessing its BMC. No credentials are needed in the virtualized setup but you still need to create the secret with some values. Here is an example:
apiVersion: v1
kind: Secret
metadata:
  name: bml-01
type: Opaque
stringData:
  username: replaceme
  password: replaceme
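When there are several servers, writing one secret per host by hand gets repetitive. As a sketch, the hypothetical helper render_bmc_secret below prints a secret manifest of the same shape for a given name and credentials. It assumes the credentials contain no double quotes or backslashes, since they are placed inside double-quoted YAML strings.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print a BMC credentials Secret manifest.
# $1 = secret name, $2 = BMC username, $3 = BMC password
# Assumption: $2 and $3 contain no double quotes or backslashes.
render_bmc_secret() {
  cat <<EOF
apiVersion: v1
kind: Secret
metadata:
  name: $1
type: Opaque
stringData:
  username: "$2"
  password: "$3"
EOF
}

# Usage:
#   render_bmc_secret bml-02 replaceme replaceme | kubectl apply -f -
```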
Then continue by creating the BareMetalHost manifest. You can put it in the same
file as the secret if you want. Just remember to separate the two resources with
one line containing ---.
Here is an example of a BareMetalHost referencing the secret above with MAC
address and BMC address matching our bml-01 server (see supported
hardware for information on BMC addressing).
apiVersion: metal3.io/v1alpha1
kind: BareMetalHost
metadata:
  name: bml-01
spec:
  online: true
  bootMACAddress: 9C:63:C0:AC:10:42
  bootMode: UEFI
  bmc:
    address: ilo4-virtualmedia://192.168.1.28
    credentialsName: bml-01
    disableCertificateVerification: true
Here is the same for the virtualized BareMetalHost:
apiVersion: metal3.io/v1alpha1
kind: BareMetalHost
metadata:
  name: bml-vm-01
spec:
  online: true
  bootMACAddress: 00:60:2f:31:81:01
  bootMode: UEFI
  hardwareProfile: libvirt
  bmc:
    address: redfish-virtualmedia+http://192.168.222.1:8000/redfish/v1/Systems/bmh-vm-01
    credentialsName: bml-01
Apply these in the cluster with kubectl apply -f path/to/file.
You should now be able to see the BareMetalHost go through registering and
inspecting phases before it finally becomes available. Check with
kubectl get bmh. The output should look similar to this:
NAME STATE CONSUMER ONLINE ERROR AGE
bml-01 available true 26m
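Instead of re-running kubectl get bmh by hand, the provisioning state can be polled in a script. The sketch below is an illustration with hypothetical helper names (bmh_state, wait_for_bmh_state). It reads the state from the BareMetalHost field status.provisioning.state and assumes kubectl points at the management cluster.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the provisioning state of one BareMetalHost.
bmh_state() {
  kubectl get bmh "$1" -o jsonpath='{.status.provisioning.state}'
}

# Hypothetical helper: poll until the host reaches the wanted state.
# $1 = BMH name, $2 = wanted state, $3 = max polls, $4 = seconds between polls
wait_for_bmh_state() {
  wait_polls=0
  while [ "$wait_polls" -lt "$3" ]; do
    if [ "$(bmh_state "$1")" = "$2" ]; then
      return 0
    fi
    wait_polls=$((wait_polls + 1))
    sleep "$4"
  done
  return 1
}

# Usage: wait up to 30 minutes for inspection to finish
#   wait_for_bmh_state bml-01 available 180 10 || echo "bml-01 is not available yet"
```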
(Scenario 1) Provision BareMetalHosts
If you want to manage the BareMetalHosts directly, keep reading. If you would rather use Cluster API to make Kubernetes clusters out of them, skip to the next section.
Edit the BareMetalHost to add details of what image you want to provision it with. For example:
apiVersion: metal3.io/v1alpha1
kind: BareMetalHost
metadata:
  name: bml-vm-01
spec:
  online: true
  bootMACAddress: 00:60:2f:31:81:01
  bootMode: UEFI
  hardwareProfile: libvirt
  bmc:
    address: redfish-virtualmedia+http://192.168.222.1:8000/redfish/v1/Systems/bmh-vm-01
    credentialsName: bml-01
  image:
    checksumType: sha256
    checksum: http://192.168.222.1/SHA256SUMS
    format: qcow2
    url: http://192.168.222.1/jammy-server-cloudimg-amd64.img
Note that the URL for the disk image is not using the out of band network.
Image provisioning works so that the Ironic Python Agent is first booted on the
machine. From there (i.e. not in the out of band network) it downloads the disk
image and writes it to disk. If the machine has several disks, and you want to
specify which one to use, set rootDeviceHints
(otherwise, /dev/sda is used by default).
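As an illustration of setting rootDeviceHints, the sketch below builds a JSON merge patch that sets the deviceName hint. The helper name root_device_patch is hypothetical, and /dev/vda is only an assumed example value; check which device names your machine actually exposes. The hint should be in place before the host is provisioned.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print a merge patch that sets rootDeviceHints.deviceName.
# $1 = device name (assumed example: /dev/vda)
root_device_patch() {
  printf '{"spec":{"rootDeviceHints":{"deviceName":"%s"}}}\n' "$1"
}

# Usage:
#   kubectl patch bmh bml-vm-01 --type merge -p "$(root_device_patch /dev/vda)"
```

The same setting can of course be written directly under spec in the BareMetalHost manifest instead of being patched in.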
The manifest above is enough to provision the BareMetalHost, but unless you have everything you need already baked in the disk image, you will most likely want to add some user-data and network-data. We will show here how to configure authorized ssh keys using user-data (see instance customization for more details).
First, we create a file (user-data.yaml) with the user-data:
#cloud-config
users:
- name: user
  ssh_authorized_keys:
  - ssh-ed25519 ABCD... user@example.com
Then create a secret from it.
kubectl create secret generic user-data --from-file=value=user-data.yaml --from-literal=format=cloud-config
Add the following to the BareMetalHost manifest to make it use the user-data:
spec:
  ...
  userData:
    name: user-data
    namespace: default
Apply the changes with kubectl apply -f path/to/file. You should now see the
BareMetalHost go into provisioning and eventually become provisioned.
NAME STATE CONSUMER ONLINE ERROR AGE
bml-01 provisioned true 2h
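You can now check the logs of the DHCP server to see what IP the BareMetalHost
got and try to ssh to it. With the setup in this guide, dnsmasq runs together
with Ironic in the management cluster, so look at the logs of the Ironic pod in
the baremetal-operator-system namespace (docker logs dnsmasq only applies if you
run dnsmasq as a standalone container).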
(Scenario 2) Metal3 and Cluster API
If you want to turn the BareMetalHosts into Kubernetes clusters, you should consider using Cluster API and the infrastructure provider for Metal3. In this section we will show how to do it.
Initialize the Cluster API core components and the infrastructure provider for Metal3 (if you didn’t already do it):
clusterctl init --infrastructure metal3 --ipam=metal3
Now we need to set some environment variables that will be used to render the manifests from the cluster template. Most of them are related to the disk image that we downloaded above.
Note: There are many ways to configure and expose the API endpoint of the cluster. You need to decide how to do it. It will not “just work”. Here are some options:
- Configure a specific IP for the control-plane server through the DHCP server. This doesn’t require anything extra, but it is also very limited. For example, you will not be able to upgrade the cluster.
- Set up a load balancer separately and use that as API endpoint.
- Use keepalived or kube-vip or similar to assign a VIP to one of the control-plane nodes.
# Baremetal lab image variables
# export IMAGE_URL="http://192.168.0.150/CENTOS_10_NODE_IMAGE_K8S_v1.34.1.qcow2"
# export IMAGE_CHECKSUM="afa7e95ee6fb92b952ab85bae4d01033651e690cf04a626c668041d7b94ddd4a"
# export IMAGE_FORMAT="qcow2"
# Virtualized setup variables
export IMAGE_URL="http://192.168.222.1/CENTOS_10_NODE_IMAGE_K8S_v1.34.1.raw"
export IMAGE_CHECKSUM="20537529c0588e1c3d1929981207ef6fac73df7b2500b84f462d09badcc670ea"
export IMAGE_FORMAT="raw"
# Common variables
export IMAGE_CHECKSUM_TYPE="sha256"
export KUBERNETES_VERSION="v1.34.1"
# Make sure this does not conflict with other networks
export POD_CIDR='["192.168.10.0/24"]'
# These can be used to add user-data
export CTLPLANE_KUBEADM_EXTRA_CONFIG="
    preKubeadmCommands:
    - systemctl enable --now crio
    users:
    - name: user
      sshAuthorizedKeys:
      - ssh-ed25519 ABCD... user@example.com"
export WORKERS_KUBEADM_EXTRA_CONFIG="
      preKubeadmCommands:
      - systemctl enable --now crio
      users:
      - name: user
        sshAuthorizedKeys:
        - ssh-ed25519 ABCD... user@example.com"
# NOTE! You must ensure that this is forwarded or assigned somehow to the
# server(s) that is selected for the control-plane.
# We reserved this address in the net.xml as a basic way to get a fixed IP.
export CLUSTER_APIENDPOINT_HOST="192.168.222.101"
export CLUSTER_APIENDPOINT_PORT="6443"
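Since rendering the template depends on all of these variables, a quick check that none of them is empty can save a failed attempt. The following is a sketch with a hypothetical helper name (missing_vars). The list of variables in the usage line is simply the set exported above, not an authoritative list of what the template requires.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the name of every listed variable that is
# unset or empty. Prints nothing when all of them have a value.
missing_vars() {
  for var_name in "$@"; do
    # Indirect lookup through eval keeps the sketch POSIX-sh compatible
    eval "var_value=\${$var_name:-}"
    [ -n "$var_value" ] || echo "$var_name"
  done
}

# Usage:
#   missing="$(missing_vars IMAGE_URL IMAGE_CHECKSUM IMAGE_CHECKSUM_TYPE \
#     IMAGE_FORMAT KUBERNETES_VERSION POD_CIDR \
#     CLUSTER_APIENDPOINT_HOST CLUSTER_APIENDPOINT_PORT)"
#   [ -z "$missing" ] || { echo "Missing variables: $missing" >&2; exit 1; }
```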
With the variables in place, we can render the manifests and apply:
clusterctl generate cluster my-cluster --control-plane-machine-count 1 --worker-machine-count 0 | kubectl apply -f -
You should see BareMetalHosts be provisioned as they are “consumed” by the Metal3Machines:
NAME STATE CONSUMER ONLINE ERROR AGE
bml-02 provisioned my-cluster-controlplane-8z46n true 68m
If all goes well and the API endpoint is correctly configured, you should eventually get a working cluster. Note that it will not become fully healthy until a CNI is deployed.
Deploy Calico as CNI:
clusterctl get kubeconfig my-cluster > kubeconfig.yaml
kubectl --kubeconfig=kubeconfig.yaml apply --server-side -f https://raw.githubusercontent.com/projectcalico/calico/v3.31.0/manifests/calico.yaml
Check cluster health with clusterctl describe cluster my-cluster:
NAME REPLICAS AVAILABLE READY UP TO DATE STATUS REASON SINCE MESSAGE
Cluster/my-cluster 1/1 1 1 1 Available: True Available 48s
├─ClusterInfrastructure - Metal3Cluster/my-cluster Ready: True NoReasonReported 32m
└─ControlPlane - KubeadmControlPlane/my-cluster 1/1 1 1 1
└─Machine/my-cluster-2zc9x 1 1 1 1 Ready: True Ready 48s
Cleanup
If you created a cluster using Cluster API, delete that first:
kubectl delete cluster my-cluster
Delete all BareMetalHosts with kubectl delete bmh <name>. This ensures that
the servers are cleaned and powered off.
Delete the management cluster.
kind delete cluster
Stop image server. It is automatically removed when stopped.
docker stop image-server
You may also want to delete the disk images:
rm -r disk-images
If you did the virtualized setup, you will also need to clean up the sushy-tools container and the VM(s).
#!/usr/bin/env bash
docker rm -f sushy-tools
virsh -c qemu:///system destroy --domain "bmh-vm-01"
virsh -c qemu:///system undefine --domain "bmh-vm-01" --remove-all-storage --nvram
# Clear network
virsh -c qemu:///system net-destroy baremetal-e2e
virsh -c qemu:///system net-undefine baremetal-e2e
sudo iptables -D FORWARD -i kind -o metal3 -j ACCEPT
sudo iptables -D FORWARD -i metal3 -o kind -j ACCEPT
sudo ip link delete metalend type veth