The Metal³ project (pronounced: “Metal Kubed”) provides components for bare metal host management with Kubernetes. You can enrol your bare metal machines, provision operating system images, and then, if you like, deploy Kubernetes clusters to them. From there, operating and upgrading your Kubernetes clusters can be handled by Metal³. Moreover, Metal³ is itself a Kubernetes application, so it runs on Kubernetes, and uses Kubernetes resources and APIs as its interface.

Metal³ is one of the providers for the Kubernetes sub-project Cluster API. Cluster API provides infrastructure-agnostic Kubernetes lifecycle management, and Metal³ brings the bare metal implementation.

This is paired with one of the components from the OpenStack ecosystem, Ironic, for booting and installing machines. Metal³ handles the installation of Ironic as a standalone component (there’s no need to bring along the rest of OpenStack). Ironic is supported by a mature community of hardware vendors and supports a wide range of bare metal management protocols which are continuously tested on a variety of hardware. Backed by Ironic, Metal³ can provision machines, no matter the brand of hardware.

In summary, you can write Kubernetes manifests representing your hardware and your desired Kubernetes cluster layout. Then Metal³ can:

  • Discover your hardware inventory
  • Configure BIOS and RAID settings on your hosts
  • Optionally clean a host’s disks as part of provisioning
  • Install and boot an operating system image of your choice
  • Deploy Kubernetes
  • Upgrade Kubernetes or the operating system in your clusters with a non-disruptive rolling strategy
  • Automatically remediate failed nodes by rebooting them and removing them from the cluster if necessary

You can even deploy Metal³ to your clusters so that they can manage other clusters using Metal³…

Metal³ is open-source and welcomes community contributions. The community meets at the following venues:

  • #cluster-api-baremetal on Kubernetes Slack
  • Metal³ development mailing list
  • From the mailing list, you’ll also be able to find the details of a weekly Zoom community call on Wednesdays at 14:00 GMT

About this guide

This user guide aims to explain the Metal³ feature set, and provide how-tos for using Metal³. It’s not a tutorial (for that, see the Getting Started Guide). Nor is it a reference (for that, see the API Reference Documentation, and of course, the code itself.)

Bare Metal Operator

The Bare Metal Operator (BMO) is a custom Kubernetes controller that deploys bare metal hosts, represented in Kubernetes by BareMetalHost (BMH) resources, as Kubernetes nodes. It uses Ironic to do so.

The BMO controller is responsible for the following:

  • Inspect the host’s hardware details and report them on the corresponding BareMetalHost. This includes information about CPUs, RAM, disks, NICs, and more.
  • Provision hosts with a desired image.
  • Clean a host’s disk contents before or after provisioning

The BareMetalHost represents a bare metal host (server). The BareMetalHost contains information about the server as shown below. For brevity, some parts of the output are omitted, but we can classify the fields into the following broad categories.

  1. known server properties: Fields such as bootMACAddress are properties of the server and are known in advance.
  2. unknown server properties: Fields such as CPU and disk are properties of the server and are discovered by Ironic.
  3. user supplied: Fields such as image are supplied by the user to dictate the boot image for the server.
  4. dynamic fields: Fields such as IP could be dynamically assigned to the server at run time by a DHCP server.

During the life cycle of a bare metal host (an upgrade is one example), some of these fields change with information coming from Ironic or other controllers, while other fields, such as the MAC address, do not change.

BMO can also work with the Cluster API Provider Metal3 (CAPM3) controller. With the involvement of CAPM3 and Ironic, a simplified information flow path and an overview of the BareMetalHost resource are shown below:

apiVersion: metal3.io/v1alpha1
kind: BareMetalHost
metadata:
  name: node-0
  namespace: metal3
spec:
  bmc:
    address: ipmi://
    credentialsName: node-0-bmc-secret
  bootMACAddress: 00:5a:91:3f:9a:bd
  networkData:
    name: test1-workers-tbwnz-networkdata
    namespace: metal3
  online: true
  userData:
    name: test1-workers-vd4gj
    namespace: metal3
status:
  hardware:
    cpu:
      arch: x86_64
      count: 2
    hostname: node-0
    nics:
    - ip:
      mac: 00:5a:91:3f:9a:bd
      name: enp1s0
    ramMebibytes: 4096
    storage:
    - hctl: "0:0:0:0"
      name: /dev/sda
      serialNumber: drive-scsi0-0-0-0
      sizeBytes: 53687091200
      type: HDD

It would help to use an example to describe what BMO does. There are two operations of interest, getting hardware details of the server and booting the server with a given image, including user supplied cloud-init data. The BareMetalHost resource contains address and authentication information towards a server.

BMO communicates this information to Ironic and gets hardware details (a.k.a. inspection data), such as CPU and disk, of the server in return. This information is added to the BareMetalHost resource status. In order to get such server related information, the server is booted with a service ramdisk. If there are hardware related changes, the BareMetalHost is updated accordingly.

The following diagram illustrates the information flow and the components involved. From the left, the first two boxes represent Kubernetes custom controllers reconciling the custom resources shown inside. The comments, in yellow, show some relevant fields in these resources.

The rightmost box represents the bare metal server on which the inspection is done, the operating system is installed and the bootstrap script is run. The third box shows Ironic, which synchronizes the information about the bare metal server between the two sides.

Next, with the information coming from the CAPM3 side, the BareMetalHost is updated with image and cloud-init data. That information is also conveyed to Ironic and the server is booted accordingly.

This happens for example when the user scales a MachineDeployment so that the server should be added to the cluster, or during an upgrade when it must change the image it is booting from.

The information flow and operations described above are a bit simplified. CAPM3 provides more data and there are other operations, such as disk cleaning, on the Ironic side as well. However, the overall process remains the same. BMO keeps the server and BareMetalHost resource in sync.

To this end, it takes the server as the source of truth for some fields, such as hardware details. For other fields, such as the boot image, it takes the information from CAPM3 as the source of truth and syncs accordingly.

Automated Cleaning

One of the Ironic features exposed to the Metal3 Bare Metal Operator is automated node cleaning. When enabled, automated cleaning kicks off when a node is provisioned for the first time and every time it is deprovisioned.

There are two automated cleaning modes available, which can be set via the automatedCleaningMode field of a BareMetalHost spec.

  • metadata to enable the disk cleaning
  • disabled to disable the disk cleaning

We named the enabling mode metadata instead of simply enabled because we expect that in the future we will expand the feature to allow selecting certain disks (specified via metadata) of a node to be cleaned, which is currently out of scope.

apiVersion: metal3.io/v1alpha1
kind: BareMetalHost
metadata:
  name: example-node
spec:
  automatedCleaningMode: metadata
  online: true
  bootMACAddress: 00:8a:b6:8e:ac:b8
  bootMode: legacy
  bmc:
    address: ipmi://
    credentialsName: example-node-bmc-secret

For a node with disabled value, no cleaning will be performed during deprovisioning. Note that this might introduce security vulnerabilities in case there is sensitive data which must be wiped out from the disk when the host is being recycled.

If automatedCleaningMode is not set by the user, it will be set to the default mode metadata. To know more about cleaning steps that Ironic performs on the node, see the cleaning steps.
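
Switching the mode on an existing host can be sketched as a merge patch on the spec field named above. This is a minimal sketch rather than the project's own tooling: the helper name cleaning_patch is hypothetical, and the host name example-node and the namespace metal3 are simply borrowed from the examples in this guide. The kubectl call is left commented out because it needs a live cluster.

```shell
# Hypothetical helper: print a JSON merge patch that sets
# spec.automatedCleaningMode to one of the two documented modes.
cleaning_patch() {
  case "$1" in
    metadata|disabled)
      printf '{"spec":{"automatedCleaningMode":"%s"}}\n' "$1"
      ;;
    *)
      echo "unknown cleaning mode: $1 (expected metadata or disabled)" >&2
      return 1
      ;;
  esac
}

# Example usage against a live cluster (host name and namespace are assumptions):
# kubectl -n metal3 patch bmh example-node --type merge -p "$(cleaning_patch disabled)"
```

Rejecting anything other than the two documented values keeps a typo such as "enabled" from reaching the cluster.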

If you are using Cluster-api-provider-metal3 on top of Baremetal Operator, then please see this.

Automatic secure boot

The automatic secure boot feature allows enabling and disabling UEFI (Unified Extensible Firmware Interface) secure boot when provisioning a host. This feature requires supported hardware and a compatible OS image. Currently, the iLO, iRMC and Redfish drivers support enabling UEFI secure boot.

Why do we need it

Automatic secure boot is needed when provisioning a host with high security requirements. Based on checksums and signatures, secure boot protects the host from loading malicious code in the boot process before loading the provisioned operating system.

How to use it

To enable Automatic secure boot, first check if hardware is supported and then specify the value UEFISecureBoot for bootMode in the BareMetalHost custom resource. Please note, it is enabled before booting into the deployed instance and disabled when the ramdisk is running and on tear down. Below you can check the example:

apiVersion: metal3.io/v1alpha1
kind: BareMetalHost
metadata:
  name: node-1
spec:
  online: true
  bootMACAddress: 00:5c:52:31:3a:9c
  bootMode: UEFISecureBoot

This will enable UEFI secure boot before booting the instance and disable it when the host is deprovisioned. Note that the default value for bootMode is UEFI.

Live ISO

The live-iso API in Metal3 allows booting a BareMetalHost with a live ISO image instead of writing an image to the local disk using the IPA deploy ramdisk.

Why do we need it?

In some circumstances, e.g. to reduce boot time for ephemeral workloads, it may be possible to boot an ISO and not deploy any image to disk (saving the time to write the image and reboot). This API is also useful for integration with third-party installers distributed as a CD image; for example, leveraging existing toolchains like the fedora-coreos installer might be desirable.

How to use it?

Here is an example of a BareMetalHost resource, where the ISO referenced by the url, with live-iso set as the disk format, will be live-booted without deploying an image to disk. Live ISO mode is supported with any virtualmedia driver used as the BMC driver. Checksum options are not required in this case, and will be ignored if specified:

apiVersion: metal3.io/v1alpha1
kind: BareMetalHost
metadata:
  name: live-iso-booted-node
spec:
  image:
    url: <url-of-the-iso>
    format: live-iso
  online: true

Note: rootDeviceHints, networkData and userData will not be used since the image is not written to disk.

For more details, please see the design proposal.

Detached annotation

The detached annotation provides a way to prevent management of a BareMetalHost. It works by deleting the host information from Ironic without triggering deprovisioning. The BareMetal Operator will recreate the host in Ironic again once the annotation is removed. This annotation can be used with BareMetalHosts in Provisioned, ExternallyProvisioned, Ready or Available states.

Normally, deleting a BareMetalHost will always trigger deprovisioning. This can be problematic and unnecessary if we just want to, for example, move the BareMetalHost from one cluster to another. By applying the annotation before removing the BareMetalHost from the old cluster, we can ensure that the host is not disrupted by this (normally it would be deprovisioned). The next step is then to recreate it in the new cluster without triggering a new inspection. See the status annotation page for how to do this.

The annotation value can be anything (it is ignored). Here is an example:

apiVersion: metal3.io/v1alpha1
kind: BareMetalHost
metadata:
  name: example
  annotations:
    <detached-annotation-key>: ""
spec:
  online: true
  bootMACAddress: 00:8a:b6:8e:ac:b8
  bootMode: legacy
  bmc:
    address: ipmi://
    credentialsName: example-bmc-secret

Why is this annotation needed?

  • It provides a way to move BareMetalHosts between clusters (essentially deleting them in the old cluster and recreating them in the new) without going through deprovisioning, inspection and provisioning.
  • It allows deleting the BareMetalHost object without triggering deprovisioning. This can be used to hand over management of the host to a different system without disruption.

For more details, please see the design proposal.

Status annotation

The status annotation is useful when you need to avoid inspection of a BareMetalHost. This can happen if the status is already known, for example, when moving the BareMetalHost from one cluster to another. By setting this annotation, the BareMetal Operator will take the status of the BareMetalHost directly from the annotation.

The annotation value is a JSON representation of the BareMetalHost's status field. One simple way of extracting the status and turning it into an annotation is using kubectl like this:

# Save the status in JSON format to a file
kubectl -n metal3 get bmh <name-of-bmh> -o jsonpath="{.status}" > status.json
# Save the BMH with the status annotation applied to a file (nothing is changed in the cluster)
kubectl -n metal3 annotate bmh <name-of-bmh> \
  <status-annotation-key>="$(cat status.json)" \
  --dry-run=client -o yaml > bmh.yaml

Note that the above example does not apply the annotation to the BareMetalHost directly, since applying it to a host that already has a status is most likely not useful. Instead it saves the BareMetalHost with the annotation applied to a file, bmh.yaml. This file can then be applied in another cluster. The status would be discarded at this point since the user is usually not allowed to set it, but the annotation is still there and would be used by the BareMetal Operator to set the status again. Once this is done, the operator will remove the status annotation. In this situation you may also want to check the detached annotation for how to remove the BareMetalHost from the old cluster without going through deprovisioning.

Here is an example of a BareMetalHost, first without the annotation, but with status and spec, and then the other way around. This shows how the status field is turned into the annotation value.

apiVersion: metal3.io/v1alpha1
kind: BareMetalHost
metadata:
  name: node-0
  namespace: metal3
spec:
  automatedCleaningMode: metadata
  bmc:
    address: redfish+
    credentialsName: node-0-bmc-secret
  bootMACAddress: 00:80:1f:e6:f1:8f
  bootMode: legacy
  online: true
status:
  errorCount: 0
  errorMessage: ""
  goodCredentials:
    credentials:
      name: node-0-bmc-secret
      namespace: metal3
    credentialsVersion: "1775"
  hardwareProfile: ""
  lastUpdated: "2022-05-31T06:33:05Z"
  operationHistory:
    deprovision:
      end: null
      start: null
    inspect:
      end: null
      start: "2022-05-31T06:33:05Z"
    provision:
      end: null
      start: null
    register:
      end: "2022-05-31T06:33:05Z"
      start: "2022-05-31T06:32:54Z"
  operationalStatus: OK
  poweredOn: false
  provisioning:
    ID: 8d566f5b-a28f-451b-a70f-419507c480cd
    bootMode: legacy
    image:
      url: ""
    state: inspecting
  triedCredentials:
    credentials:
      name: node-0-bmc-secret
      namespace: metal3
    credentialsVersion: "1775"
---
apiVersion: metal3.io/v1alpha1
kind: BareMetalHost
metadata:
  name: node-0
  namespace: metal3
  annotations:
    <status-annotation-key>: |
      <JSON representation of the status shown above>

External inspection

Similar to the status annotation, external inspection makes it possible to skip the inspection step. The difference is that the status annotation can only be used on the very first reconcile and allows setting all the fields under status. In contrast, external inspection limits the changes so that only HardwareDetails can be modified, and it can be used at any time when inspection is disabled (with the disabled annotation) or when there is no existing HardwareDetails data.

External inspection is controlled through an annotation on the BareMetalHost. The annotation value is a JSON representation of the BareMetalHost's status.hardware field.

Here is an example with a BMH that has inspection disabled and is using the external inspection feature to add the HardwareDetails.

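apiVersion: metal3.io/v1alpha1
kind: BareMetalHost
metadata:
  name: node-0
  namespace: metal3
  annotations:
    <inspect-annotation-key>: disabled
    <hardware-details-annotation-key>: |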
      {"systemVendor":{"manufacturer":"QEMU", "productName":"Standard PC (Q35 + ICH9, 2009)","serialNumber":""}, "firmware":{"bios":{"date":"","vendor":"","version":""}},"ramMebibytes":4096, "nics":[{"name":"eth0","model":"0x1af4 0x0001","mac":"00:b7:8b:bb:3d:f6", "ip":"","speedGbps":0,"vlanId":0,"pxe":true}], "storage":[{"name":"/dev/sda","rotational":true,"sizeBytes":53687091200, "vendor":"QEMU", "model":"QEMU HARDDISK","serialNumber":"drive-scsi0-0-0-0", "hctl":"6:0:0:0"}],"cpu":{"arch":"x86_64", "model":"Intel Xeon E3-12xx v2 (IvyBridge)","clockMegahertz":2494.224, "flags":["foo"],"count":4},"hostname":"hwdAnnotation-0"}

Why is this needed?

  • It allows avoiding an extra reboot for live-images that include their own inspection tooling.
  • It provides an arguably safer alternative to the status annotation in some cases.


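  • If both the status annotation and the external inspection annotation are specified on BareMetalHost creation, the external inspection annotation will take precedence and overwrite any hardware data specified via the status annotation.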
  • If the BareMetalHost is in the Available state the controller will not attempt to match profiles based on the annotation.

Inspect annotation

The inspect annotation can be used to request the Bare Metal Operator to (re-)inspect a Ready BareMetalHost. This is useful, for example, when there have been hardware changes. Note that it is only possible to do this when the BareMetalHost is in the Ready state. If an inspection request is made while the BareMetalHost is in any state other than Ready, the request will be ignored.

To request a new inspection, simply annotating the host with the inspect annotation is enough. Once inspection is requested, you should see the BMH in the inspecting state until inspection is completed, and by the end of inspection the annotation will be removed automatically.

Here is an example:

apiVersion: metal3.io/v1alpha1
kind: BareMetalHost
metadata:
  name: example
  annotations:
    # The inspect annotation with no value
    <inspect-annotation-key>: ""
spec:
  online: true
  bootMACAddress: 00:8a:b6:8e:ac:b8
  bootMode: legacy
  bmc:
    address: ipmi://
    credentialsName: example-bmc-secret

Why is this needed?

  • For re-inspecting BareMetalHosts after hardware changes.


  • It is only possible to inspect a BareMetalHost when it is in Ready state.

Note: For other use cases, like disabling inspection or providing externally gathered inspection data, see external inspection.

Reboot annotation

The reboot annotation can be used for rebooting BareMetalHosts in the provisioned state. The annotation key takes either of the following forms:


In its basic form, the annotation will trigger a reboot of the BareMetalHost. The controller will remove the annotation as soon as it has restored power to the host.

The advanced form includes a unique suffix (indicated with {key}). In this form the host will be kept in the PoweredOff state until the annotation has been removed. This can be useful if some tasks need to be performed while the host is in a known stable state. The purpose of the {key} is to allow multiple clients to use the API simultaneously in a safe way. Each client chooses a key and touches only the annotations that have this key to avoid interfering with other clients.

If there are multiple annotations, the controller will wait for all of them to be removed (by the clients) before powering on the host. Similarly, if both forms of annotations are used, the {key} form will take precedence. This ensures that the host stays powered off until all clients are ready (i.e. all annotations are removed).

The annotation value should be a JSON map containing the key mode and a value hard or soft to indicate if a hard or soft reboot should be performed. It is not necessary to specify the annotation value. In case it is omitted, the default is to first try a soft reboot, and if that fails, do a hard reboot.
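
The value described above can be sketched as a small helper that turns a mode into the JSON map. This is only an illustration: the function name reboot_annotation_value is hypothetical, the value is written with standard JSON double quotes, and the annotation key itself is not reproduced here, so the commented usage line expects it in a variable (REBOOT_KEY, also hypothetical) that you set yourself.

```shell
# Hypothetical helper: print the annotation value for a given reboot mode.
# An empty argument yields an empty value, which selects the default
# behaviour described above (soft reboot first, then hard if that fails).
reboot_annotation_value() {
  case "$1" in
    hard|soft) printf '{"mode":"%s"}\n' "$1" ;;
    "")        printf '\n' ;;
    *)         echo "unknown reboot mode: $1 (expected hard or soft)" >&2; return 1 ;;
  esac
}

# Example usage against a live cluster; REBOOT_KEY must hold the reboot
# annotation key, and the host name and namespace are assumptions:
# kubectl -n metal3 annotate bmh example "$REBOOT_KEY=$(reboot_annotation_value hard)"
```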

The exact behavior of hard and soft reboot depends on the Ironic configuration. Please see the Ironic configuration reference for more details on this, e.g. the soft_power_off_timeout variable is relevant.

Here are a few examples of the reboot annotation:

  • basic form, no value - immediate reboot via soft shutdown first, followed by a hard shutdown if the soft shutdown fails.
  • basic form, value {'mode':'hard'} - immediate reboot via hard shutdown, potentially allowing for high-availability use-cases.
  • {key} form, no value - phased reboot, issued and managed by the client registered with the key, via soft shutdown first, followed by a hard reboot if the soft reboot fails.
  • {key} form, value {'mode':'hard'} - phased reboot, issued and managed by the client registered with the key, via hard shutdown.

And here is a “full” example showing a BareMetalHost with the annotation applied:

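apiVersion: metal3.io/v1alpha1
kind: BareMetalHost
metadata:
  name: example
  annotations:
    # The basic form with no value
    <reboot-annotation-key>: ""
    # Advanced form with value
    <reboot-annotation-key-with-{key}-suffix>: "{'mode':'soft'}"
spec:
  online: true
  bootMACAddress: 00:8a:b6:8e:ac:b8
  bootMode: legacy
  bmc:
    address: ipmi://
    credentialsName: example-bmc-secret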

Why is this needed?

  • It enables controllers and users to perform reboots.
  • It provides a way to remediate failed hosts. (“Have you tried turning it off and on again?”)
  • It provides a stable state (powered off) where certain tasks can be performed without risk of interference from the powered off machine.


  • Clients using this API must respect each other and clean up after themselves. Otherwise they will step on each other's toes by, for example, leaving an annotation indefinitely or removing someone else’s annotation before they were ready.

For more details please check the reboot interface proposal.


Ironic is an open-source service for automating provisioning and lifecycle management of bare metal machines. Born as the Bare Metal service of the OpenStack cloud software suite, it has evolved to become a semi-autonomous project, adding ways to be deployed independently as a standalone service, for example using Bifrost, and to integrate with other tools and projects, as in the case of Metal3.

Ironic nowadays supports the two main standard hardware management interfaces, Redfish and IPMI, and thanks to its large community of contributors, it can provide native support for many different bare-metal hardware vendors, such as Dell, Fujitsu, HPE, and Supermicro.

Why Ironic in Metal3

  • Ironic is open source! This aligns perfectly with the philosophy behind Metal3.
  • Ironic has a vendor agnostic interface provided by a robust set of RESTful APIs.
  • Ironic has a vibrant and diverse community, including small and large operators, hardware and software vendors.
  • Ironic provides features covering the whole hardware life-cycle: from bare metal machine registration and hardware specifications retrieval of newly discovered bare metal machines, configuration and provisioning with custom operating system images, up to machine reset and cleaning for re-provisioning or end-of-life retirement.

How Metal3 uses Ironic

The Metal3 project adopted Ironic as the back-end that manages bare-metal hosts behind a native Kubernetes API.

Bare Metal Operator is the main component that interfaces with the Ironic API for all operations needed to provision bare-metal hosts, such as hardware capabilities inspection, operating system installation, and re-initialization when restoring a bare-metal machine to its original status.


Install Ironic

Metal3 runs Ironic as a set of containers. Those containers can be deployed either in-cluster or out-of-cluster. In both scenarios, the following containers must run in order to provision bare metal nodes.

  • ironic
  • ironic-inspector
  • ironic-endpoint-keepalived
  • ironic-log-watch
  • ipa-downloader
  • dnsmasq
  • httpd

To know more about each container’s functionality check the documentation here.


Container runtime (e.g., docker, podman). Here we use docker.

Environmental variables

The following environmental variables can be passed to configure the Ironic services:

  • HTTP_PORT - port used by httpd server (default 6180)
  • PROVISIONING_IP - provisioning interface IP address to use for ironic, dnsmasq(dhcpd) and httpd (default
  • CLUSTER_PROVISIONING_IP - cluster provisioning interface IP address (default
  • PROVISIONING_INTERFACE - interface to use for ironic, dnsmasq(dhcpd) and httpd (default ironicendpoint)
  • CLUSTER_DHCP_RANGE - dhcp range to use for provisioning (default
  • DEPLOY_KERNEL_URL - the URL of the kernel to deploy ironic-python-agent
  • DEPLOY_RAMDISK_URL - the URL of the ramdisk to deploy ironic-python-agent
  • IRONIC_ENDPOINT - the endpoint of the ironic
  • IRONIC_INSPECTOR_ENDPOINT - the endpoint of the ironic inspector
  • CACHEURL - the URL of the cached images
  • IRONIC_FAST_TRACK - whether to enable fast_track provisioning or not (default true)
  • IRONIC_KERNEL_PARAMS - kernel parameters to pass to IPA (default console=ttyS0)
  • IRONIC_INSPECTOR_VLAN_INTERFACES - VLAN interfaces included in introspection. Possible values are: all, for all VLANs on all interfaces, using LLDP information (default); interface, for all VLANs on a given interface, using LLDP information; interface.vlan, for a particular VLAN interface, not using LLDP
  • IRONIC_BOOT_ISO_SOURCE - where the boot iso image will be served from, possible values are: local (default), to download the image, prepare it and serve it from the conductor; http, to serve it directly from its HTTP URL
  • IPA_DOWNLOAD_ENABLED - enables the use of the Ironic Python Agent Downloader container to download IPA archive (default true)
  • USE_LOCAL_IPA - enables the use of locally supplied IPA archive. This condition is handled by BMO and this has effect only when IPA_DOWNLOAD_ENABLED is “false”, otherwise IPA_DOWNLOAD_ENABLED takes precedence. (default false)
  • LOCAL_IPA_PATH - this has effect only when USE_LOCAL_IPA is set to “true”, points to the directory where the IPA archive is located. This variable is handled by BMO. The variable should contain an arbitrary path pointing to the directory that contains the ironic-python-agent.tar
  • GATEWAY_IP - gateway IP address to use for ironic dnsmasq (dhcpd)
  • DNS_IP - DNS IP address to use for ironic dnsmasq (dhcpd)
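
The precedence between IPA_DOWNLOAD_ENABLED, USE_LOCAL_IPA and LOCAL_IPA_PATH described in the list above can be sketched as a small decision function. This only models the documented rules and is not how BMO implements them: the function name ipa_source and its output strings are hypothetical, and the "none" result for the case where both switches are false is an assumption, since the list does not say what happens then.

```shell
# Hypothetical helper: report where the IPA archive would come from,
# following the precedence described in the list above.
ipa_source() {
  download="${IPA_DOWNLOAD_ENABLED:-true}"   # documented default: true
  use_local="${USE_LOCAL_IPA:-false}"        # documented default: false
  if [ "$download" = "true" ]; then
    # IPA_DOWNLOAD_ENABLED takes precedence over USE_LOCAL_IPA.
    echo "download"
  elif [ "$use_local" = "true" ]; then
    # LOCAL_IPA_PATH only matters once USE_LOCAL_IPA is "true".
    echo "local:${LOCAL_IPA_PATH:-}"
  else
    echo "none"   # assumption: neither source is enabled
  fi
}
```

With no variables set, the documented defaults apply and the function reports the downloader as the source.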

To know how to pass these variables, please see the sections below.

Ironic in-cluster installation

For in-cluster Ironic installation, we will run a set of containers within a single pod in a Kubernetes cluster. You can enable TLS or basic auth, or even disable both, for Ironic and inspector communication. Below we will see the kustomize folders that help us install Ironic for each of these cases. In each of these deployments, a ConfigMap will be created and mounted to the Ironic pod. The ConfigMap will be populated based on environment variables from ironic-deployment/default/ironic_bmo_configmap.env. As such, update ironic_bmo_configmap.env with your custom values before deploying Ironic.
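
As an illustration of preparing such values, the sketch below writes a few of the variables from the list above to a file, falling back to the defaults documented there. It rests on assumptions: the helper name write_ironic_env is hypothetical, the one KEY=VALUE per line layout is assumed rather than taken from the repository, and the example writes to a scratch path instead of touching ironic_bmo_configmap.env.

```shell
# Hypothetical helper: write KEY=VALUE lines for a few of the variables
# listed above, using the documented defaults when a variable is unset.
write_ironic_env() {
  out_file="$1"
  {
    echo "HTTP_PORT=${HTTP_PORT:-6180}"
    echo "PROVISIONING_INTERFACE=${PROVISIONING_INTERFACE:-ironicendpoint}"
    echo "IRONIC_FAST_TRACK=${IRONIC_FAST_TRACK:-true}"
    echo "IRONIC_KERNEL_PARAMS=${IRONIC_KERNEL_PARAMS:-console=ttyS0}"
  } > "$out_file"
}

# Example usage (scratch file; compare it with the repository file by hand):
# write_ironic_env /tmp/ironic_bmo_configmap.env
```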

We assume you are inside the local baremetal-operator path. If not, you need to clone it first and cd to the root path.

 git clone
 cd baremetal-operator

Basic authentication enabled:

 kustomize build ironic-deployment/basic-auth | kubectl apply -f -

TLS enabled:

 kustomize build ironic-deployment/basic-auth/tls | kubectl apply -f -

Ironic out-of-cluster installation

For out-of-cluster Ironic installation, we will run a set of docker containers outside of a Kubernetes cluster. To pass Ironic settings, you can export the corresponding environment variables in the current shell before calling the installation script. This will start the following containers:

  • ironic
  • ironic-inspector
  • ironic-endpoint-keepalived
  • ironic-log-watch
  • ipa-downloader
  • dnsmasq
  • httpd
  • mariadb; if IRONIC_USE_MARIADB = “true”

For the in-cluster Ironic installation we used different manifests for TLS and basic auth. Here we export environment variables to enable or disable TLS and basic auth, but use the same script.

TLS and Basic authentication disabled

 export IRONIC_FAST_TRACK="false"  # Example of manipulating Ironic settings
 export IRONIC_TLS_SETUP="false"   # Disable TLS
 export IRONIC_BASIC_AUTH="false"  # Disable basic auth

Basic authentication enabled

 export IRONIC_TLS_SETUP="false"
 export IRONIC_BASIC_AUTH="true"

TLS enabled

 export IRONIC_TLS_SETUP="true"
 export IRONIC_BASIC_AUTH="false"
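
The three combinations above can be wrapped in one helper so that both switches are always set together. This is a convenience sketch only: the function name set_ironic_auth and its none, basic and tls arguments are hypothetical, and it covers just the combinations shown above.

```shell
# Hypothetical helper: export the two switches for one of the three
# combinations shown above.
set_ironic_auth() {
  case "$1" in
    none)  IRONIC_TLS_SETUP="false"; IRONIC_BASIC_AUTH="false" ;;
    basic) IRONIC_TLS_SETUP="false"; IRONIC_BASIC_AUTH="true" ;;
    tls)   IRONIC_TLS_SETUP="true";  IRONIC_BASIC_AUTH="false" ;;
    *)     echo "usage: set_ironic_auth none|basic|tls" >&2; return 1 ;;
  esac
  export IRONIC_TLS_SETUP IRONIC_BASIC_AUTH
}

# Select basic authentication and show the resulting switches.
set_ironic_auth basic
echo "TLS=$IRONIC_TLS_SETUP BASIC_AUTH=$IRONIC_BASIC_AUTH"
```

Setting both variables in one place avoids a stale value from an earlier export leaking into the next run of the installation script.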

Ironic Python Agent (IPA)

IPA is a service written in Python that runs within a ramdisk. It gives the ironic and ironic-inspector services remote access to the managed server so that they can perform various operations on it. It also sends information about the server to Ironic.

By default, RDO ramdisks from the registry are used. However, another remote registry or a local IPA archive can be specified. ipa-downloader is responsible for downloading the IPA ramdisk image to a shared volume from where the nodes are able to retrieve it.

Data flow

IPA interacts with other components. The information exchanged, and the components it is sent to or received from, are described below. The communication between IPA and these components can be encrypted in-transit with SSL/TLS.

  • Heartbeat: periodic message informing Ironic that the node is still running.
  • Lookup: data sent to Ironic that helps it determine Ironic’s node UUID for the node.
  • Introspection: data about hardware details, such as CPU, disk, RAM and network interfaces.

The above data is sent/received as follows.

  • Lookup/heartbeats data is sent to Ironic.
  • Introspection result is sent to ironic-inspector.
  • User supplied boot image that will be written to the node’s disk is retrieved from HTTPD server


Ironic Container Images

The currently available ironic container images are listed below.

Name and link to repository               Content/Purpose
ironic-image                              Ironic API and conductor / Ironic Inspector / Sushy tools / virtualbmc
ironic-ipa-downloader                     Distribute the Ironic Python Agent ramdisk
ironic-hardware-inventory-recorder-image  Ironic Python Agent hardware collector daemon
ironic-static-ip-manager                  Set and maintain IP for provisioning pod
ironic-client                             Ironic CLI utilities

How to build a container image

Each repository mentioned in the list contains a Dockerfile that can be used to build the corresponding container image. The build process is as easy as running the docker or podman command and pointing it at the Dockerfile, for example in the case of the ironic-image:

git clone
cd ironic-image
docker build . -f Dockerfile

In some cases a make sub-command is provided to build the image using docker, usually make docker.

Build ironic-image from source

The standard build command builds the container using RPMs taken from the RDO project, although an alternative build option has been provided for the ironic-image container to use source code instead.

Setting the argument INSTALL_TYPE to source in the build cli command triggers the build from source code:

docker build . -f Dockerfile --build-arg INSTALL_TYPE=source

When building the ironic image from source, it is also possible to specify a different source for ironic, ironic-inspector or the sushy library using the build arguments IRONIC_SOURCE, IRONIC_INSPECTOR_SOURCE, and SUSHY_SOURCE. The accepted formats are gerrit refs, like refs/changes/89/860689/2, commit hashes, like a1fe6cb41e6f0a1ed0a43ba5e17745714f206f1f, or a local directory that needs to be under the sources/ directory in the container context.

An example of a full command installing ironic from a gerrit patch is:

docker build . -f Dockerfile --build-arg INSTALL_TYPE=source --build-arg IRONIC_SOURCE="refs/changes/89/860689/2"

An example using the local directory sources/ironic:

docker build . -f Dockerfile --build-arg INSTALL_TYPE=source --build-arg IRONIC_SOURCE="ironic"
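
To make the three accepted formats concrete, the sketch below classifies a value before it is passed as a build argument. It is only an illustration and not part of the ironic-image build: the function name source_kind is hypothetical, and it assumes that a commit hash is always given as the full 40 hexadecimal characters, as in the example above.

```shell
# Hypothetical helper: classify a value for IRONIC_SOURCE (or the other
# *_SOURCE build arguments) into the three accepted formats.
source_kind() {
  case "$1" in
    refs/changes/*)
      echo "gerrit-ref"
      ;;
    *)
      # Assumption: commit hashes are given in full (40 hex characters).
      if printf '%s' "$1" | grep -Eq '^[0-9a-f]{40}$'; then
        echo "commit-hash"
      else
        # Anything else is treated as a directory under sources/.
        echo "local-directory"
      fi
      ;;
  esac
}

source_kind "refs/changes/89/860689/2"
source_kind "ironic"
```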

Work with patches in the ironic-image

The ironic-image allows testing patches for ironic projects by building the container image with the patches applied by a script at build time. To use the script, we need to specify a text file containing the list of patches to be applied as the value of the build argument PATCH_LIST, for example:

docker build . -f Dockerfile --build-arg PATCH_LIST=patch-list.txt

At the moment, only patches coming from gerrit are accepted. Include one patch per line in the PATCH_LIST file with the format:

project refspec


  • project is the last part of the project url including the org, for example openstack/ironic
  • refspec is the gerrit refspec of the patch we want to test, for example refs/changes/67/759567/1
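
A patch list can be checked against this format before starting a build. The sketch below is a convenience check under stated assumptions, not part of the ironic-image tooling: the function name check_patch_list is hypothetical, blank lines are assumed to be harmless and are skipped, and a refspec is assumed to always start with refs/changes/ as in the example above.

```shell
# Hypothetical helper: check that every non-empty line of a patch list
# has exactly two fields and that the second looks like a gerrit refspec.
check_patch_list() {
  awk '
    NF == 0 { next }                       # skip blank lines (assumption)
    NF != 2 || $2 !~ /^refs\/changes\// {  # malformed entry
      printf "line %d: expected \"project refspec\"\n", NR > "/dev/stderr"
      bad = 1
    }
    END { exit bad + 0 }                   # non-zero exit if any line failed
  ' "$1"
}

# Example usage:
# printf 'openstack/ironic refs/changes/67/759567/1\n' > patch-list.txt
# check_patch_list patch-list.txt && docker build . -f Dockerfile --build-arg PATCH_LIST=patch-list.txt
```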

Special resources: sushy-tools and virtualbmc

In the ironic-image container repository, under the resources directory, we find the Dockerfiles needed to build sushy-tools and virtualbmc containers.

They can both be built exactly like the other containers using the docker build command.

Kubernetes Cluster API Provider Metal3

Kubernetes-native declarative infrastructure for Metal3.

What is the Cluster API Provider Metal3

The Cluster API brings declarative, Kubernetes-style APIs to cluster creation, configuration and management. The API itself is shared across multiple cloud providers. Cluster API Provider Metal3 is one of the providers for Cluster API and enables users to deploy a Cluster API based cluster on top of bare metal infrastructure using Metal3.

Compatibility with Cluster API

CAPM3 version | Cluster API version | CAPM3 Release

Development Environment

There are multiple ways to set up a development environment.

Getting involved and contributing

Are you interested in contributing to Cluster API Provider Metal3? We, the maintainers and community, would love your suggestions, contributions, and help! Also, the maintainers can be contacted at any time to learn more about how to get involved.

To set up your environment, check out the development environment.

In the interest of getting more new people involved, we tag issues with good first issue. These are typically issues that have smaller scope but are good ways to start to get acquainted with the codebase.

We also encourage ALL active community participants to act as if they are maintainers, even if they don’t have “official” write permissions. This is a community effort, and we are here to serve the Kubernetes community. If you have an active interest and you want to get involved, you have real power! Don’t assume that the only people who can get things done around here are the “maintainers”.

We also would love to add more “official” maintainers, so show us what you can do!

All the repositories in the Metal3 project, including the Cluster API Provider Metal3 GitHub repository, use the Kubernetes bot commands. The full list of the commands can be found here. Note that some of them might not be implemented in metal3 CI.


Community resources and contact details can be found here.

GitHub issues

We use GitHub issues to keep track of bugs and feature requests. There are two different templates to help ensure that relevant information is included.


If you think you have found a bug please follow the instructions below.

  • Please spend a small amount of time giving due diligence to the issue tracker. Your issue might be a duplicate.
  • Collect logs from relevant components and make sure to include them in the bug report you are going to open.
  • Remember users might be searching for your issue in the future, so please give it a meaningful title to help others.
  • Feel free to reach out to the metal3 community.

Tracking new features

We also use the issue tracker to track features. If you have an idea for a feature, or think you can help Cluster API Provider Metal3 become even more awesome, then follow the steps below.

  • Open a feature request.
  • Remember users might be searching for your feature request in the future, so please give it a meaningful title to help others.
  • Clearly define the use case, using concrete examples. e.g.: I type this and cluster-api-provider-metal3 does that.
  • Some of our larger features will require proposals. If you would like to include a technical design for your feature please open a feature proposal in metal3-docs using this template.

After the new feature is well understood, and the design agreed upon we can start coding the feature. We would love for you to code it. So please open up a WIP (work in progress) pull request, and happy coding.

Install Cluster-api-provider-metal3

You can either use clusterctl (recommended) to install the Metal³ infrastructure provider or kustomize for manual installation. Both methods install the provider CRDs, its controllers and the IP Address Manager (IPAM). Please keep in mind that Baremetal Operator and Ironic are decoupled from CAPM3 and will not be installed when the provider is initialized. As such, you need to install them yourself.


  1. Install clusterctl, refer to Cluster API book for installation instructions.

  2. Install kustomize, refer to official instructions here.

  3. Install Ironic, refer to TODO.

  4. Install Baremetal Operator, refer to TODO.

  5. Install the Cluster API core components, i.e. the core, bootstrap and control-plane providers. This will also install cert-manager, if it is not already installed.

     clusterctl init --core cluster-api:v1.1.4 --bootstrap kubeadm:v1.1.4 \
     --control-plane kubeadm:v1.1.4 -v5

With clusterctl

This method is recommended. You can specify the CAPM3 version you want to install by appending a version tag, e.g. :v1.1.2. If the version is not specified, the latest version available will be installed.

clusterctl init --infrastructure metal3:v1.1.2

With kustomize

To install a specific version, edit the controller-manager image version in config/default/capm3/manager_image_patch.yaml:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: controller-manager
  namespace: system
spec:
  template:
    spec:
      containers:
      # Change the value of image/tag to your desired image URL or version tag
      - image:
        name: manager

Apply the manifests

cd cluster-api-provider-metal3
kustomize build config/default | kubectl apply -f -

Remediation Controller and MachineHealthCheck

The Cluster API includes a remediation feature that implements automated health checking of Kubernetes nodes. It deletes an unhealthy Machine and replaces it with a healthy one. This approach can be challenging for providers that use hardware-based clusters, because (re)provisioning an unhealthy Machine is slow. To overcome this, the CAPI remediation feature was extended to allow plugging in provider-specific external remediation. This makes it possible to plug in Metal3-specific remediation strategies for unhealthy nodes. In this case, the Cluster API MachineHealthCheck (MHC) finds unhealthy nodes while the CAPM3 Remediation Controller remediates them.

CAPI Remediation

A MachineHealthCheck is a Cluster API resource that allows users to define conditions under which Machines within a Cluster should be considered unhealthy. Users can also specify a timeout for each of the conditions that they define to check on the Machine’s Node. If any of these conditions is met for the duration of its timeout, the Machine will be remediated. CAPM3 uses the MachineHealthCheck to create remediation requests based on the Metal3RemediationTemplate and Metal3Remediation CRDs, which plug in its remediation solution. For more information, please read the CAPI MHC documentation.
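The timeout semantics described above can be illustrated with a small Python sketch. It is not the MachineHealthCheck controller's code: the function name is_unhealthy and the data shapes (a dictionary of Node conditions and a list of configured checks) are assumptions chosen for illustration.

```python
from datetime import datetime, timedelta


def is_unhealthy(node_conditions, unhealthy_conditions, now):
    """Return True if any configured condition has matched for its full timeout.

    node_conditions: dict mapping condition type -> (status, last_transition_time)
    unhealthy_conditions: list of (type, status, timeout) tuples
    """
    for cond_type, status, timeout in unhealthy_conditions:
        current = node_conditions.get(cond_type)
        if current is None:
            continue  # the Node does not report this condition
        current_status, since = current
        if current_status == status and now - since >= timeout:
            return True
    return False


if __name__ == "__main__":
    now = datetime(2024, 1, 1, 12, 0, 0)
    checks = [
        ("Ready", "Unknown", timedelta(seconds=300)),
        ("Ready", "False", timedelta(seconds=300)),
    ]
    node = {"Ready": ("Unknown", now - timedelta(minutes=6))}
    print(is_unhealthy(node, checks, now))
```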

External Remediation

External remediation provides remediation solutions other than deleting an unhealthy Machine and creating a healthy one. Environments consisting of hardware-based clusters are slow to (re)provision unhealthy Machines, so there is a growing need for a remediation flow that includes external remediation, which can significantly reduce the remediation time. Normally, condition-based remediation offers no remediation other than deleting an unhealthy Machine and replacing it with a new one. Other environments and vendors can also have specific remediation requirements, so there is a need for a generic mechanism for implementing custom remediation logic. External remediation integrates with the CAPI MHC and supports remediation based on power cycling the underlying hardware. It supports the use of the BMO reboot API and the CAPM3 unhealthy annotation as part of the automated remediation cycle. It is a generic mechanism for supporting externally provided custom remediation strategies. If no value for externalRemediationTemplate is defined in the MachineHealthCheck CR, the condition-based flow is used. For more information, see the External Remediation proposal.

Metal3 Remediation

The CAPM3 remediation controller reconciles Metal3Remediation objects created by the CAPI MachineHealthCheck. It locates a Machine with the same name as the Metal3Remediation object and uses the BMO and CAPM3 APIs to remediate the associated unhealthy node. The remediation controller supports a reboot strategy specified in the Metal3Remediation CRD and uses the same object to store the state of the current remediation cycle. The reboot strategy consists of three steps: power off the Machine, delete the related Node, and power the Machine on again. Deleting the Node indicates that the workloads on the Node are no longer running, which results in quicker rescheduling and lower downtime for the affected workloads.

Enable remediation for worker nodes

Machines managed by a MachineSet (as identified by the nodepool label) can be remediated. Here is an example MachineHealthCheck and Metal3Remediation for worker nodes:

kind: MachineHealthCheck
metadata:
  name: worker-healthcheck
  namespace: metal3
spec:
  # clusterName is required to associate this MachineHealthCheck with a particular cluster
  clusterName: test1
  # (Optional) maxUnhealthy prevents further remediation if the cluster is already partially unhealthy
  maxUnhealthy: 100%
  # (Optional) nodeStartupTimeout determines how long a MachineHealthCheck should wait for
  # a Node to join the cluster, before considering a Machine unhealthy.
  # Defaults to 10 minutes if not specified.
  # Set to 0 to disable the node startup timeout.
  # Disabling this timeout will prevent a Machine from being considered unhealthy when
  # the Node it created has not yet registered with the cluster. This can be useful when
  # Nodes take a long time to start up or when you only want condition based checks for
  # Machine health.
  nodeStartupTimeout: 0m
  # selector is used to determine which Machines should be health checked
  selector:
    matchLabels:
      nodepool: nodepool-0
  # Conditions to check on Nodes for matched Machines, if any condition is matched for the duration of its timeout, the Machine is considered unhealthy
  unhealthyConditions:
  - type: Ready
    status: Unknown
    timeout: 300s
  - type: Ready
    status: "False"
    timeout: 300s
  remediationTemplate: # added infrastructure reference
    kind: Metal3RemediationTemplate
    name: worker-remediation-request

Metal3RemediationTemplate for worker nodes:

kind: Metal3RemediationTemplate
metadata:
    name: worker-remediation-request
    namespace: metal3
spec:
  template:
    spec:
      strategy:
        type: "Reboot"
        retryLimit: 2
        timeout: 300s

Enable remediation for control plane nodes

Machines managed by a KubeadmControlPlane are remediated according to the KubeadmControlPlane proposal. At least 2 control plane machines are required in order to use the remediation feature. Control plane nodes are identified by the cluster.x-k8s.io/control-plane label. Here is an example MachineHealthCheck and Metal3Remediation for control plane nodes:

kind: MachineHealthCheck
metadata:
  name: controlplane-healthcheck
  namespace: metal3
spec:
  clusterName: test1
  maxUnhealthy: 100%
  nodeStartupTimeout: 0m
  selector:
    matchLabels:
      cluster.x-k8s.io/control-plane: ""
  unhealthyConditions:
    - type: Ready
      status: Unknown
      timeout: 300s
    - type: Ready
      status: "False"
      timeout: 300s
  remediationTemplate: # added infrastructure reference
    kind: Metal3RemediationTemplate
    name: controlplane-remediation-request

Metal3RemediationTemplate for control plane nodes:

kind: Metal3RemediationTemplate
metadata:
    name: controlplane-remediation-request
    namespace: metal3
spec:
  template:
    spec:
      strategy:
        type: "Reboot"
        retryLimit: 1
        timeout: 300s

Limitations and caveats of Metal3 remediation

  • Machines owned by a MachineSet or a KubeadmControlPlane can be remediated by a MachineHealthCheck

  • If the Node for a Machine is removed from the cluster, the CAPI MachineHealthCheck considers the Machine unhealthy and remediates it immediately

  • If no Node joins the cluster for a Machine within the NodeStartupTimeout, the Machine will be remediated

  • If a Machine fails for any reason and the FailureReason is set, the Machine will be remediated immediately

Node Reuse

This feature makes it possible to reuse the same BareMetalHosts (referred to as hosts below) across deprovisioning and provisioning, mainly as part of the rolling upgrade process in the cluster.

Importance of scale-in strategy

The host reuse logic relies solely on the scale-in upgrade strategy used by the Cluster API objects KubeadmControlPlane and MachineDeployment. During the upgrade of these resources, the machines owned by the KubeadmControlPlane or MachineDeployment are removed one by one before new ones are created (delete-create method). That way, we can ensure that the intended host is reused when the upgrade starts, because it is picked up by the next provisioning for the new machine being created.

Note: To achieve the desired delete-first, create-after behavior in the above-mentioned Cluster API objects, the user has to modify:

  • The MaxSurge field in KubeadmControlPlane, setting it to 0, with a minimum of 3 control plane machine replicas
  • The MaxSurge and MaxUnavailable fields in MachineDeployment, setting them to 0 and 1 respectively

In contrast, if the CAPI objects use the scale-out strategy during the upgrade, they usually follow the create-swap-delete method: a new machine is created first and a new host is picked up for that machine, which breaks the node reuse logic right at the beginning of the upgrade process.


The Metal3MachineTemplate (M3MT) Custom Resource is the object responsible for enabling the node reuse feature.

kind: Metal3MachineTemplate
metadata:
  name: test1-controlplane
  namespace: metal3
spec:
  nodeReuse: true

There can be two Metal3MachineTemplate objects, one referenced by the KubeadmControlPlane for control plane nodes, and the other by the MachineDeployment for worker nodes. Before performing an upgrade, the user must set the nodeReuse field to true in the Metal3MachineTemplate object whose hosts are to be reused. If left unchanged, the nodeReuse field defaults to false, and no host reuse is performed in the workflow. If you would like to know more about the internals of the controller logic, please check the original proposal for the feature.

Once the nodeReuse field is set to true, the user has to make sure that the scale-in strategy is enabled as described above, and then update the desired fields in the KubeadmControlPlane or MachineDeployment to start a rolling upgrade.

Note: If you are creating a new Metal3MachineTemplate object (for control plane or worker nodes), rather than using the existing one created while provisioning, please make sure to reference it from the corresponding Cluster API object (KubeadmControlPlane or MachineDeployment). Also keep in mind that already provisioned Metal3Machines were created from the old Metal3MachineTemplate and consume existing hosts, so setting the nodeReuse field to true in the new Metal3MachineTemplate has no effect on them. To use the new Metal3MachineTemplate in the workflow, the user has to reprovision the nodes, so that the new Metal3MachineTemplate referenced in the Cluster API object is used and Metal3Machines are created from it.

CAPM3 Pivoting

What is pivoting

Cluster API Provider Metal3 (CAPM3) implements support for CAPI’s ‘move/pivoting’ feature.

The CAPI pivoting feature is the process of moving the provider components and declared Cluster API resources from a source management cluster to a target management cluster by using the clusterctl functionality called “move”. More information about the general CAPI “move” functionality can be found in the clusterctl documentation.

In Metal3, pivoting is performed with the clusterctl tool provided by the Cluster API project. clusterctl refers to pivoting as move. During the pivot process, clusterctl pauses reconciliation of all CAPI objects, and this is propagated to CAPM3 objects as well. Once all the objects are paused, they are created on the target cluster and deleted from the bootstrap cluster.


  1. It is mandatory to use clusterctl for both the bootstrap and target cluster.

    If the provider components are not installed using clusterctl, it will not be able to identify the objects to move. Initializing the cluster using clusterctl essentially adds the following labels in the CRDs of each related object.

    - clusterctl.cluster.x-k8s.io: ""
    - cluster.x-k8s.io/provider: "<provider-name>"

    So if the clusters are not initialized using clusterctl, all the CRDs of the objects to be moved to the target cluster need to have these labels in both the bootstrap cluster and the target cluster before performing the move.

    Note: This is not recommended, since the way clusterctl identifies objects to manage might change in the future, so it is always safer to install CRDs and controllers through the clusterctl init sub-command.

  2. BareMetalHost objects have correct status annotation.

    Since the BareMetalHost (BMH) status holds important information about the BMH itself, the BMH has to be moved together with its status, and the status has to be reconstructed correctly in the target cluster before the BMH is reconciled. This is done through the BMH status annotation in BMO.

  3. Maintain connectivity towards provisioning network.

    Bare metal machines boot over a network with a DHCP server. This requires maintaining a fixed IP endpoint towards the provisioning network, which is achieved through keepalived. A container named ironic-endpoint-keepalived is added to the Ironic deployment to maintain the Ironic endpoint using keepalived. The motivation is to ensure that the Ironic endpoint IP is also passed on to the target cluster control plane. This also guarantees that once the move is done and the management cluster is taken down, the target cluster control plane can reclaim the Ironic endpoint IP through keepalived. The end goal is to make the Ironic endpoint reachable in the target cluster.

  4. BMO is deployed as part of CAPM3.

    If not, it has to be deployed before clusterctl init, and the BMH CRDs need to be labeled manually. Separate labeling of the BMH CRDs is required because, since CAPM3 release v0.5.0, the BMO/BMH CRDs are no longer deployed as part of the CAPM3 deployment. This is a prerequisite for both the management and the target cluster.

  5. Objects should have a proper owner reference chain.

    clusterctl move moves all the objects to the target cluster by following the owner reference chain. It is therefore necessary to verify that all the objects that need to be moved to the target cluster have a proper owner reference chain.
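The owner reference check in item 5 can be pictured as a graph walk. The Python sketch below is a simplified model and not how clusterctl is implemented: the function name reaches_cluster, the flattened owners dictionary (standing in for metadata.ownerReferences), and the object names in the usage example are all hypothetical.

```python
def reaches_cluster(name, owners, cluster_name):
    """Follow owner references from `name` and report whether the Cluster is reachable.

    owners: dict mapping an object name to the list of names of its owners.
    """
    seen = set()
    stack = [name]
    while stack:
        current = stack.pop()
        if current == cluster_name:
            return True
        if current in seen:
            continue  # guard against reference cycles
        seen.add(current)
        stack.extend(owners.get(current, []))
    return False


if __name__ == "__main__":
    # Hypothetical objects: a Metal3Machine owned by a Machine owned by the Cluster,
    # and a stray object without any owner.
    owners = {
        "metal3machine/test1-abc": ["machine/test1-abc"],
        "machine/test1-abc": ["cluster/test1"],
        "configmap/stray": [],
    }
    for obj in owners:
        print(obj, reaches_cluster(obj, owners, "cluster/test1"))
```

Objects for which such a walk never reaches the Cluster are the ones to inspect before running the move.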

Important Notes

The following requirements are essential for the move process to run successfully:

  1. The move process should be done when the BMHs are in a steady state. BMHs should not be moved while any operation is ongoing, e.g. while a BMH is in the provisioning state. Doing so will result in failure, since the interaction between IPA and Ironic gets broken; as a result, Ironic’s database might not be repopulated and the cluster will eventually end up in an erroneous state. Moreover, the IP of the BMH might change after the move, and the DHCP leases from the management cluster are not moved to the target cluster.

  2. Before the move process is initialized, it is important to delete the Ironic pod/Ironic containers. If Ironic is deployed in the cluster, the deployment is named metal3-ironic; if it is deployed locally outside the cluster, the user has to make sure that all of the Ironic-related containers are correctly deleted. If Ironic is not deleted before the move, the old Ironic might interfere with the operations of the new Ironic deployed in the target cluster, since the database of the first Ironic instance is not cleaned when the BMHs are moved. In addition, two Ironic deployments would mean two dnsmasq instances, which is undesirable.

  3. The provisioning bridge to which the Ironic endpoint IP is supposed to be attached should have a static IP assigned before the Ironic pod/containers start to operate in the target cluster. This is important because the ironic-endpoint-keepalived container will only assign the Ironic endpoint IP to the provisioning bridge in the target cluster when the bridge already has an IP on it. Otherwise it will fail to attach the IP and Ironic will be unreachable. The IP must be static because this interface hosts the DHCP server and so cannot itself be configured to use DHCP.

Step by step pivoting process

As described in the clusterctl documentation, the whole process, from bootstrapping a management cluster to moving objects to the target cluster, is as follows:

The move process can be bounded with the creation of a temporary bootstrap cluster used to provision a target management cluster.

This can be achieved with the following procedure:

  1. Create a temporary bootstrap cluster, for example with Kind or Minikube. Once the bootstrap cluster is up and running, the CAPI and provider components can be installed on it with clusterctl.

  2. Install Ironic components namely: ironic, ironic-inspector, ironic-endpoint-keepalived, httpd and dnsmasq.

  3. Use clusterctl init to install the provider components


    clusterctl init --infrastructure metal3:v1.1.0 \
    --target-namespace metal3 --watching-namespace metal3

    This command will create the necessary CAPI controllers (CAPI, CABPK, CAKCP) and CAPM3 as the infrastructure provider. All of the controllers will be installed on namespace metal3 and they will be watching over objects in namespace metal3.

  4. Provision the target cluster:

    clusterctl config cluster ... | kubectl apply -f -

  5. Wait for the target management cluster to be up and running, then get the kubeconfig for the new target management cluster.

  6. Use the new cluster’s kubeconfig to install the ironic-components in the target cluster.

  7. Use clusterctl init with the new cluster’s kubeconfig to install the provider components.

    clusterctl init --kubeconfig target.yaml --infrastructure metal3:v1.1.0 \
    --target-namespace metal3 --watching-namespace metal3

  8. Use clusterctl move to move the Cluster API resources from the bootstrap cluster to the target management cluster.

    clusterctl move --to-kubeconfig target.yaml -n metal3 -v 10

  9. Delete the bootstrap cluster.

Automated Cleaning

Before reading this page, please see the Baremetal Operator Automated Cleaning page.

If you are using only the Metal3 Baremetal Operator, you can skip this page and refer to the Baremetal Operator automated cleaning page instead.

For deployments following the Cluster-api-provider-metal3 (CAPM3) workflow, it is recommended to configure automated cleaning via CAPM3 custom resources (CR).

There are two automated cleaning modes available which can be set via automatedCleaningMode field of a Metal3MachineTemplate spec or Metal3Machine spec.

  • metadata to enable the cleaning
  • disabled to disable the cleaning

When enabled (metadata), automated cleaning kicks off when a node is provisioned for the first time and on every deprovisioning. There is no default value for automatedCleaningMode in Metal3MachineTemplate and Metal3Machine. If the user doesn’t set a mode, the field is omitted from the spec. Unsetting automatedCleaningMode in the Metal3MachineTemplate blocks the synchronization of the cleaning mode between the Metal3MachineTemplate and its Metal3Machines. This enables the selective operations described below.

Bulk operations

The CAPM3 controller replicates the automated cleaning mode from a Metal3MachineTemplate to all Metal3Machines that reference it. For example, one controlplane and one worker Metal3Machine both have automatedCleaningMode set to disabled because it is set to disabled in the template that they both reference.

Note: CAPM3 controller replicates the cleaning mode from Metal3MachineTemplate to Metal3Machine only if automatedCleaningMode is set (not empty) on the Metal3MachineTemplate resource. In other words, it synchronizes either disabled or metadata modes between Metal3MachineTemplate and Metal3Machines.

Selective operations

Normally automated cleaning mode is replicated from Metal3MachineTemplate spec to its referenced Metal3Machines’ spec and from Metal3Machines spec to BareMetalHost spec (if CAPM3 is used). However, sometimes you might want to have a different automated cleaning mode for one or more Metal3Machines than the others even though they are referencing the same Metal3MachineTemplate. For example, there is one worker and one controlplane Metal3Machine created from the same Metal3MachineTemplate, and we would like the automated cleaning to be enabled (metadata) for the worker while disabled (disabled) for the controlplane.

Here are the steps to achieve that:

  1. Unset automatedCleaningMode in the Metal3MachineTemplate. The CAPM3 controller then unsets it for the referenced Metal3Machines. Although it is unset in the Metal3Machine, BareMetalHosts will get their default automated cleaning mode, metadata. As mentioned earlier, the CAPM3 controller replicates the cleaning mode from Metal3MachineTemplate to Metal3Machine ONLY when it is either metadata or disabled. As such, unsetting the cleaning mode in the Metal3MachineTemplate is enough to block synchronization between Metal3MachineTemplate and Metal3Machine.
  2. Set automatedCleaningMode to metadata on the worker Metal3Machine spec and to disabled on the controlplane Metal3Machine spec. Since no mode is set on the Metal3MachineTemplate, Metal3Machines can have different automated cleaning modes even if they reference the same Metal3MachineTemplate. The CAPM3 controller copies the cleaning modes from the Metal3Machines to their corresponding BareMetalHosts. As such, we end up with two nodes having different cleaning modes even though they reference the same Metal3MachineTemplate.
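The replication rule behind bulk and selective operations can be summarised in a short Python sketch. It is a simplified model rather than the CAPM3 controller's code: the function names and the dictionary-based representation of Metal3Machines are assumptions, and it only models the stated rule that a set template mode is copied while an unset one leaves the machines untouched.

```python
# Default applied on the BareMetalHost when the Metal3Machine leaves the mode unset,
# as described in step 1 above.
BMH_DEFAULT_MODE = "metadata"
VALID_MODES = ("metadata", "disabled")


def sync_machines(template_mode, machine_modes):
    """Return the Metal3Machine modes after syncing with their template.

    template_mode: "metadata", "disabled", or None when unset.
    machine_modes: dict mapping Metal3Machine name -> mode (or None when unset).
    """
    if template_mode in VALID_MODES:
        # Bulk operation: every referencing machine gets the template's mode.
        return {name: template_mode for name in machine_modes}
    # Template unset: no synchronization, machines keep their own modes.
    return dict(machine_modes)


def host_modes(machine_modes):
    """Return the mode each BareMetalHost ends up with."""
    return {
        name: mode if mode in VALID_MODES else BMH_DEFAULT_MODE
        for name, mode in machine_modes.items()
    }


if __name__ == "__main__":
    machines = {"worker": "metadata", "controlplane": "disabled"}
    print(host_modes(sync_machines(None, machines)))
    print(host_modes(sync_machines("disabled", machines)))
```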

IPAM (IP Address Manager)

The IPAM project provides a controller to manage static IP address allocations in Cluster API Provider Metal3.

In CAPM3, the network data needs to be passed to Ironic through the BareMetalHost. CAPI addresses the deployment of Kubernetes clusters and nodes using the Kubernetes API. As such, it uses objects such as MachineDeployments (similar to Deployments for pods) that take care of creating the requested number of machines based on templates. The user can increase the number of replicas, triggering the creation of new machines from the provided templates. Because of how the KubeadmControlPlane and MachineDeployment features work in Cluster API, it is not possible to provide static IP addresses for each machine before the actual deployment.

In addition, all the resources from the source cluster must support the CAPI pivoting, i.e. being copied and recreated in the target cluster. This means that all objects must contain all needed information in their spec field to recreate the status in the target cluster without losing information. All objects must, through a tree of owner references, be attached to the cluster object, for the pivoting to proceed properly.

Moreover, there are use cases where users want to specify multiple non-continuous ranges of IP addresses, use the same pool across multiple Template objects, or rule out some IP addresses that might be in use for any reason after the deployment.

The IPAM is introduced to manage the allocation of IP addresses from subnets according to requests, without handling any use of those addresses. The IPAM adds flexibility by providing the address right before the node is provisioned. It can share a pool across MachineDeployment or KubeadmControlPlane objects, allows non-continuous pools and external IP management through IPAddress CRs, offers predictable IP addresses, and is resilient to the clusterctl move operation.

In order to use IPAM, both the CAPI and IPAM controllers are required, since the IPAM controller has a dependency on Cluster API Cluster objects.

IPAM components

  • IPPool: A set of IP address pools to be used for IP address allocations
  • IPClaim: Request for an IP address allocation
  • IPAddress: IP address allocation


Example of IPPool:

kind: IPPool
metadata:
  name: pool1
  namespace: default
spec:
  clusterName: cluster1
  namePrefix: test1-prov
  pools:
    - start:
      prefix: 25
    - subnet:
    - subnet:
  prefix: 24

The spec field contains the following fields:

  • clusterName: Name of the cluster to which this pool belongs, it is used to verify whether the resource is paused.
  • namePrefix: The prefix used to generate the IPAddress.
  • pools: List of IP address pools
  • prefix: Default prefix for this IPPool
  • gateway: Default gateway for this IPPool
  • preAllocations: Default preallocated IP address for this IPPool

The prefix and gateway can be overridden per pool. Here is the pool definition:

  • start: IP range start address and it can be omitted if subnet is set.
  • end: IP range end address and can be omitted.
  • subnet: Subnet for the allocation and can be omitted if start is set. It is used to verify that the allocated address belongs to this subnet.
  • prefix: Override of the default prefix for this pool
  • gateway: Override of the default gateway for this pool


An IPClaim is an object representing a request for an IP address allocation.

Example of IPClaim:

kind: IPClaim
metadata:
  name: test1-controlplane-template-0-pool1
  namespace: default
spec:
  pool:
    name: pool1
    namespace: default

The spec field contains the following:

  • pool: Reference to the IPPool from which an address is requested


An IPAddress is an object representing an IP address allocation. It is created by IPAM to fulfil an IPClaim, so the user does not have to create it manually.

Example IPAddress:

kind: IPAddress
metadata:
  name: test1-prov-192-168-0-13
  namespace: default
spec:
  pool:
    name: pool1
    namespace: default
  claim:
    name: test1-controlplane-template-0-pool1
    namespace: default
  prefix: 24

The spec field contains the following:

  • pool: Reference to the IPPool this address is for
  • claim: Reference to the IPClaim this address is for
  • address: Allocated IP address
  • prefix: Prefix for this address
  • gateway: Gateway for this address

Installing IPAM as Deployment

This section will show how IPAM can be installed as a deployment in a cluster.

Deploying controllers

The CAPI and IPAM controllers need to be deployed first. The IPAM controller has a dependency on Cluster API Cluster objects, so the CAPI CRDs and controllers must be deployed and the cluster objects should exist for the deployment to succeed.


The user can create the IPPool object independently. It will wait for its cluster to exist before reconciling. If the user wants to create IPAddress objects manually, they should be created before any claims. It is highly recommended to use the preAllocations field itself or have the reconciliation paused.

After an IPClaim object creation, the controller will list all existing IPAddress objects. It will then select randomly an address that has not been allocated yet and is not in the preAllocations map. It will then create an IPAddress object containing the references to the IPPool and IPClaim and the address, the prefix from the address pool or the default prefix, and the gateway from the address pool or the default gateway.
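The allocation step described above can be sketched in Python with the standard ipaddress module. This is an illustration under stated assumptions, not the IPAM controller's implementation: the function name allocate is hypothetical, preAllocations is assumed to map claim names to addresses, the pool is given as a start and end address with example values, and the object name format (namePrefix followed by the address with dots replaced by dashes) is inferred from the example names in this document.

```python
import ipaddress
import random


def allocate(pool_start, pool_end, allocated, pre_allocations, name_prefix, rng=random):
    """Pick a free address in [pool_start, pool_end] and derive an IPAddress object name.

    allocated: addresses of existing IPAddress objects.
    pre_allocations: dict of claim name -> reserved address (assumed shape).
    """
    start = int(ipaddress.ip_address(pool_start))
    end = int(ipaddress.ip_address(pool_end))
    reserved = {ipaddress.ip_address(a) for a in allocated}
    reserved |= {ipaddress.ip_address(a) for a in pre_allocations.values()}
    free = [
        ipaddress.ip_address(value)
        for value in range(start, end + 1)
        if ipaddress.ip_address(value) not in reserved
    ]
    if not free:
        raise RuntimeError("no free address left in the pool")
    address = rng.choice(free)  # random selection among unallocated addresses
    name = "%s-%s" % (name_prefix, str(address).replace(".", "-"))
    return str(address), name


if __name__ == "__main__":
    # Example values only: a four-address range with three addresses already taken.
    print(allocate(
        "192.168.0.10", "192.168.0.13",
        allocated=["192.168.0.10", "192.168.0.11"],
        pre_allocations={"claim1": "192.168.0.12"},
        name_prefix="test1-prov",
    ))
```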

Deploy IPAM

Deploy the IPAM CRDs and IPAM controllers by running the following Makefile target from inside the cloned IPAM git repo.

    make deploy

Run locally

Run the IPAM controller locally. The in-cluster controller deployment is scaled down to zero replicas first.

    kubectl scale -n capm3-system deployment.v1.apps/metal3-ipam-controller-manager \
      --replicas 0
    make run

Deploy an example pool

    make deploy-examples

Delete the example pool

    make delete-examples


When deleting an IPClaim object, the controller will simply delete the associated IPAddress object. Once all IPAddress objects have been deleted, the IPPool object can be deleted. Before that point, the finalizer in the IPPool object will block the deletion.


  1. IPAM.
  2. IPAM deployment workflow.
  3. Custom resource (CR) examples in metal3-dev-env, in the templates.

Introduction to “Flakes”

The Metal3 community maintains a number of tests, Continuous Integration pipelines and other infrastructure elements in order to test, release, package and manage the development processes of the Metal3 components.

In the Metal3 community’s nomenclature the word “flake” refers to a recurring error that appears from time to time in the CI or during our test process but disappears by itself, which makes it difficult to reproduce. In order to eventually figure out and fix the root cause of the flakes, the community decided to catalog them so that all developers are aware of their existence.

Nordix harbor 504 timeout

Sometimes we get a timeout when pulling images from the Nordix registry. Here is an example:

sudo podman pull
Trying to pull
Error: initializing source docker://
reading manifest latest in received unexpected HTTP status: 504 Gateway Time-out

Occurrence & logs

Download Calico manifests connection failure

Sometimes downloading the Calico manifests fails with this error:

TASK [v1aX_integration_test : Download Calico v3.21.x manifests] ***************
task path: /home/****/tested_repo/vm-setup/roles/v1aX_integration_test/tasks/verify.yml:22
fatal: [localhost]: FAILED! => {"changed": false, "dest": "/tmp/", "elapsed": 10, "gid": 0, "group": "root", "mode": "01777", "msg":
"Connection failure: The read operation timed out", "owner": "root", "size": 4096, "state": "directory", "uid": 0, "url":}

Occurrence & logs

Waiting for BMHs to be available again

Sometimes the BMHs get stuck while deprovisioning.

RETRYING: Wait until "2" bmhs become available again. 147/150
RETRYING: Wait until "2" bmhs become available again. 148/150
RETRYING: Wait until "2" bmhs become available again. 149/150
RETRYING: Wait until "2" bmhs become available again. 150/150

When this happens, the following can be seen in the BMO logs:

{"level":"info","ts":1664171732.3624265,"logger":"controllers.BareMetalHost","msg":"Retrying registration","baremetalhost":"metal3/eselda13u31s05","provisioningState":"provisioned"}
{"level":"info","ts":1664171732.3624785,"logger":"controllers.BareMetalHost","msg":"registering and validating access to management controller","baremetalhost":"metal3/eselda13u31s05","provisioningState":"provisioned","credentials":{"credentials":{"name":"bml-ilo-login-secret-05","namespace":"metal3"},"credentialsVersion":"9674"}}
{"level":"info","ts":1664171732.4302292,"logger":"provisioner.ironic","msg":"updating node settings in ironic","host":"metal3~eselda13u31s05"}
{"level":"info","ts":1664171732.5427444,"logger":"provisioner.ironic","msg":"could not update node settings in ironic, busy","host":"metal3~eselda13u31s05"}
{"level":"info","ts":1664171732.5427907,"logger":"controllers.BareMetalHost","msg":"host not ready","baremetalhost":"metal3/eselda13u31s05","provisioningState":"provisioned","wait":10}

And in Ironic we see this:

2022-09-26 05:55:32.499 1 DEBUG ironic.conductor.manager [None req-97e7cdfd-1309-4a01-a42d-a503e221e900 - - - - - -] RPC update_node called for node 67b0163f-faf6-48e3-b5bf-946f37cb48d8. update_node /usr/lib/python3.9/site-packages/ironic/conductor/
2022-09-26 05:55:32.507 1 DEBUG ironic.conductor.task_manager [None req-97e7cdfd-1309-4a01-a42d-a503e221e900 - - - - - -] Attempting to get exclusive lock on node 67b0163f-faf6-48e3-b5bf-946f37cb48d8 (for node update) __init__ /usr/lib/python3.9/site-packages/ironic/conductor/
2022-09-26 05:55:32.516 1 DEBUG ironic.conductor.task_manager [None req-97e7cdfd-1309-4a01-a42d-a503e221e900 - - - - - -] Node 67b0163f-faf6-48e3-b5bf-946f37cb48d8 successfully reserved for node update (took 0.01 seconds) reserve_node /usr/lib/python3.9/site-packages/ironic/conductor/
2022-09-26 05:55:32.531 1 DEBUG ironic.conductor.task_manager [None req-97e7cdfd-1309-4a01-a42d-a503e221e900 - - - - - -] Successfully released exclusive lock for node update on node 67b0163f-faf6-48e3-b5bf-946f37cb48d8 (lock was held 0.01 sec) release_resources /usr/lib/python3.9/site-packages/ironic/conductor/
2022-09-26 05:55:32.534 1 DEBUG ironic.api.method [None req-97e7cdfd-1309-4a01-a42d-a503e221e900 - - - - - -] Client-side error: Node 67b0163f-faf6-48e3-b5bf-946f37cb48d8 is associated with instance 66b3a20d-9c99-4d0b-8f91-5d8ce7bad6f5. format_exception /usr/lib/python3.9/site-packages/ironic/api/
2022-09-26 05:55:32.536 1 INFO eventlet.wsgi.server [None req-97e7cdfd-1309-4a01-a42d-a503e221e900 - - - - - -], "PATCH /v1/nodes/67b0163f-faf6-48e3-b5bf-946f37cb48d8 HTTP/1.1" status: 409  len: 531 time: 0.0919740

The relevant code in Ironic can be found here.

Recreating (deleting) the Ironic Pod seems to help.

Occurrence and logs

CI infrastructure provider unable to attach floating IP

The IaaS provider used by the Metal3 community provides an OpenStack-based cloud solution that the community mainly uses for virtual machines (VMs). The Metal3 project uses two distinct regions (both geographic and logical) of the IaaS, and in one of them (Fra1) the CI needs to attach “floating IPs” to the VMs for them to be usable.

The issue in question was present for at least a day and blocked the attachment of “floating IPs” to newly created VMs. As a result, all CI jobs that relied on the Fra1 region failed instantly, before the actual CI workload had a chance to run.

Error example:

Running in region: Fra1
The option [tenant_id] has been deprecated. Please avoid using it.
Deleting executer floating IP 7629e843-9e4f-4234-a8f8-053058f850e9.
The option [tenant_id] has been deprecated. Please avoid using it.
Executer floating IP 7629e843-9e4f-4234-a8f8-053058f850e9 is deleted.
usage: openstack floating ip delete [-h] <floating-ip> [<floating-ip> ...]
openstack floating ip delete: error: the following arguments are required: <floating-ip>

Deleting executer VM ci-test-vm-20220720203103-nldz.
Executer VM ci-test-vm-20220720203103-nldz is deleted.
Deleting executer VM port ci-test-vm-20220720203103-nldz-int-port.
The option [tenant_id] has been deprecated. Please avoid using it.
Executer VM port ci-test-vm-20220720203103-nldz-int-port is deleted.

Occurrence and logs

  • 20.07.2022 - No logs could be collected, as the VM was terminated prematurely.

Prow flakes

Multiple PRs stuck in merge queue

At times you will find that multiple PRs have passed all the tests and have the required approve and lgtm labels, yet Prow cannot merge them. In such cases, it is usually found that one of the PRs has not been rebased properly. Put a hold on that PR with /hold so that the other PRs can go in. Once you rebase and push the held PR, it should also go in without any issue. Prow should then be able to merge the PRs one by one.

PRs changing GitHub workflows not getting merged

PRs that modify or add files in the .github/workflows/ directory do not get merged automatically by Prow. These PRs tend to get stuck in spite of all tests passing and all labels being present. We have identified that this happens when the branch is not present in upstream (metal3-io) but only in origin (Nordix, or other forks); we have not identified why. The solution is to push the branch to upstream. Once Prow detects that the branch is present in upstream, it will be able to merge the local branch. You don’t have to open a PR from the upstream branch; its existence is enough. However, for safety reasons only people with elevated permissions can push branches to upstream. So if you face such an issue please ask assistance by emailing

Maintainers guide to tide

Supported release versions

The Cluster API Provider Metal3 (CAPM3) team maintains the following branches for CAPM3 for different API versions.

  • CAPM3
    • main
      • v1beta1
    • release-1.2
      • v1beta1
    • release-1.1
      • v1beta1

Currently, in the Metal³ organization only CAPM3 and IPAM follow CAPI release cycles. The supported versions (excluding release candidates) for CAPM3 and IPAM releases are as follows:

  • CAPM3
    • v1beta1
      • v1.2.0, v1.1.3, v1.1.2, v1.1.1, v1.1.0
    • v1alpha5
      • v0.5.5, v0.5.4, v0.5.3, v0.5.2, v0.5.1, v0.5.0
  • IPAM
    • v1alpha1
      • v1.2.0, v1.1.4, v1.1.3, v1.1.2, v1.1.1, v1.1.0, v0.1.2, v0.1.1, v0.1.0

The compatibility of IPAM and CAPM3 API versions with CAPI is discussed here.

BMO and Ironic do not follow similar release cycles, and they are backward compatible. However, prior to CAPM3 release v1.1.3 we tagged the BMO and Ironic code bases whenever we cut a CAPM3 release. The tags have the prefix capm3- followed by the corresponding CAPM3 release version. For example, when we cut the v1.0.0 release of CAPM3, we created the tag capm3-v1.0.0 in the BMO and Ironic code bases. Please note that from CAPM3 release v1.1.3 onwards this applies only to Ironic. Following this scheme, the following tags for BMO and Ironic are available and supported (image tags):

  • capm3-v1.2.0 (Ironic only)
  • capm3-v1.1.3 (Ironic only)
  • capm3-v1.1.2 (BMO and Ironic)
  • capm3-v1.1.1
  • capm3-v1.1.0
  • capm3-v0.5.5
  • capm3-v0.5.4
  • capm3-v0.5.3
  • capm3-v0.5.2
  • capm3-v0.5.1
  • capm3-v0.5.0

Up to and including the capm3-v1.1.2 tag, BMO followed the same scheme as Ironic. Since then, BMO has followed the semantic versioning scheme for its own release cycle, the same way as CAPM3 and IPAM. At the moment, we always cut rolling releases from the main branch of BMO and tag them with the release version (e.g. v0.1.X). The available and supported BMO image tags are:

  • v0.1.1
  • v0.1.0
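The capm3- tagging rule described above can be condensed into a short Python sketch. The helper names `parse_version`, `capm3_tag` and `components_with_tag` are invented for this illustration and do not correspond to any tool in the Metal3 repositories. The v1.1.3 cut-off is taken from the text above.

```python
def parse_version(version):
    """Turn 'capm3-v1.1.3' or 'v1.1.3' into a comparable tuple (1, 1, 3)."""
    if version.startswith("capm3-"):
        version = version[len("capm3-"):]
    return tuple(int(part) for part in version.lstrip("v").split("."))


def capm3_tag(capm3_version):
    """Build the tag name used in the BMO and Ironic code bases."""
    return "capm3-" + capm3_version


def components_with_tag(capm3_version):
    """Return which components carry the capm3- tag for a CAPM3 release."""
    # From CAPM3 v1.1.3 onwards only Ironic is tagged this way.
    if parse_version(capm3_version) >= (1, 1, 3):
        return ["Ironic"]
    return ["BMO", "Ironic"]
```

For example, `capm3_tag("v1.0.0")` gives `capm3-v1.0.0`, and `components_with_tag("v1.1.2")` lists both BMO and Ironic, while `components_with_tag("v1.2.0")` lists Ironic only.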

Supported Image tags

Supported container images for BMO and Ironic can be found in quay. Examples are:


Supported container images for CAPM3 and IPAM will always follow the supported release version tags and can be found in quay. Examples are:


Metal3-io security policy

This document explains the general security policy for the whole project. It therefore applies to all of the project’s active repositories, and it has to be referenced in each repository’s SECURITY_CONTACTS file.

Way to report a security issue

The Metal3 Community asks that all suspected vulnerabilities be disclosed by reporting them to mailing list which will forward the vulnerability report to the Metal3 security committee.

Security issue handling, severity categorization, fix process organization

The actions listed below should be completed within 7 days of the security issue’s disclosure on the

The Security Lead (SL) of the Metal3 Security Committee (M3SC) is tasked with reviewing the security issue disclosure and giving initial feedback to the reporter as soon as possible. Any disclosed security issue will be visible to all M3SC members.

For each reported vulnerability the SL will work quickly to identify committee members who are able to work on a fix and CC those developers into the disclosure thread. These selected developers are the Fix Team. The Fix Team is also allowed to invite additional developers into the disclosure thread based on the repo’s OWNERS file. They will then also become members of the Fix Team but not the M3SC.

M3SC members are encouraged to volunteer for the Fix Team even before the SL contacts them if they think they are ready to work on the issue. M3SC members who have not been selected for the Fix Team are also encouraged to correct both the SL and each other on the disclosure thread if they find mistakes while reading it.

The Fix Team will start working on the fix either on a private fork of the affected repo or in the public repo, depending on the severity of the issue and the decision of the SL. Based on the issue’s severity level (discussed later in this document), the SL makes the final call on whether the issue can be fixed publicly or should stay on a private fork until the fix is disclosed.

The SL and the Fix Team will create a CVSS score using the CVSS Calculator. The SL makes the final call on the calculated risk.

If the CVSS score is under ~4.0 (a low severity score) or the assessed risk is low the Fix Team can decide to slow the release process down in the face of holidays, developer bandwidth, etc. These decisions must be discussed on the

If the CVSS score is under ~7.0 (a medium severity score), the SL may choose to carry out the fix semi-publicly. Semi-publicly means that PRs are made directly in the public Metal3-io repositories, while restricting discussion of the security aspects to private channels. The SL will make the determination whether there would be user harm in handling the fix publicly that outweighs the benefits of open engagement with the community.

If the CVSS score is over ~7.0 (high severity score), fixes will typically receive an out-of-band release.

More information can be found about severity scores here.

Note: CVSS is convenient but imperfect. Ultimately, the SL has discretion on classifying the severity of a vulnerability.
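The score thresholds above can be summarised in a small Python sketch. The function name `handling_for` and the returned wording are made up for illustration. The policy uses approximate thresholds (“~4.0”, “~7.0”) and does not say how a score of exactly 7.0 is handled, so treating it as high severity is an assumption. The function is a reading aid only, since the SL has the final say on severity.

```python
def handling_for(cvss_score):
    """Map a CVSS score to the handling outlined in the policy text."""
    if not 0.0 <= cvss_score <= 10.0:
        raise ValueError("CVSS scores range from 0.0 to 10.0")
    if cvss_score < 4.0:
        # Low severity: the Fix Team can decide to slow the release down.
        return "low: release process may be slowed down"
    if cvss_score < 7.0:
        # Medium severity: the SL may choose a semi-public fix.
        return "medium: fix may be carried out semi-publicly"
    # High severity (assumption: exactly 7.0 is counted as high).
    return "high: fix typically receives an out-of-band release"
```

For instance, `handling_for(5.3)` falls into the medium branch, and `handling_for(9.8)` into the high branch.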

No matter the CVSS score, if the vulnerability requires User Interaction, or otherwise has a straightforward, non-disruptive mitigation, the SL may choose to disclose the vulnerability before a fix is developed if they determine that users would be better off being warned against a specific interaction.

Fix Disclosure Process

With the Fix Development underway the SL needs to come up with an overall communication plan for the wider community. This Disclosure process should begin after the Fix Team has developed a Fix or mitigation so that a realistic timeline can be communicated to users. Emergency releases for critical and high severity issues or fixes for issues already made public may affect the below timelines for how quickly or far in advance notifications will occur.

The SL will lead the process of creating a GitHub security advisory for the repository that is affected by the issue. If the SL has no administrator privileges, the advisory will be created in cooperation with a repository admin. The SL will have to request a CVE number for the security advisory. As GitHub is a CVE Numbering Authority (CNA), there is an option to either use an existing CVE number or request a new one from GitHub. More about the GitHub security advisory and the CVE numbering process can be found here.

As soon as the SL has decided on a date for the fix disclosure, the original reporter(s) of the security issue have to be notified about the release date of the fix and the advisory, and about the content of both.

If a repository that has a release process requires a high severity fix then the fix has to be released as a patch release for all supported release branches where the fix is relevant as soon as possible.

In case the repository does not have a release process, but it needs a critical fix then the fix has to be merged to the main branch as soon as possible.

In repositories that have a release process, Medium and Low severity vulnerability fixes will be released as part of the next upcoming minor or major release, whichever happens sooner. Simultaneously with the upcoming release, the fix also has to be released to all supported release branches as a patch release if the fix is relevant for the given release.

In case the fix was developed on a private repository, either the SL or someone designated by the SL has to cherry-pick the fix and push it to the public repository. The SL and the Fix Team have to be able to push the PR through the public repo’s review process as soon as possible and merge it.

Metal3 security committee members

Name              GitHub ID        Affiliation
Dmitry Tantsur    dtantsur         Red Hat
Riccardo Pittau   elfosardo        Red Hat
Zane Bitter       zaneb            Red Hat
Furkat Gofurov    furkatgofurov7   Ericsson Software Technology
Kashif Khan       kashifest        Ericsson Software Technology
Lennart Jern      lentzi90         Ericsson Software Technology
Tuomo Tanskanen   tuminoid         Ericsson Software Technology
Adam Rozman       Rozzii           Ericsson Software Technology

Please don’t report any security vulnerability to the committee members directly.