Troubleshooting

Verify that Ironic and Baremetal Operator are healthy

There is no point in continuing before you have verified that the controllers are healthy. A “standard” deployment has Ironic and Baremetal Operator running in the baremetal-operator-system namespace. Check that the containers are running, not restarting or crashing:

kubectl -n baremetal-operator-system get pods

Note: If you deploy Ironic outside of Kubernetes you will need to check on it in a different way.

Healthy example output:

NAME                                                     READY   STATUS    RESTARTS       AGE
baremetal-operator-controller-manager-85b896f688-j27g5   1/1     Running   0              5m13s
ironic-6bcdcb99f8-6ldlz                                  3/3     Running   1 (2m2s ago)   5m15s

(There has been one restart, but it is not constantly restarting.)

Unhealthy example output:

NAME                                                     READY   STATUS    RESTARTS      AGE
baremetal-operator-controller-manager-85b896f688-j27g5   1/1     Running   0             3m35s
ironic-6bcdcb99f8-6ldlz                                  1/3     Running   1 (24s ago)   3m37s

Waiting for IP

Make sure to also check the logs, since Ironic may be stuck “waiting for IP”. For example:

kubectl -n baremetal-operator-system logs ironic-6bcdcb99f8-6ldlz -c ironic

If Ironic is waiting for IP, you need to check the network configuration. Some things to look out for:

  • What IP or interface is Ironic configured to use?
  • Is Ironic using the host network?
  • Is Ironic running on the expected (set of) Node(s)?
  • Does the Node have the expected IP assigned?
  • Are you using keepalived or similar to manage the IP, and is it working properly?
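
As a starting point, the following commands can help answer these questions (a sketch: the pod name is taken from the example above, and the PROVISIONING_* environment variables assume the Metal3 ironic-image):

# On which Node is Ironic running, and with which pod IP?
kubectl -n baremetal-operator-system get pods -o wide

# Is Ironic using the host network? (empty output means false)
kubectl -n baremetal-operator-system get pod ironic-6bcdcb99f8-6ldlz -o jsonpath='{.spec.hostNetwork}{"\n"}'

# Which IP/interface is Ironic configured to use?
kubectl -n baremetal-operator-system exec ironic-6bcdcb99f8-6ldlz -c ironic -- env | grep -i PROVISIONING

# Does the Node have the expected IP assigned?
kubectl get nodes -o wide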

Host is stuck in cleaning, how do I delete it?

First and foremost, avoid forced deletion, otherwise you will run into the problems with stale Ironic records described in the next two sections. If you don’t care about disks being cleaned, you can edit the BareMetalHost resource and disable cleaning:

spec:
  automatedCleaningMode: disabled
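
For example, assuming a host called node-1 in the metal3 namespace, this can be applied with a patch:

kubectl -n metal3 patch bmh node-1 --type merge -p '{"spec": {"automatedCleaningMode": "disabled"}}'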

Alternatively, you can wait for 3 cleaning retries to finish. After that, the host will be deleted. If you do care about cleaning, you need to figure out why it does not finish.
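
The Ironic logs are the first place to look. For example (a sketch, assuming the Deployment is called ironic, as suggested by the pod listing above):

kubectl -n baremetal-operator-system logs deploy/ironic -c ironic | grep -i clean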

MAC address conflict on registration

If you force deletion of a host after registration, Baremetal Operator will not be able to delete the corresponding record from Ironic. If you try to enroll the same host again, you will see the following error:

Normal  RegistrationError  4m36s  metal3-baremetal-controller  MAC address 11:22:33:44:55:66 conflicts with existing node namespace~name

Currently, the only way to get rid of this error is to re-create Ironic’s internal database. If your deployment uses SQLite (the default), it is enough to restart the pod with Ironic. If you use MariaDB, you need to restart its pod and clear any persistent volumes.
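
With the SQLite default, restarting Ironic can be done by deleting the pod or restarting its Deployment (a sketch, assuming the names from the examples above):

kubectl -n baremetal-operator-system delete pod ironic-6bcdcb99f8-6ldlz

# or, equivalently, restart the whole Deployment:
kubectl -n baremetal-operator-system rollout restart deployment ironic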

Power requests are issued for deleted hosts

Similarly to the previous question, a host is not deleted from Ironic if its BareMetalHost object is forcibly deleted. If valid BMC credentials were provided, Ironic will keep checking the power state of the host and enforcing the last requested power state. The only solution is, again, to re-create Ironic’s internal database as described above.

BMH registration errors

BMC credentials may be incorrect or missing. These issues appear in the BareMetalHost’s status and in Events.

Check kubectl describe bmh <name>, which shows both the status and the recent events.

Example output:

Normal  RegistrationError  23s   metal3-baremetal-controller  Failed to get 
power state for node 67ac51af-a6b3. Error: Redfish exception occurred. 
Error: HTTP GET https://192.168.111.1:8000/redfish/v1/Systems/... returned code 401.
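
To verify the credentials themselves, inspect the BMC settings on the host and the Secret they reference (a sketch, assuming a host node-1 in the metal3 namespace whose credentialsName points to a Secret called node-1-bmc-secret):

kubectl -n metal3 get bmh node-1 -o jsonpath='{.spec.bmc}{"\n"}'
kubectl -n metal3 get secret node-1-bmc-secret -o jsonpath='{.data.username}' | base64 -d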

BMH inspection errors

The host is not able to communicate back results to Ironic

If the host cannot communicate with Ironic, inspection will eventually time out. Access to the host’s console or serial logs is usually needed to determine the exact cause.

Example output from kubectl get bmh -A:

NAMESPACE   NAME     STATE        CONSUMER   ONLINE   ERROR              AGE
metal3      node-1   inspecting             true     inspection error   46m

BareMetalHost’s events from kubectl describe bmh <name> -n <namespace>:

Events:
  Type    Reason              Age    From                         Message
  ----    ------              ----   ----                         -------
  Normal  InspectionStarted   37m    metal3-baremetal-controller  Hardware inspection started
  Normal  InspectionError     7m12s  metal3-baremetal-controller  timeout reached while inspecting the node
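
Before going to the serial console, it can be worth confirming in the Ironic logs that the node never reported back (a sketch, assuming the Deployment name ironic from the earlier examples):

kubectl -n baremetal-operator-system logs deploy/ironic -c ironic | grep -i inspect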

Incompatible configuration

This can occur when attempting to use virtual media or UEFI on hardware that does not support it. The error will show in status and in events.

Example kubectl get bmh -A:

NAMESPACE   NAME     STATE        CONSUMER   ONLINE   ERROR              AGE
metal3      node-1   inspecting              true     inspection error   8m17s

BareMetalHost’s events:

Normal  InspectionError     5s     metal3-baremetal-controller  Failed to inspect hardware. Reason: unable to start inspection:
Redfish exception occurred. Error: Setting boot mode to bios failed for node ceec28f5-cedb. Error: HTTP PATCH
https://192.168.111.1:8000/redfish/v1/Systems/... returned code 500.
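
If the hardware does not support the requested boot mode, adjusting spec.bootMode on the BareMetalHost to match what the hardware supports may help (a sketch; valid values are UEFI, UEFISecureBoot and legacy):

spec:
  bootMode: UEFI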

Provisioning errors

Errors during provisioning are visible when listing the BareMetalHosts with kubectl get bmh -A:

NAMESPACE   NAME     STATE          CONSUMER      ONLINE   ERROR                AGE
metal3      node-1   provisioning   test1-dt8j2   true     provisioning error   149m

Check BareMetalHost’s events for the specific reason.

Wrong image checksum example:

Normal  ProvisioningError   10m    metal3-baremetal-controller  Image provisioning failed: Deploy
step deploy.write_image failed on node df880558-09da. Image failed to verify against checksum.
location: CENTOS_9_NODE_IMAGE.img; image ID: /dev/sda; image checksum: abcd1234; verification checksum: ...
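
Double-check the spec.image section of the BareMetalHost: the checksum (a value or a URL to a checksum file) and checksumType must match the actual image. For example (a sketch; the checksum URL is only an illustration):

spec:
  image:
    url: http://172.22.0.1/images/CENTOS_9_NODE_IMAGE.img
    checksum: http://172.22.0.1/images/CENTOS_9_NODE_IMAGE.img.sha256sum
    checksumType: sha256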

No root device found example:

Normal  ProvisioningStarted  15s    metal3-baremetal-controller  Image provisioning started for http://172.22.0.1/images/CENTOS_9_NODE_IMAGE.img
Normal  ProvisioningError    1s     metal3-baremetal-controller  Image provisioning failed: Deploy step deploy.write_image failed on node d25ce8de-914e-4146-a0c0-58825274572d. No suitable device was found for deployment using these hints {'name': 's== /dev/vdb'}
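
In this case the root device hints do not match any disk on the host. Compare spec.rootDeviceHints on the BareMetalHost with the inspected hardware, for example (a sketch, assuming the intended disk really is /dev/vdb):

spec:
  rootDeviceHints:
    deviceName: /dev/vdb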

No BareMetalHost available or matching

This appears in the Metal3Machine status:

Status:
  Conditions:
    Last Transition Time:  2025-08-15T10:53:05Z
    Message:               No available host found. Requeuing.. Object will be requeued after 30s
    Reason:                AssociateBMHFailed
    Severity:              Error
    Status:                False
    Type:                  AssociateBMH

CAPM3 controller logs when there are no available hosts:

I0815 11:10:35.699004   1 metal3machine_manager.go:332] "No available host found. Requeuing." logger="controllers.Metal3Machine.Metal3Machine-controller" metal3-machine="metal3/test-no-match-2" machine="test-no-match-2" cluster="test1" metal3-cluster="test1"

CAPM3 controller logs when the annotated host is not found:

I0815 06:08:54.687380   1 metal3machine_manager.go:788] "Annotated host not found" logger="controllers.Metal3Machine.Metal3Machine-controller" metal3-machine="metal3/test1-zxzn7-qvl6n" machine="test1-zxzn7-qvl6n" cluster="test1" metal3-cluster="test1" host="metal3/node-0"
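
To see why no host matches, list the BareMetalHosts and compare their state, consumer and labels against the Metal3Machine’s hostSelector (a sketch; the namespace is taken from the examples above):

kubectl -n metal3 get bmh
kubectl -n metal3 get bmh --show-labels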

Provider ID is missing

This occurs when cloudProviderEnabled is set to true on the Metal3Cluster even though no external cloud provider is in use. The Metal3Machine will then remain stuck in the Provisioning phase.

Example output from kubectl get metal3machine -A:

NAMESPACE   NAME                AGE    PROVIDERID                           READY   CLUSTER   PHASE
metal3      test1-82ljr         160m   metal3://metal3/node-0/test1-82ljr   true    test1     
metal3      test1-bv9mv-2f8th   35m                                                 test1

Metal3Machine’s status:

Status:
  Conditions:
    Reason:   NotReady
    Status:   False
    Type:     Available
    Message:  * NodeHealthy: Waiting for Metal3Machine to report spec.providerID
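
If no external cloud provider is in use, check (and correct) the setting on the Metal3Cluster (a sketch, assuming a Metal3Cluster called test1 in the metal3 namespace):

kubectl -n metal3 get metal3cluster test1 -o jsonpath='{.spec.cloudProviderEnabled}{"\n"}'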

nodeRef missing

This is a CAPI-level issue. It can be caused by a failure to boot the image or to join the node to the cluster. Access to the node or its serial logs is needed to determine the exact cause. In particular, the cloud-init logs can help pinpoint the issue.

CAPM3 controller logs:

I0815 11:10:36.545990   1 metal3labelsync_controller.go:150] "Could not find Node Ref on Machine object, will retry" logger="controllers.Metal3LabelSync.metal3-label-sync-controller" metal3-label-sync="metal3/node-0"
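
To check whether the Machine has picked up a Node, and whether the Node ever registered in the workload cluster (a sketch, assuming the cluster test1 in the metal3 namespace from the earlier examples and that clusterctl is available):

# Empty output means the Machine has no nodeRef yet
kubectl -n metal3 get machine <machine-name> -o jsonpath='{.status.nodeRef}{"\n"}'

# The Node itself lives in the workload cluster, so fetch its kubeconfig first
clusterctl get kubeconfig test1 --namespace metal3 > test1.kubeconfig
kubectl --kubeconfig test1.kubeconfig get nodes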