Troubleshooting

Verify that Ironic and Baremetal Operator are healthy

There is no point in continuing until you have verified that the controllers are healthy. A “standard” deployment runs Ironic and Baremetal Operator in the baremetal-operator-system namespace. Check that the containers are running and are not restarting or crashing:

kubectl -n baremetal-operator-system get pods

Note: If you deploy Ironic outside of Kubernetes, you will need to check on it in a different way.

Healthy example output:

NAME                                                     READY   STATUS    RESTARTS       AGE
baremetal-operator-controller-manager-85b896f688-j27g5   1/1     Running   0              5m13s
ironic-6bcdcb99f8-6ldlz                                  3/3     Running   1 (2m2s ago)   5m15s

(There has been one restart, but it is not constantly restarting.)

Unhealthy example output:

NAME                                                     READY   STATUS    RESTARTS      AGE
baremetal-operator-controller-manager-85b896f688-j27g5   1/1     Running   0             3m35s
ironic-6bcdcb99f8-6ldlz                                  1/3     Running   1 (24s ago)   3m37s
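As a rough aid for spotting the unhealthy case, the following sketch filters the pod table for entries whose READY count is incomplete or whose STATUS is not Running. The function name unhealthy_pods is hypothetical, and the check is only a heuristic: it does not look at restart counts.

```shell
# Print the name of every pod that is not fully ready or not Running.
# Reads `kubectl get pods` table output on stdin.
# unhealthy_pods is a hypothetical helper name, not part of any tool.
unhealthy_pods() {
  awk '$1 != "NAME" && NF >= 3 {
    split($2, ready, "/")   # READY column, e.g. "1/3"
    if (ready[1] != ready[2] || $3 != "Running") print $1
  }'
}

# Usage (requires cluster access):
#   kubectl -n baremetal-operator-system get pods | unhealthy_pods
```

An empty result means every listed pod is Running with all of its containers ready.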

Waiting for IP

Also check the logs, since Ironic may be stuck on “waiting for IP”. For example:

kubectl -n baremetal-operator-system logs ironic-6bcdcb99f8-6ldlz -c ironic
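To avoid reading the whole log, you can filter it. The sketch below assumes that Ironic runs as a Deployment named ironic (inferred from the pod name above) and matches the phrase “waiting for” loosely, because the exact log wording may differ between versions. The function name is hypothetical.

```shell
# Show recent log lines from the ironic container that mention waiting.
# Assumption: the Deployment is named "ironic"; adjust if yours differs.
# After a restart, add --previous to read the previous container's log.
ironic_waiting_lines() {
  kubectl -n baremetal-operator-system logs deploy/ironic -c ironic --tail=200 \
    | grep -i 'waiting for'
}

# Usage (requires cluster access):
#   ironic_waiting_lines
```

No output means that none of the last 200 log lines contain the phrase.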

If Ironic is waiting for IP, you need to check the network configuration. Some things to look out for:

  • What IP or interface is Ironic configured to use?
  • Is Ironic using the host network?
  • Is Ironic running on the expected (set of) Node(s)?
  • Does the Node have the expected IP assigned?
  • Are you using keepalived or similar to manage the IP, and is it working properly?
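Some of these questions can be answered with kubectl. The helpers below are a minimal sketch: the function names are hypothetical, and the Deployment name ironic is an assumption based on the pod name shown earlier.

```shell
# Which Node is each pod scheduled on, and which IP does it report?
ironic_pod_placement() {
  kubectl -n baremetal-operator-system get pods -o wide
}

# Prints "true" when the pod template requests the host network;
# prints nothing when the field is unset.
# Assumption: Ironic runs as a Deployment named "ironic".
ironic_host_network() {
  kubectl -n baremetal-operator-system get deployment ironic \
    -o jsonpath='{.spec.template.spec.hostNetwork}'
}

# Addresses that Kubernetes knows for each Node. An IP on a secondary
# interface may not appear here; run `ip -brief address show` on the
# Node itself to see every assigned address.
node_addresses() {
  kubectl get nodes -o wide
}
```

Compare the results with the IP or interface that your Ironic deployment is configured to use.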

Host is stuck in cleaning, how do I delete it?

First and foremost, avoid forced deletion: Baremetal Operator will then be unable to remove the host’s record from Ironic, and you will get a conflict if you enroll the host again. If you don’t care about disks being cleaned, you can edit the BareMetalHost resource and disable cleaning:

spec:
  automatedCleaningMode: disabled
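The same change can be applied without opening an editor. This is a minimal sketch using a merge patch. The function name disable_cleaning is hypothetical, and you supply the namespace and host name.

```shell
# Disable automated cleaning on one BareMetalHost.
# $1 = namespace of the host, $2 = name of the host.
# disable_cleaning is a hypothetical helper name.
disable_cleaning() {
  kubectl -n "$1" patch baremetalhost "$2" --type merge \
    -p '{"spec":{"automatedCleaningMode":"disabled"}}'
}

# Usage (requires cluster access; both arguments are placeholders):
#   disable_cleaning my-namespace my-host
```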

Alternatively, you can wait for 3 cleaning retries to finish. After that, the host will be deleted. If you do care about cleaning, you need to figure out why it does not finish.

MAC address conflict on registration

If you force deletion of a host after registration, Baremetal Operator will not be able to delete the corresponding record from Ironic. If you try to enroll the same host again, you will see the following error:

Normal RegistrationError 4m36s metal3-baremetal-controller MAC address 11:22:33:44:55:66 conflicts with existing node namespace~name
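To find such events across a namespace, you can filter by the event reason. The sketch below assumes the reason string is RegistrationError, as in the line above. The function name is hypothetical.

```shell
# List RegistrationError events in the namespace that holds the host.
# $1 = namespace of the BareMetalHost.
registration_errors() {
  kubectl -n "$1" get events --field-selector reason=RegistrationError
}

# Usage (requires cluster access; the namespace is a placeholder):
#   registration_errors my-namespace
```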

Currently, the only way to get rid of this error is to re-create Ironic’s internal database. If your deployment uses SQLite (the default), it is enough to restart the Ironic pod. If you use MariaDB, you need to restart the MariaDB pod and clear any persistent volumes it uses.
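For the SQLite case, the restart can be sketched as follows. This assumes that Ironic runs as a Deployment named ironic in the baremetal-operator-system namespace. The function name is hypothetical. It does not cover MariaDB, where the persistent volumes also have to be cleared.

```shell
# Restart the Ironic pod and wait for the replacement to become ready.
# Assumption: Ironic runs as a Deployment named "ironic".
restart_ironic() {
  kubectl -n baremetal-operator-system rollout restart deployment/ironic &&
    kubectl -n baremetal-operator-system rollout status deployment/ironic
}

# Usage (requires cluster access):
#   restart_ironic
```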