Propagate OpenStack instance faults to Machine status when server enters ERROR #2040

@SamuAlfageme

Current behavior:

When Nova places a server in ERROR before kubelet registers, the corresponding Machine object may keep an empty .status:

k get machine -n kube-system 2xh100nvl-6978bf6c49-ccdtj -oyaml | yq .status
{}

Consumer components (e.g. kkp-api/dashboard) only see a Machine that is still provisioning. The actual provider fault never surfaces unless the user has access to the underlying OpenStack project, which is not always the case:

os server show f472b2c8-fd17-4759-afaf-ef732776d9cc -c status -c fault
+--------+-------------------------------------------------------------------------------------------------------------------------------+
| Field  | Value                                                                                                                         |
+--------+-------------------------------------------------------------------------------------------------------------------------------+
| fault  | {'code': 500, 'created': '2026-06-04T21:29:31Z', 'message': 'No valid host was found. There are not enough hosts available.'} |
| status | ERROR                                                                                                                         |
+--------+-------------------------------------------------------------------------------------------------------------------------------+

(In this case, the error is caused by a lack of hosts able to schedule this VM flavour, i.e. one with 2xh100 GPUs.)

Expected behavior:

If the OpenStack server enters the ERROR state, the machine-controller should extract the instance's fault information and persist it to Machine.status.errorReason and Machine.status.errorMessage, even when no Kubernetes Node exists yet, as happens, for instance, with quota exhaustion:

// This generally refers to exceeding one's quota in a cloud provider,
// or running out of physical machines in an on-premise environment.
InsufficientResourcesMachineError MachineStatusError = "InsufficientResources"

The Dashboard can then surface provider-side provisioning failures such as "No valid host was found. There are not enough hosts available." instead of showing an endless provisioning loop.
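
To make the request more concrete, here is a minimal sketch of how a server fault might be mapped onto the Machine status. It is not the machine-controller's actual code: `ServerFault`, `MachineStatus`, `classifyFault` and `applyFault` are hypothetical names invented for illustration, the substring matching on the fault message is an assumed heuristic, and `CreateMachineError` as the fallback reason is an assumption as well. Only `InsufficientResourcesMachineError` is taken from the snippet above.

```go
package main

import (
	"fmt"
	"strings"
)

// MachineStatusError is redeclared here so the sketch compiles on its own.
type MachineStatusError string

const (
	// Quoted in the issue.
	InsufficientResourcesMachineError MachineStatusError = "InsufficientResources"
	// Assumed generic fallback reason for any other provisioning fault.
	CreateMachineError MachineStatusError = "CreateError"
)

// ServerFault is a hypothetical, trimmed-down view of the Nova "fault" field.
type ServerFault struct {
	Code    int
	Message string
}

// MachineStatus is a hypothetical subset of Machine.status.
type MachineStatus struct {
	ErrorReason  *MachineStatusError
	ErrorMessage *string
}

// classifyFault picks an error reason from the fault message.
// The matched substrings are an assumption, not an exhaustive list.
func classifyFault(f ServerFault) MachineStatusError {
	msg := strings.ToLower(f.Message)
	if strings.Contains(msg, "no valid host") || strings.Contains(msg, "quota") {
		return InsufficientResourcesMachineError
	}
	return CreateMachineError
}

// applyFault records the fault on the status when the server is in ERROR.
// It does not depend on a Node existing. It reports whether status changed.
func applyFault(status *MachineStatus, serverStatus string, fault *ServerFault) bool {
	if serverStatus != "ERROR" || fault == nil {
		return false
	}
	reason := classifyFault(*fault)
	message := fmt.Sprintf("OpenStack server fault (code %d): %s", fault.Code, fault.Message)
	status.ErrorReason = &reason
	status.ErrorMessage = &message
	return true
}

func main() {
	var status MachineStatus
	fault := &ServerFault{
		Code:    500,
		Message: "No valid host was found. There are not enough hosts available.",
	}
	if applyFault(&status, "ERROR", fault) {
		fmt.Println(*status.ErrorReason)
		fmt.Println(*status.ErrorMessage)
	}
}
```

With the fault from the example above, this would set errorReason to InsufficientResources and carry the Nova message through to errorMessage. A real implementation would also need to decide when to clear the error again, which this sketch leaves out.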
