Flaky tests since CAPI bump to v1.11 #5769

@chrischdi

Description

Which jobs are failing?

These are the CI flakes we have seen since merging the CAPI bump to v1.11:

Open Issues:

Example: https://prow.k8s.io/view/gs/kubernetes-ci-logs/logs/periodic-cluster-api-provider-aws-e2e/1993049550197100544

Timed out after 300.000s.
Failed to verify Cluster Available condition for self-hosted-z3si5z/self-hosted-co5oxv
The function passed to Eventually failed at /home/prow/go/pkg/mod/sigs.k8s.io/cluster-api/test@v1.11.1/framework/cluster_helpers.go:457 with:
The Available condition on the Cluster should be set to true; message: * WorkersAvailable:
  * MachineDeployment self-hosted-co5oxv-md-0: 0 available replicas, at least 1 required (spec.strategy.rollout.maxUnavailable is 0, spec.replicas is 1)
Expected
    <v1.ConditionStatus>: False
to equal
    <v1.ConditionStatus>: True
[FAILED] Timed out after 300.000s.
Failed to verify Cluster Available condition for self-hosted-z3si5z/self-hosted-co5oxv
The function passed to Eventually failed at /home/prow/go/pkg/mod/sigs.k8s.io/cluster-api/test@v1.11.1/framework/cluster_helpers.go:457 with:
The Available condition on the Cluster should be set to true; message: * WorkersAvailable:
  * MachineDeployment self-hosted-co5oxv-md-0: 0 available replicas, at least 1 required (spec.strategy.rollout.maxUnavailable is 0, spec.replicas is 1)
Expected
    <v1.ConditionStatus>: False
to equal
    <v1.ConditionStatus>: True
In [It] at: /home/prow/go/pkg/mod/sigs.k8s.io/cluster-api/test@v1.11.1/framework/cluster_helpers.go:463 @ 11/21/25 15:16:21.233
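
The arithmetic behind the WorkersAvailable message above can be sketched as follows. This is only a reading of the log text (minimum available = spec.replicas minus maxUnavailable), not Cluster API's actual implementation; the function names `min_available` and `workers_available` are hypothetical.

```python
def min_available(replicas: int, max_unavailable: int) -> int:
    """Smallest number of available replicas the rollout tolerates (never below zero)."""
    return max(replicas - max_unavailable, 0)


def workers_available(available: int, replicas: int, max_unavailable: int) -> bool:
    """True when enough replicas are available, as the message appears to require."""
    return available >= min_available(replicas, max_unavailable)


# Values taken from the failure message: spec.replicas is 1, maxUnavailable is 0,
# and the MachineDeployment reports 0 available replicas.
print(min_available(replicas=1, max_unavailable=0))                  # 1
print(workers_available(available=0, replicas=1, max_unavailable=0)) # False
```

With a single replica and maxUnavailable of 0, the one worker Machine has to become available within the 300s window, so any delay in bringing up that Machine fails the check.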

Example: https://prow.k8s.io/view/gs/kubernetes-ci-logs/logs/periodic-cluster-api-provider-aws-e2e/1992687158174945280

Timed out after 2100.001s.
Timed out waiting for all control-plane machines in Cluster self-hosted-8vmoi1/self-hosted-chimom to be upgraded to kubernetes version v1.32.0
The function passed to Eventually returned the following error:
    <*errors.fundamental | 0xc003dc13b0>: 
    old Machines remain
    {
        msg: "old Machines remain",
        stack: [0x3b8ff83, 0x500dc6, 0x4ffed9, 0x8909ff, 0x891aa2, 0x88efc5, 0x3b8fc82, 0x3b84fdf, 0x3c85a68, 0x86cd53, 0x880c7b, 0x484041],
    }
[FAILED] Timed out after 2100.001s.
Timed out waiting for all control-plane machines in Cluster self-hosted-8vmoi1/self-hosted-chimom to be upgraded to kubernetes version v1.32.0
The function passed to Eventually returned the following error:
    <*errors.fundamental | 0xc003dc13b0>: 
    old Machines remain
    {
        msg: "old Machines remain",
        stack: [0x3b8ff83, 0x500dc6, 0x4ffed9, 0x8909ff, 0x891aa2, 0x88efc5, 0x3b8fc82, 0x3b84fdf, 0x3c85a68, 0x86cd53, 0x880c7b, 0x484041],
    }
In [It] at: /home/prow/go/pkg/mod/sigs.k8s.io/cluster-api/test@v1.11.1/framework/machine_helpers.go:181 @ 11/21/25 17:08:29.133

There were additional failures detected after the initial failure. These are visible in the timeline
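
A minimal sketch of the kind of check that yields "old Machines remain", assuming the test helper counts control-plane Machines already at the target version and fails while any Machine is still on another version. The `Machine` type, the machine names, and the old version `v1.31.0` are hypothetical; only the target version `v1.32.0` and the error text come from the log above.

```python
from dataclasses import dataclass


@dataclass
class Machine:
    name: str
    version: str


def count_upgraded(machines: list[Machine], target_version: str) -> int:
    """Return the number of upgraded Machines, or raise while old ones remain."""
    upgraded = sum(1 for m in machines if m.version == target_version)
    if len(machines) > upgraded:
        # Same wording as the error seen in the failing job.
        raise RuntimeError("old Machines remain")
    return upgraded


# Hypothetical state: one control-plane Machine has not been replaced yet.
machines = [
    Machine("cp-new", "v1.32.0"),
    Machine("cp-old", "v1.31.0"),
]
try:
    count_upgraded(machines, "v1.32.0")
except RuntimeError as err:
    print(err)
```

Under that assumption, the 2100s timeout means at least one old control-plane Machine was still present for the whole wait, which points at the rollout stalling rather than at the check itself.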

Which tests are failing?

TODO

Since when has it been failing?

TODO

Testgrid link

No response

Reason for failure (if possible)

No response

Anything else we need to know?

No response

Label(s) to be applied

/kind failing-test
One or more /area label. See https://github.com/kubernetes-sigs/cluster-api/labels?q=area for the list of labels.


Labels

area/deflake: Issues or PRs related to deflaking Cluster API tests
kind/failing-test: Categorizes issue or PR as related to a consistently or frequently failing test.
kind/flake: Categorizes issue or PR as related to a flaky test.
needs-priority
needs-triage: Indicates an issue or PR lacks a `triage/foo` label and requires one.
