GCP-841: remove ClusterResourceSet feature gate from CAPG manager args #8795
ClusterResourceSet was promoted to GA in CAPI 1.10 and removed entirely in CAPI 1.12. OCP 4.22+ ships CAPG built against CAPI 1.12.8, causing the capi-provider pod to crash at startup with:

invalid argument "MachinePool=false,ClusterResourceSet=false" for "--feature-gates" flag: unrecognized feature gate: ClusterResourceSet

Fixes: GCP-841
Signed-off-by: Cristiano Veiga <cveiga@redhat.com>
Commit-Message-Assisted-by: Claude (via Claude Code)
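The crash happens during flag parsing: the manager refuses to start when `--feature-gates` names a gate it does not register. The stdlib-only sketch below models that behavior under stated assumptions. `knownGates` and `parseFeatureGates` are hypothetical simplifications, not the Kubernetes feature-gate code the manager actually uses, and the error text is modeled on the crash message quoted in the commit message.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// knownGates is a hypothetical stand-in for the gates registered by a
// CAPI 1.12-based manager. ClusterResourceSet is deliberately absent
// because CAPI 1.12 removed it.
var knownGates = map[string]bool{
	"MachinePool": true,
}

// parseFeatureGates validates a --feature-gates value: every comma-separated
// "Name=bool" pair must name a registered gate and carry a parseable boolean,
// otherwise the whole flag is rejected.
func parseFeatureGates(value string) (map[string]bool, error) {
	gates := map[string]bool{}
	for _, pair := range strings.Split(value, ",") {
		pair = strings.TrimSpace(pair)
		if pair == "" {
			continue
		}
		name, raw, ok := strings.Cut(pair, "=")
		if !ok {
			return nil, fmt.Errorf("missing bool value for %s", name)
		}
		if !knownGates[name] {
			return nil, fmt.Errorf("unrecognized feature gate: %s", name)
		}
		enabled, err := strconv.ParseBool(raw)
		if err != nil {
			return nil, fmt.Errorf("invalid value of %s=%s: %v", name, raw, err)
		}
		gates[name] = enabled
	}
	return gates, nil
}

func main() {
	for _, v := range []string{
		"MachinePool=false,ClusterResourceSet=false", // args before the fix
		"MachinePool=false",                          // args after the fix
	} {
		if _, err := parseFeatureGates(v); err != nil {
			// For the first value this reproduces the crash message.
			fmt.Printf("invalid argument %q for \"--feature-gates\" flag: %v\n", v, err)
		} else {
			fmt.Printf("%q accepted\n", v)
		}
	}
}
```

A single unknown gate is enough to reject the entire flag value, which is why `MachinePool=false` alone is not sufficient while `ClusterResourceSet=false` is still present.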
Skipping CI for Draft Pull Request.

Pipeline controller notification
For optional jobs, comment This repository is configured in: LGTM mode
📝 Walkthrough
🚥 Pre-merge checks: ✅ 11 passed
[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: cristianoveiga
The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing
@cristianoveiga: This pull request references GCP-841, which is a valid Jira issue.

Warning: The referenced Jira issue has an invalid target version for the target branch this PR targets: expected the bug to target the "5.0.0" version, but no target version was set.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.
Codecov Report
✅ All modified and coverable lines are covered by tests.

| Coverage Diff | main | #8795 | +/- |
|---|---|---|---|
| Coverage | 42.09% | 42.09% | -0.01% |
| Files | 766 | 766 | |
| Lines | 95047 | 95043 | -4 |
| Hits | 40012 | 40008 | -4 |
| Misses | 52221 | 52221 | |
| Partials | 2814 | 2814 | |

Flags with carried forward coverage won't be shown.
/test e2e-v2-gke
@cristianoveiga HyperShift is still on CAPI 1.11, and you are removing a feature gate that still exists in that version, so we need to make sure this is fine.
Hi @clebs, the deployed CAPG binary comes from the OCP payload image, which is built separately from HyperShift's own vendored dependencies. My understanding is that these versions are not required to match. The OpenShift CAPG fork upgraded to CAPI 1.12.8 in openshift/cluster-api-provider-gcp@e049bbd, and the new payloads (the GCP HCP minimum will be 4.23) ship that binary. ClusterResourceSet doesn't exist in any supported CAPG binary, so the fix is safe.
@cristianoveiga I see. If older CAPG versions that are still on CAPI 1.11 do not have it either, it should work fine.
/lgtm
Scheduling tests matching the

AI Test Failure Analysis
Job:
Generated by the hypershift-analyze-e2e-failure post-step using Claude claude-opus-4-6
/retest-required |
AI Test Failure Analysis
Job:
Generated by the hypershift-analyze-e2e-failure post-step using Claude claude-opus-4-6
@cristianoveiga: The following tests failed, say
Full PR test history. Your PR dashboard.
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.
Here is the complete analysis report:

Test Failure Analysis Complete
Job Information
Job 1: e2e-aws
Job 2: e2e-aks
Test Failure Analysis
Error Summary
Both job failures are pre-existing flaky tests unrelated to PR #8795. The PR only modifies …

Root Cause
e2e-aws, Failure 1: The test patches the management-cluster pull secret and waits for it to propagate to the guest cluster. As part of validation, it waits for the …
e2e-aws, Failure 2: The …
e2e-aks, Failure 1: During the control plane upgrade test, the …

PR Relationship: PR #8795 modifies only …

Root Cause: Detail per Test
e2e-aws:
| Evidence | Detail |
|---|---|
| PR scope | Only modifies hypershift-operator/controllers/hostedcluster/internal/platform/gcp/gcp.go — removes conditional ClusterResourceSet=false feature gate |
| PR platform | GCP only — no AWS or AKS code paths affected |
| e2e-aws failure 1 | global-pull-secret-syncer DaemonSet stuck at 2/3 ready pods for 20+ minutes → context deadline exceeded |
| e2e-aws failure 1 cascade | kubelet-config-verifier DaemonSet 409 Conflict ("already exists") due to leftover from prior subtest |
| e2e-aws failure 2 | packageserver pods restartCount > 0 (pod packageserver-9ccbbfb8c-c2d75: 2 restarts, pod packageserver-9ccbbfb8c-lqpcg: 1 restart) |
| e2e-aks failure 1 | openshift-apiserver pod in Failed phase → cannot exec into a container in a completed pod → 300s timeout |
| e2e-aks cluster health | HostedCluster rollout succeeded (4m6s), nodes ready (6m24s), conditions valid — failure was transient pod lifecycle timing |
| e2e-aws step | e2e-aws-hypershift-aws-run-e2e-nested failed after 1h9m2s |
| e2e-aks step | e2e-aks-hypershift-azure-run-e2e failed after 1h11m26s |
| e2e-aws test count | 597 tests, 30 skipped, 8 failures (2 root + 6 cascading parent failures) |
| e2e-aks test count | 402 tests, 47 skipped, 3 failures (1 root + 2 cascading parent failures) |
Summary
- Remove `ClusterResourceSet=false` from the `--feature-gates` arg passed to the CAPG manager
- `ClusterResourceSet` was promoted to GA in CAPI 1.10 and removed in CAPI 1.12 (kubernetes-sigs/cluster-api#12950)
- OCP 4.22+ ships CAPG built against CAPI 1.12.8, causing the `capi-provider` pod to crash at startup with: `unrecognized feature gate: ClusterResourceSet`
- `MachinePool=false` is retained; it is still valid in CAPI 1.12 (Beta, default-on)

Fixes: https://redhat.atlassian.net/browse/GCP-841
Test plan
- `go test ./hypershift-operator/controllers/hostedcluster/internal/platform/gcp/`
- `periodic-ci-openshift-hypershift-release-4.23-periodics-e2e-v2-gke` no longer fails due to the `capi-provider` crash
- `capi-provider` pod starts successfully on 4.22.x and 4.23.x without a CAPG image override

🤖 Generated with Claude Code