HYPERFLEET-861 - feat: add Tier 1 E2E tests for update and delete lifecycle #91
kuudori wants to merge 2 commits into
Conversation
[APPROVALNOTIFIER] This PR is NOT APPROVED. Needs approval from an approver in each of these files; approvers can indicate their approval by writing the appropriate command. The full list of commands accepted by this bot can be found here.
Note: Reviews paused. It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior in the review settings.
📝 Walkthrough (Summary by CodeRabbit)
Adds a large set of end-to-end Ginkgo/Gomega tests covering cluster and nodepool deletion and update edge cases: re-DELETE idempotency, DELETE of non-existent resources, concurrent DELETE behavior, DELETE during active reconciliation, recreation after hard-delete, external Kubernetes resource deletion handling, soft-delete visibility via GET/LIST, labels-only and no-op PATCH behaviors, and rapid update coalescing. Adds two exported constants for sentinel release names used to pause reconciliation. Updates multiple test-case markdown files to mark scenarios as automated and simplifies some existing creation test flows and comments.
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~60 minutes
Pre-merge checks: ✅ 5 passed
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@e2e/cluster/delete_external.go`:
- Around line 60-71: The test currently treats a 404 from GetClusterStatuses as
success by returning early in the Eventually block; change the Eventually
callback so a 404 does not short-circuit and instead fails the assertion and
keeps polling. Specifically, in the Eventually that calls
h.Client.GetClusterStatuses(ctx, clusterID) (the block with GetClusterStatuses,
the two helper.HaveAllAdaptersWithCondition checks and the
Eventually(...).Should(Succeed())), remove the early return on
http.StatusNotFound and let g.Expect(err).NotTo(HaveOccurred()) (or otherwise
call g.Fail via the Gomega `g` handle) handle the 404, so the code waits for
Finalized=True and Health=True while the cluster still exists and leaves the
separate hard-delete check for later.
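For illustration, a minimal sketch of the reworked polling block, assuming the helper signatures referenced above (`h.Client.GetClusterStatuses`, `helper.HaveAllAdaptersWithCondition`) and a `client.ConditionTypeHealth` constant alongside `ConditionTypeFinalized`; timeout and polling values are placeholders:
```go
// Sketch only: no early return on 404. A NotFound error fails this attempt,
// so Eventually keeps polling until both conditions are actually observed
// (or the timeout fires), instead of passing silently.
Eventually(func(g Gomega) {
	statuses, err := h.Client.GetClusterStatuses(ctx, clusterID)
	g.Expect(err).NotTo(HaveOccurred())
	g.Expect(statuses).To(helper.HaveAllAdaptersWithCondition(
		h.Cfg.Adapters.Cluster,
		client.ConditionTypeFinalized,
		openapi.AdapterConditionStatusTrue))
	g.Expect(statuses).To(helper.HaveAllAdaptersWithCondition(
		h.Cfg.Adapters.Cluster,
		client.ConditionTypeHealth, // constant name assumed, not confirmed
		openapi.AdapterConditionStatusTrue))
}).WithTimeout(2 * time.Minute).WithPolling(5 * time.Second).Should(Succeed())
```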
In `@pkg/helper/constants.go`:
- Around line 14-19: The hardcoded constants SentinelClustersDeployment and
SentinelNodePoolsDeployment make deployment lookup brittle; change the code to
derive sentinel deployment names from test config/Helm values or, better,
resolve the Deployment via stable labels instead of a fixed name: remove or
deprecate those hardcoded constants and update callers (e.g., the
ScaleDeployment flow) to accept a deployment name from the test config/values or
perform a label-based Kubernetes API query (label selector) to find the sentinel
Deployment for clusters and nodepools; ensure the new logic uses the same label
keys used by Helm/Chart metadata so lookups work across releases.
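As a rough sketch of the label-based alternative using plain client-go; the label keys and values below are illustrative assumptions, not the chart's confirmed metadata:
```go
package helper

import (
	"context"
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// findSentinelDeployment resolves the sentinel Deployment by labels instead of
// a hardcoded name. The label selector here is a placeholder for whatever the
// Helm chart actually sets on the sentinel Deployments.
func findSentinelDeployment(ctx context.Context, cs kubernetes.Interface, namespace, component string) (string, error) {
	selector := fmt.Sprintf("app.kubernetes.io/name=sentinel,app.kubernetes.io/component=%s", component)
	list, err := cs.AppsV1().Deployments(namespace).List(ctx, metav1.ListOptions{LabelSelector: selector})
	if err != nil {
		return "", fmt.Errorf("listing sentinel deployments: %w", err)
	}
	if n := len(list.Items); n != 1 {
		return "", fmt.Errorf("expected exactly one sentinel deployment for component %q, found %d", component, n)
	}
	return list.Items[0].Name, nil
}
```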
ℹ️ Review info
⚙️ Run configuration
Configuration used: Central YAML (base), Organization UI (inherited)
Review profile: ASSERTIVE
Plan: Enterprise
Run ID: 869800fa-1f50-44ad-ac8c-6202c37be25f
📒 Files selected for processing (8)
- e2e/cluster/delete_edge_cases.go
- e2e/cluster/delete_external.go
- e2e/cluster/delete_visibility.go
- e2e/cluster/update_edge_cases.go
- e2e/nodepool/delete_edge_cases.go
- e2e/nodepool/delete_visibility.go
- e2e/nodepool/update_edge_cases.go
- pkg/helper/constants.go
Actionable comments posted: 3
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@e2e/cluster/delete_visibility.go`:
- Around line 38-43: The cleanup currently only logs errors when restoring the
sentinel deployment (inside ginkgo.DeferCleanup calling h.ScaleDeployment for
helper.SentinelClustersDeployment), which lets failures be ignored; change the
cleanup to fail the spec on error instead of just printing by calling
ginkgo.Fail (or returning an error if using a cleanup that supports error
returns) with a descriptive message that includes the error returned by
h.ScaleDeployment; apply the same change to both DeferCleanup blocks that
restore sentinel (the first one around ginkgo.By("restoring sentinel-clusters to
1 replica") and the second block later in the file) so any failure to scale the
sentinel causes the spec to fail immediately.
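A minimal sketch of what the failing cleanup could look like, assuming `h.ScaleDeployment` and `helper.SentinelClustersDeployment` as referenced above (Gomega failures inside `DeferCleanup` are reported against the spec):
```go
// Sketch: any error from ScaleDeployment now fails the spec instead of
// being logged and ignored.
ginkgo.DeferCleanup(func(ctx context.Context) {
	ginkgo.By("restoring sentinel-clusters to 1 replica")
	err := h.ScaleDeployment(ctx, h.Cfg.Namespace, helper.SentinelClustersDeployment, 1)
	Expect(err).NotTo(HaveOccurred(), "failed to restore sentinel-clusters to 1 replica")
})
```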
In `@e2e/cluster/update_edge_cases.go`:
- Around line 176-189: The current test ginkgo.It("should not increment
generation when PATCHing with identical spec") only replays the exact
canonicalSpec; extend it to also (1) construct and PATCH a
semantically-equivalent spec variant (e.g., reorder list fields, fill omitted
defaults vs. explicit defaults) and assert patchedCluster.Generation ==
clusterBefore.Generation, and (2) send an empty or nil Spec PATCH
(post-empty-spec idempotent replay) then replay the same empty-spec PATCH and
assert Generation does not change; use the same helpers/objects (clusterBefore,
canonicalSpec, h.Client.PatchCluster / openapi.ClusterPatchRequest,
patchedCluster.Generation) so the additional assertions verify
semantic-equivalence and post-empty-spec idempotence, or alternatively narrow
the test name/description to state it only verifies exact-spec replay if you
prefer not to add cases.
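An illustrative shape for the semantic-equivalence case, reusing the objects named above; `reorderListFields` is a hypothetical helper standing in for whatever equivalence transformation the spec allows, and the `ClusterPatchRequest` field names are assumed:
```go
// Hypothetical extension: reorderListFields is a stand-in for any
// transformation that yields a semantically equivalent spec.
ginkgo.It("should not increment generation for a semantically equivalent spec", func(ctx context.Context) {
	equivalentSpec := reorderListFields(canonicalSpec)
	patched, err := h.Client.PatchCluster(ctx, clusterBefore.Id, // field names assumed
		openapi.ClusterPatchRequest{Spec: &equivalentSpec})
	Expect(err).NotTo(HaveOccurred())
	Expect(patched.Generation).To(Equal(clusterBefore.Generation),
		"equivalent-spec replay must not bump the generation")
})
```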
In `@e2e/nodepool/delete_visibility.go`:
- Around line 57-62: The cleanup currently only logs errors when restoring
sentinel-nodepools; change it to fail the spec on error. Inside the
ginkgo.DeferCleanup closure that calls h.ScaleDeployment(ctx, h.Cfg.Namespace,
helper.SentinelNodePoolsDeployment, 1), replace the
ginkgo.GinkgoWriter.Printf(...) with a failing call such as
ginkgo.Fail(fmt.Sprintf("failed to restore sentinel-nodepools: %v", err)) (and
add fmt to imports if missing) so any error from h.ScaleDeployment causes the
spec to fail rather than be silently ignored.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Central YAML (base), Organization UI (inherited)
Review profile: ASSERTIVE
Plan: Enterprise
Run ID: 2c907c32-7409-4f89-bd42-83e645e52a4b
📒 Files selected for processing (12)
- e2e/cluster/delete_edge_cases.go
- e2e/cluster/delete_external.go
- e2e/cluster/delete_visibility.go
- e2e/cluster/update_edge_cases.go
- e2e/nodepool/delete_edge_cases.go
- e2e/nodepool/delete_visibility.go
- e2e/nodepool/update_edge_cases.go
- pkg/helper/constants.go
- test-design/testcases/delete-cluster.md
- test-design/testcases/delete-nodepool.md
- test-design/testcases/update-cluster.md
- test-design/testcases/update-nodepool.md
Actionable comments posted: 4
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@e2e/cluster/delete_edge_cases.go`:
- Around line 215-237: Before asserting the hard-delete (404), poll the
cluster's adapter status until its observed_generation equals the delete
generation (3) to ensure adapters finalized on the delete generation; implement
this by calling your status fetcher (e.g., use h.Client.GetCluster /
GetClusterStatus in a retry loop or add a helper like PollObservedGeneration)
for clusterID and wait until status.ObservedGeneration == 3 (with the same
timeouts/polling interval used elsewhere) before proceeding to
Eventually(h.PollClusterHTTPStatus(...)).Should(Equal(http.StatusNotFound)).
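A sketch of the suggested guard, under the assumption that `GetClusterStatuses` returns per-adapter statuses exposing an `ObservedGeneration` field (the field path and timeouts are illustrative):
```go
// Wait until every adapter has observed the delete generation before
// asserting the hard delete, so finalization cannot be skipped.
const deleteGeneration = 3

Eventually(func(g Gomega) {
	statuses, err := h.Client.GetClusterStatuses(ctx, clusterID)
	g.Expect(err).NotTo(HaveOccurred())
	for _, s := range statuses { // per-adapter status; field name assumed
		g.Expect(s.ObservedGeneration).To(BeNumerically(">=", deleteGeneration))
	}
}).WithTimeout(2 * time.Minute).WithPolling(5 * time.Second).Should(Succeed())

// Only then assert the 404.
Eventually(h.PollClusterHTTPStatus(ctx, clusterID)).Should(Equal(http.StatusNotFound))
```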
In `@e2e/nodepool/delete_edge_cases.go`:
- Around line 132-146: The test intermittently races with hard-delete; to make
the re-DELETE idempotency check deterministic, pause the nodepool hard-delete
sentinel between the two DeleteNodePool calls: call the helper that pauses the
sentinel (e.g., PauseNodePoolSentinel or equivalent on the test harness) before
the first h.Client.DeleteNodePool(ctx, clusterID, nodepoolID) and resume it
afterwards (use a defer to ensure ResumeNodePoolSentinel is called), then
perform the second delete and validate DeletedTime and Generation against
originalDeletedTime/originalGeneration; ensure both Pause and Resume helpers are
referenced so the sentinel is always resumed even on failures.
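A rough shape for the paused-sentinel variant; `PauseNodePoolSentinel` and `ResumeNodePoolSentinel` are the hypothetical harness helpers the comment names, and the response field names are taken from the surrounding test:
```go
// Pause hard-delete so the second DELETE cannot race resource removal.
Expect(h.PauseNodePoolSentinel(ctx)).To(Succeed())
ginkgo.DeferCleanup(func(ctx context.Context) {
	// Resume even if the spec fails part-way through.
	Expect(h.ResumeNodePoolSentinel(ctx)).To(Succeed(), "failed to resume sentinel")
})

first, err := h.Client.DeleteNodePool(ctx, clusterID, nodepoolID)
Expect(err).NotTo(HaveOccurred())

second, err := h.Client.DeleteNodePool(ctx, clusterID, nodepoolID)
Expect(err).NotTo(HaveOccurred())
Expect(second.DeletedTime).To(Equal(first.DeletedTime), "re-DELETE must be idempotent")
Expect(second.Generation).To(Equal(first.Generation))
```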
In `@e2e/nodepool/update_edge_cases.go`:
- Around line 72-74: The test currently only asserts Reconciled=True after the
PATCH; update the post-PATCH convergence assertion to also require
Available=True by extending the Eventually check that calls h.PollNodePool(ctx,
clusterID, nodepoolID) to assert both
helper.HaveResourceCondition(client.ConditionTypeReconciled,
openapi.ResourceConditionStatusTrue) and
helper.HaveResourceCondition(client.ConditionTypeAvailable,
openapi.ResourceConditionStatusTrue) (either by adding the second Should to the
same Eventually or combining both expectations) so the nodepool converges to the
new generation with Reconciled=True and Available=True.
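One way to combine both condition checks in a single `Eventually`, assuming the poller and matcher referenced above; `SatisfyAll` is standard Gomega, and the timeout values are placeholders:
```go
// The nodepool converges only when Reconciled=True and Available=True hold
// at the same time on the polled resource.
Eventually(h.PollNodePool(ctx, clusterID, nodepoolID)).
	WithTimeout(5 * time.Minute).
	WithPolling(10 * time.Second).
	Should(SatisfyAll(
		helper.HaveResourceCondition(client.ConditionTypeReconciled, openapi.ResourceConditionStatusTrue),
		helper.HaveResourceCondition(client.ConditionTypeAvailable, openapi.ResourceConditionStatusTrue),
	))
```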
In `@test-design/testcases/update-cluster.md`:
- Around line 487-508: Add blank lines immediately before and after each fenced
code block in the "Step 2: PATCH with a spec change and verify generation
increments" and "Step 3: Replay the same spec via PATCH" sections (the blocks
starting with ```bash and ending with ```), so the markdown has an empty line
separating the fenced blocks from surrounding paragraphs; ensure both
occurrences (the PATCH command blocks near the "Action:" lines) are surrounded
by a blank line above and below to fix MD031.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Central YAML (base), Organization UI (inherited)
Review profile: ASSERTIVE
Plan: Enterprise
Run ID: 7b465d75-5bbc-4af4-aa3c-f44a938a6a6e
📒 Files selected for processing (14)
- e2e/cluster/creation.go
- e2e/cluster/delete_edge_cases.go
- e2e/cluster/delete_external.go
- e2e/cluster/delete_visibility.go
- e2e/cluster/update_edge_cases.go
- e2e/nodepool/creation.go
- e2e/nodepool/delete_edge_cases.go
- e2e/nodepool/delete_visibility.go
- e2e/nodepool/update_edge_cases.go
- pkg/helper/constants.go
- test-design/testcases/delete-cluster.md
- test-design/testcases/delete-nodepool.md
- test-design/testcases/update-cluster.md
- test-design/testcases/update-nodepool.md
💤 Files with no reviewable changes (2)
- e2e/cluster/creation.go
- e2e/nodepool/creation.go
Actionable comments posted: 1
♻️ Duplicate comments (4)
e2e/cluster/delete_external.go (1)
5-6: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win
Don't let HTTP 404 satisfy the adapter-finalization assertion
Line 63 returns early on 404, so the test can pass without ever proving `Finalized=True` and `Health=True`.
Proposed fix:
```diff
 import (
 	"context"
-	"errors"
 	"net/http"
@@
 	Eventually(func(g Gomega) {
-		var httpErr *client.HTTPError
 		statuses, err := h.Client.GetClusterStatuses(ctx, clusterID)
-		if errors.As(err, &httpErr) && httpErr.StatusCode == http.StatusNotFound {
-			return
-		}
 		g.Expect(err).NotTo(HaveOccurred())
 		g.Expect(statuses).To(helper.HaveAllAdaptersWithCondition(
 			h.Cfg.Adapters.Cluster,
 			client.ConditionTypeFinalized,
 			openapi.AdapterConditionStatusTrue))
```
As per coding guidelines, "Validate changes against HyperFleet architecture standards from the linked architecture repository."
Also applies to: 60-65
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@e2e/cluster/delete_external.go` around lines 5-6: the test currently treats an HTTP 404 response as a successful terminal condition (returning early around the adapter-finalization check), which can let the test pass without verifying Finalized=True and Health=True; update the adapter-finalization/health check logic (the HTTP response handling around the 404 return at lines ~60-65 in e2e/cluster/delete_external.go) so that a 404 is not considered success: instead, log/ignore the 404 and continue polling/retrying until you observe Finalized==true and Health==true (or a real error/timeout occurs), guaranteeing the assertion only succeeds when the finalization and health conditions are explicitly met.
e2e/nodepool/delete_visibility.go (1)
59-64: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win
Fail the spec if sentinel restoration fails
Line 61 only logs restoration failure. That can leave `sentinel-nodepools` at 0 and contaminate subsequent suites.
Proposed fix:
```diff
 ginkgo.DeferCleanup(func(ctx context.Context) {
 	ginkgo.By("restoring sentinel-nodepools to 1 replica")
-	if err := h.ScaleDeployment(ctx, h.Cfg.Namespace, sentinelDeployment, 1); err != nil {
-		ginkgo.GinkgoWriter.Printf("Warning: failed to restore sentinel-nodepools: %v\n", err)
-	}
+	Expect(h.ScaleDeployment(ctx, h.Cfg.Namespace, sentinelDeployment, 1)).
+		To(Succeed(), "failed to restore sentinel-nodepools")
 })
```
As per coding guidelines, "Focus on major issues impacting performance, readability, maintainability and security."
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@e2e/nodepool/delete_visibility.go` around lines 59-64: the cleanup currently swallows errors when restoring sentinel-nodepools; change the ginkgo.DeferCleanup callback to return an error so the spec fails on restore failure: update the anonymous func passed to ginkgo.DeferCleanup to have signature func(ctx context.Context) error, call h.ScaleDeployment(ctx, h.Cfg.Namespace, sentinelDeployment, 1) and if it returns non-nil wrap/return that error (instead of only logging with ginkgo.GinkgoWriter.Printf) so the test framework records the failure.
e2e/cluster/delete_visibility.go (1)
40-45: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win
Do not swallow sentinel restore failures in cleanup
Both cleanup blocks only warn on restore errors. If `sentinel-clusters` stays scaled down, later specs can fail for unrelated reasons.
Proposed fix:
```diff
 ginkgo.DeferCleanup(func(ctx context.Context) {
 	ginkgo.By("restoring sentinel-clusters to 1 replica")
-	if err := h.ScaleDeployment(ctx, h.Cfg.Namespace, sentinelDeployment, 1); err != nil {
-		ginkgo.GinkgoWriter.Printf("Warning: failed to restore sentinel: %v\n", err)
-	}
+	Expect(h.ScaleDeployment(ctx, h.Cfg.Namespace, sentinelDeployment, 1)).
+		To(Succeed(), "failed to restore sentinel-clusters")
 })
```
As per coding guidelines, "Focus on major issues impacting performance, readability, maintainability and security."
Also applies to: 124-129
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@e2e/cluster/delete_visibility.go` around lines 40-45: the cleanup currently swallows errors when restoring sentinel by only logging via ginkgo.GinkgoWriter.Printf; update the ginkgo.DeferCleanup closures that call h.ScaleDeployment (references: ginkgo.DeferCleanup, h.ScaleDeployment, sentinelDeployment, ginkgo.GinkgoWriter.Printf) to instead fail the spec when ScaleDeployment returns an error (e.g., call ginkgo.Fail or use gomega.Expect(err).NotTo(HaveOccurred()) with a message containing the error) so restore failures are surfaced and cause the test to fail; make the same change in the second cleanup block that also uses h.ScaleDeployment (lines mentioned in the comment).
e2e/nodepool/delete_edge_cases.go (1)
140-145: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win
Fail fast if sentinel restore fails in cleanup
On line 140, restore failure is only logged. If sentinel stays at 0 replicas, later specs can fail unpredictably. This should hard-fail the spec.
Suggested fix:
```diff
 ginkgo.DeferCleanup(func(ctx context.Context) {
 	ginkgo.By("restoring sentinel-nodepools to 1 replica")
-	if err := h.ScaleDeployment(ctx, h.Cfg.Namespace, sentinelDeployment, 1); err != nil {
-		ginkgo.GinkgoWriter.Printf("Warning: failed to restore sentinel-nodepools: %v\n", err)
-	}
+	Expect(h.ScaleDeployment(ctx, h.Cfg.Namespace, sentinelDeployment, 1)).
+		To(Succeed(), "failed to restore sentinel-nodepools to 1 replica")
 })
```
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@e2e/nodepool/delete_edge_cases.go` around lines 140 - 145, The cleanup currently only logs failure when restoring sentinel-nodepools, which can leave replicas at 0; change the ginkgo.DeferCleanup anonymous function so it returns an error instead of just printing: call h.ScaleDeployment(ctx, h.Cfg.Namespace, sentinelDeployment, 1) and if it returns an error wrap/return that error (including context like namespace and sentinelDeployment) so Ginkgo marks the cleanup as a failing spec; remove the ginkgo.GinkgoWriter.Printf logging-only path and ensure the cleanup signature returns error so test framework fails fast on restore failure.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@e2e/cluster/delete_edge_cases.go`:
- Around line 44-49: The cleanup currently swallows errors when restoring
sentinel (inside the ginkgo.DeferCleanup block calling h.ScaleDeployment with
sentinelDeployment); change the handler to fail the test on error instead of
only logging: if h.ScaleDeployment(...) returns a non-nil error, call
ginkgo.Fail (or gomega.Expect(err).NotTo(HaveOccurred())) with a clear message
so restore failures are asserted and cannot be ignored. Ensure the change is
applied inside the existing ginkgo.DeferCleanup anonymous function that wraps
the sentinel restore.
---
Duplicate comments:
In `@e2e/cluster/delete_external.go`:
- Around line 5-6: The test currently treats an HTTP 404 response as a
successful terminal condition (returning early around the adapter-finalization
check), which can let the test pass without verifying Finalized=True and
Health=True; update the adapter-finalization/health check logic (the HTTP
response handling around the 404 return at lines ~60-65 in
e2e/cluster/delete_external.go) so that a 404 is not considered success:
instead, log/ignore the 404 and continue polling/retrying until you observe
Finalized==true and Health==true (or a real error/timeout occurs), guaranteeing
the assertion only succeeds when the finalization and health conditions are
explicitly met.
In `@e2e/cluster/delete_visibility.go`:
- Around line 40-45: The cleanup currently swallows errors when restoring
sentinel by only logging via ginkgo.GinkgoWriter.Printf; update the
ginkgo.DeferCleanup closures that call h.ScaleDeployment (references:
ginkgo.DeferCleanup, h.ScaleDeployment, sentinelDeployment,
ginkgo.GinkgoWriter.Printf) to instead fail the spec when ScaleDeployment
returns an error (e.g., call ginkgo.Fail or use
gomega.Expect(err).NotTo(HaveOccurred()) with a message containing the error) so
restore failures are surfaced and cause the test to fail; make the same change
in the second cleanup block that also uses h.ScaleDeployment (lines mentioned in
the comment).
In `@e2e/nodepool/delete_edge_cases.go`:
- Around line 140-145: The cleanup currently only logs failure when restoring
sentinel-nodepools, which can leave replicas at 0; change the
ginkgo.DeferCleanup anonymous function so it returns an error instead of just
printing: call h.ScaleDeployment(ctx, h.Cfg.Namespace, sentinelDeployment, 1)
and if it returns an error wrap/return that error (including context like
namespace and sentinelDeployment) so Ginkgo marks the cleanup as a failing spec;
remove the ginkgo.GinkgoWriter.Printf logging-only path and ensure the cleanup
signature returns error so test framework fails fast on restore failure.
In `@e2e/nodepool/delete_visibility.go`:
- Around line 59-64: The cleanup currently swallows errors when restoring
sentinel-nodepools; change the ginkgo.DeferCleanup callback to return an error
so the spec fails on restore failure: update the anonymous func passed to
ginkgo.DeferCleanup to have signature func(ctx context.Context) error, call
h.ScaleDeployment(ctx, h.Cfg.Namespace, sentinelDeployment, 1) and if it returns
non-nil wrap/return that error (instead of only logging with
ginkgo.GinkgoWriter.Printf) so the test framework records the failure.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Central YAML (base), Organization UI (inherited)
Review profile: ASSERTIVE
Plan: Enterprise
Run ID: 34f3f21f-e38c-4c43-b46b-fb5e48282ca4
📒 Files selected for processing (14)
- e2e/cluster/creation.go
- e2e/cluster/delete_edge_cases.go
- e2e/cluster/delete_external.go
- e2e/cluster/delete_visibility.go
- e2e/cluster/update_edge_cases.go
- e2e/nodepool/creation.go
- e2e/nodepool/delete_edge_cases.go
- e2e/nodepool/delete_visibility.go
- e2e/nodepool/update_edge_cases.go
- pkg/helper/constants.go
- test-design/testcases/delete-cluster.md
- test-design/testcases/delete-nodepool.md
- test-design/testcases/update-cluster.md
- test-design/testcases/update-nodepool.md
💤 Files with no reviewable changes (2)
- e2e/cluster/creation.go
- e2e/nodepool/creation.go
Summary
Implement 16 Tier 1 test cases covering edge cases and error scenarios
beyond the critical path for cluster and nodepool update/delete lifecycle.
Test Plan
- `make test-all` passes
- `make lint` passes
- `make test-helm` (if applicable)