CNTRLPLANE-3460: Refactor defrag controller to lower impact of defrag#1618
CNTRLPLANE-3460: Refactor defrag controller to lower impact of defrag#1618dusk125 wants to merge 2 commits into
Conversation
|
Skipping CI for Draft Pull Request. |
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
WalkthroughDefragController's ChangesDefragController single-member refactor and test updates
Estimated code review effort🎯 4 (Complex) | ⏱️ ~50 minutes 🚥 Pre-merge checks | ✅ 13 | ❌ 2❌ Failed checks (2 warnings)
✅ Passed checks (13 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Comment |
|
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: The full list of commands accepted by this bot can be found here. DetailsNeeds approval from an approver in each of these files:Approvers can indicate their approval by writing |
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@pkg/operator/defragcontroller/defragcontroller_test.go`:
- Around line 464-468: The test uses a tiny non-zero defragBudget which can be
flaky; set the DefragController's defragBudget to 0 (or switch the controller to
use a controllable/test clock) so budget exhaustion is deterministic. Locate
where defragBudget is set in the test setup (the variable named defragBudget
used to construct the controller) and replace 1 * time.Nanosecond with 0 (or
adapt the controller constructor to accept a mock clock and use that in the
test) so calling controller.sync(context.TODO(),
factory.NewSyncContext("defrag-controller", eventRecorder)) always observes an
exhausted budget.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: e0de8aac-2c9a-42a2-9dee-693e18c04822
📒 Files selected for processing (2)
pkg/operator/defragcontroller/defragcontroller.gopkg/operator/defragcontroller/defragcontroller_test.go
|
@dusk125: This pull request references CNTRLPLANE-3460 which is a valid jira issue. Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the sub-task to target the "5.0.0" version, but no target version was set. DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
| if len(member.ClientURLs) == 0 { | ||
| // skip unstarted member | ||
| var ( | ||
| etcdMembers []*etcdserverpb.Member |
|
|
||
| if !c.lastDefragTime.IsZero() && time.Since(c.lastDefragTime) < c.defragCooldown { | ||
| klog.V(4).Infof("Defrag cooldown active, %v remaining", c.defragCooldown-time.Since(c.lastDefragTime)) | ||
| return nil |
There was a problem hiding this comment.
we could schedule a next defrag if we know the remaining time here
| return nil | ||
| } | ||
|
|
||
| if !c.lastDefragTime.IsZero() && time.Since(c.lastDefragTime) < c.defragCooldown { |
There was a problem hiding this comment.
Could it be a problem that a member could exceed the fragmentation limit for 1h in high churn cluster, since this backfoff / cooldown is member agnostic?
There was a problem hiding this comment.
hmm maybe we can frame this cooldown differently.... e.g after a defrag, the cluster must have no unhealthy members for at least 10m before defragmenting the next member
| recorder.Eventf("DefragControllerDefragmentSuccess", "etcd member has been defragmented: %s, memberID: %d", member.Name, member.ID) | ||
|
|
||
| // Give cluster time to recover before the next sync defrags another member. | ||
| if err := wait.PollUntilContextTimeout(ctx, pollWaitDuration, pollTimeoutDuration, false, |
There was a problem hiding this comment.
This is not the best pattern for controllers. Could we just select a single member try to defrag it and use a queue for any backoff checks like checking the health?
I suppose the defrag is not that often so a 10m between each member defrag shouldn't be an issue?
There was a problem hiding this comment.
line 144 already has that
if !etcdcli.IsClusterHealthy(memberHealth)
so we can have the backoff work through the sync loop for some period of time
| numDefragFailures int | ||
| defragWaitDuration time.Duration | ||
| defragCooldown time.Duration | ||
| lastDefragTime time.Time |
There was a problem hiding this comment.
you will have to persist this somewhere (operator conditions?), imagine you're restarting CEO within that hour.
| } | ||
| recorder.Eventf("DefragControllerDefragmentSuccess", "etcd member has been defragmented: %s, memberID: %d", member.Name, member.ID) | ||
|
|
||
| // Give cluster time to recover before the next sync defrags another member. |
There was a problem hiding this comment.
if the defrag call came back, then the cluster should be in healthy state and respond. I think the polling is a bit misplaced here.
Refactor the defrag controller to defrag at most one member per sync cycle (the most fragmented), then return. The health check at the start of the next sync ensures the defragged member recovered before we defrag the next one. This replaces the previous approach of defragging all members in a single sync with wait.Poll recovery checks between each. With upstream etcd defrag improvements (openshift/etcd#378) reducing disruption from ~30s to ~100ms, the simpler one-per-sync approach is sufficient. Changes: - Sort members by fragmentation descending, defrag only the first fragmented member found, then return - Remove wait.Poll health recovery loop and related constants - Remove leader-last ordering (unnecessary with fast defrag) - Fix fake Defragment() to update per-member status so multi-sync tests work correctly - Extract setDegraded/clearDegraded helpers Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
de3f881 to
28f0bd1
Compare
There was a problem hiding this comment.
🧹 Nitpick comments (1)
pkg/operator/defragcontroller/defragcontroller.go (1)
184-190: ⚡ Quick winError chain is lost by double-formatting.
The error is first formatted with
%vintoerrMsg, then wrapped withfmt.Errorf("%s", errMsg). This loses the original error chain, preventing callers from usingerrors.Is/errors.Asfor inspection.♻️ Suggested fix to preserve error chain
- errMsg := fmt.Sprintf("failed defrag on member: %s, memberID: %x: %v", member.Name, member.ID, err) - recorder.Eventf("DefragControllerDefragmentFailed", errMsg) + recorder.Eventf("DefragControllerDefragmentFailed", "failed defrag on member: %s, memberID: %x: %v", member.Name, member.ID, err) c.numDefragFailures++ if c.numDefragFailures >= maxDefragFailuresBeforeDegrade { c.setDegraded(ctx, recorder) } - return fmt.Errorf("%s", errMsg) + return fmt.Errorf("failed defrag on member: %s, memberID: %x: %w", member.Name, member.ID, err)🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@pkg/operator/defragcontroller/defragcontroller.go` around lines 184 - 190, The error is being double-formatted which loses the error chain: first with fmt.Sprintf using %v into errMsg, then wrapped with fmt.Errorf("%s", errMsg). This prevents callers from using errors.Is or errors.As for inspection. In the return statement at the end of this block, replace fmt.Errorf("%s", errMsg) with fmt.Errorf that includes the descriptive message prefix and wraps the original err variable using the %w verb instead of %v, so the error chain is preserved while maintaining the same log message for the recorder.Eventf call.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Nitpick comments:
In `@pkg/operator/defragcontroller/defragcontroller.go`:
- Around line 184-190: The error is being double-formatted which loses the error
chain: first with fmt.Sprintf using %v into errMsg, then wrapped with
fmt.Errorf("%s", errMsg). This prevents callers from using errors.Is or
errors.As for inspection. In the return statement at the end of this block,
replace fmt.Errorf("%s", errMsg) with fmt.Errorf that includes the descriptive
message prefix and wraps the original err variable using the %w verb instead of
%v, so the error chain is preserved while maintaining the same log message for
the recorder.Eventf call.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: f3c9859e-40e4-4363-bee9-a4aff47a4ba3
📒 Files selected for processing (3)
pkg/etcdcli/helpers.gopkg/operator/defragcontroller/defragcontroller.gopkg/operator/defragcontroller/defragcontroller_test.go
🚧 Files skipped from review as they are similar to previous changes (1)
- pkg/etcdcli/helpers.go
|
With the improvements from openshift/etcd#378, the current implementation should be sufficient to lower the impact of defrag. The changes to etcd can take defrag disruption from 30s to 100ms. The current implementation of the changes to the defrag controller run a single defrag, on the most fragmented member, then check the member health on the next sync (once every 11 minutes) and only continue defragging if all members are healthy. |
0934524 to
c92f5ae
Compare
There was a problem hiding this comment.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
pkg/operator/defragcontroller/defragcontroller.go (1)
264-268:⚠️ Potential issue | 🟡 Minor | ⚡ Quick winGuard against division by zero when
ondiskis zero.If
DbSizeis zero (unlikely but possible edge case), this function returnsNaN(wheninuseis also 0) or-Inf, which can cause unpredictable behavior in the sorting comparator at lines 168-173.Proposed fix
func checkFragmentationPercentage(ondisk, inuse int64) float64 { + if ondisk <= 0 { + return 0 + } diff := float64(ondisk - inuse) fragmentedPercentage := (diff / float64(ondisk)) * 100 return math.Round(fragmentedPercentage*100) / 100 }🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@pkg/operator/defragcontroller/defragcontroller.go` around lines 264 - 268, The checkFragmentationPercentage function performs division by ondisk without checking if it is zero, which results in NaN or Inf and causes unpredictable behavior in the sorting comparator. Add a guard clause at the start of the checkFragmentationPercentage function to check if ondisk is less than or equal to zero, and return 0.0 in that case before performing the division calculation.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@pkg/operator/defragcontroller/defragcontroller.go`:
- Around line 156-158: The error message in the status nil check at the failing
condition does not identify which member had the validation failure. Modify the
fmt.Errorf call to include the member or endpoint identifier (such as a member
name, IP address, or other relevant context variable that should be available in
the surrounding scope) in addition to a descriptive message, so the error
clearly indicates which specific member's endpoint status validation failed.
---
Outside diff comments:
In `@pkg/operator/defragcontroller/defragcontroller.go`:
- Around line 264-268: The checkFragmentationPercentage function performs
division by ondisk without checking if it is zero, which results in NaN or Inf
and causes unpredictable behavior in the sorting comparator. Add a guard clause
at the start of the checkFragmentationPercentage function to check if ondisk is
less than or equal to zero, and return 0.0 in that case before performing the
division calculation.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: b38cd00a-ae65-4251-85d7-30bfeb973803
📒 Files selected for processing (1)
pkg/operator/defragcontroller/defragcontroller.go
c92f5ae to
29119b9
Compare
|
/retest-required |
2 similar comments
|
/retest-required |
|
/retest-required |
|
@dusk125: The following tests failed, say
Full PR test history. Your PR dashboard. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
Summary
wait.Pollrecovery loop and leader-last ordering — with upstream etcd defrag improvements (openshift/etcd#378) reducing disruption from ~30s to ~100ms, the simpler one-per-sync approach is sufficientTest plan
TestNewDefragControllerMultiSyncsvalidates degraded/recovery across sync cycles🤖 Generated with Claude Code
Summary by CodeRabbit