feat(server-ng): add partition reconciliation loop #3436

Merged
numinnex merged 2 commits into apache:master from krishvishal:partition-reconciliation-loop
Jun 11, 2026

@krishvishal

Contributor

What

A level-triggered partition reconciliation loop, one task per shard, that converges each shard's local partition set to the committed Streams metadata.

  • Wakes on a post-commit notifier broadcast by shard 0 for partition-shaping ops, a periodic safety tick (default 1s, capped 30s, configurable), and an initial startup pass.
  • Each pass: build missing owned partitions, seed routing rows for peer-owned namespaces, tear down ghosts removed from the committed target.
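The three steps of a pass can be sketched as a pure diff between the committed target and the shard's local state. This is a minimal illustration, not the PR's code: `Namespace`, `ShardId`, `Action`, and `plan_pass` are hypothetical names, and the real reconciler works on the server's own metadata types. Because the plan depends only on current state and not on which wake source fired, running it twice on converged state yields no actions, which is what makes the loop level-triggered.

```rust
use std::collections::{BTreeMap, BTreeSet};

// Hypothetical identifiers; the PR description does not show the real types.
type Namespace = u64;
type ShardId = u16;

#[derive(Debug, PartialEq, Eq)]
enum Action {
    BuildOwned(Namespace),         // owned by this shard in the target, missing locally
    SeedRoute(Namespace, ShardId), // owned by a peer, routing row missing or stale
    TearDown(Namespace),           // present locally, gone from the committed target
}

/// Computes the actions for one pass from state alone (assumed shape).
fn plan_pass(
    me: ShardId,
    target: &BTreeMap<Namespace, ShardId>, // committed metadata: namespace -> owning shard
    local_owned: &BTreeSet<Namespace>,
    local_routes: &BTreeMap<Namespace, ShardId>,
) -> Vec<Action> {
    let mut actions = Vec::new();
    for (&ns, &owner) in target {
        if owner == me {
            if !local_owned.contains(&ns) {
                actions.push(Action::BuildOwned(ns));
            }
        } else if local_routes.get(&ns) != Some(&owner) {
            actions.push(Action::SeedRoute(ns, owner));
        }
    }
    // Ghosts: partitions we still hold that the committed target no longer lists.
    for &ns in local_owned {
        if !target.contains_key(&ns) {
            actions.push(Action::TearDown(ns));
        }
    }
    actions
}
```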

How

Single-writer via the pump funnel. The reconciler runs off-pump. It stages ReconcileOp::{InsertOwned, InsertRouted, ConfirmRemove, RemoveRouted} and the pump applies them single-writer.
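As a rough sketch of the funnel idea, the reconciler can be modelled as a producer that sends ops over a channel, with the pump as the only code that mutates the table. The variant names come from the description above; their payloads, the `Slot` and `Pump` types, and the use of `std::sync::mpsc` are assumptions made for illustration only.

```rust
use std::collections::BTreeMap;
use std::sync::mpsc::Receiver;

type Namespace = u64; // hypothetical key type
type ShardId = u16;   // hypothetical shard identifier

// Variant names are from the PR description; the payloads are assumed.
#[derive(Debug)]
enum ReconcileOp {
    InsertOwned(Namespace),
    InsertRouted(Namespace, ShardId),
    ConfirmRemove(Namespace),
    RemoveRouted(Namespace),
}

#[derive(Debug, PartialEq, Eq)]
enum Slot {
    Owned,
    Routed(ShardId),
}

/// Stand-in for the pump: the single owner of the mutable table.
#[derive(Default)]
struct Pump {
    table: BTreeMap<Namespace, Slot>,
}

impl Pump {
    /// Applies every op staged so far and returns how many were applied.
    fn drain(&mut self, staged: &Receiver<ReconcileOp>) -> usize {
        let mut applied = 0;
        while let Ok(op) = staged.try_recv() {
            match op {
                ReconcileOp::InsertOwned(ns) => {
                    self.table.insert(ns, Slot::Owned);
                }
                ReconcileOp::InsertRouted(ns, shard) => {
                    self.table.insert(ns, Slot::Routed(shard));
                }
                ReconcileOp::ConfirmRemove(ns) | ReconcileOp::RemoveRouted(ns) => {
                    self.table.remove(&ns);
                }
            }
            applied += 1;
        }
        applied
    }
}
```

The reconciler side only ever holds the sending half of the channel, so it cannot touch `table` directly; all mutation is serialized through `drain`.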

Two-phase tombstone teardown. The reconciler fences writes synchronously (tombstone + shards_table row removal), awaits the disk delete, then enqueues ConfirmRemove. If the delete fails, the partition stays tombstoned and the delete is retried with backoff; the in-memory partition is never dropped while its data still exists.
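The ordering can be illustrated with a simplified, synchronous sketch. Everything here is hypothetical (`Shard`, `Teardown`, `tear_down`, and the closure standing in for the awaited disk delete); in the PR the second phase is asynchronous and the final removal goes through a staged ConfirmRemove rather than a direct mutation.

```rust
use std::collections::{BTreeMap, BTreeSet};
use std::io;

type Namespace = u64; // hypothetical key type

#[derive(Debug, PartialEq, Eq)]
enum Teardown {
    Confirmed,
    StillTombstoned,
}

#[derive(Default)]
struct Shard {
    partitions: BTreeMap<Namespace, Vec<u8>>, // stand-in for in-memory partition state
    routing: BTreeSet<Namespace>,             // stand-in for shards_table rows
    tombstones: BTreeSet<Namespace>,
}

impl Shard {
    fn accepts_writes(&self, ns: Namespace) -> bool {
        self.routing.contains(&ns) && !self.tombstones.contains(&ns)
    }

    fn tear_down(
        &mut self,
        ns: Namespace,
        delete_on_disk: impl FnOnce(Namespace) -> io::Result<()>,
    ) -> Teardown {
        // Phase 1: fence writes before touching any data.
        self.tombstones.insert(ns);
        self.routing.remove(&ns);

        // Phase 2: drop in-memory state only once the disk delete succeeded.
        match delete_on_disk(ns) {
            Ok(()) => {
                self.partitions.remove(&ns);
                self.tombstones.remove(&ns);
                Teardown::Confirmed
            }
            // Failure: keep the tombstone and the in-memory partition; retry later.
            Err(_) => Teardown::StillTombstoned,
        }
    }
}
```

Calling `tear_down` again after a failure is safe in this sketch: the fence is already in place, so the retry only repeats the delete attempt.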

Epoch / fast-skip model. A monotonic Streams::revision (bumped in the STM on every partition-shaping commit) lets a converged pass skip the O(N) diff when nothing changed. A per-partition created_revision distinguishes a delete + recreate that reused the same slab-key namespace from the original incarnation, triggering a rebuild instead of serving stale segments.
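A small sketch of how the two revisions might be used, under assumptions: `Reconciler`, `Decision`, `should_skip`, and `classify` are illustrative names, and the revisions are modelled as plain `u64` values. Only `Streams::revision` and `created_revision` are taken from the description.

```rust
use std::collections::BTreeMap;

type Namespace = u64; // hypothetical key type

#[derive(Debug, PartialEq, Eq)]
enum Decision {
    Keep,    // same incarnation, nothing to do
    Build,   // not present locally
    Rebuild, // namespace reused by a newer incarnation
}

struct Reconciler {
    /// Streams revision at which the last pass fully converged, if any.
    last_converged: Option<u64>,
    /// Local partitions: namespace -> created_revision of the local incarnation.
    local: BTreeMap<Namespace, u64>,
}

impl Reconciler {
    /// Fast path: nothing partition-shaping was committed since the last converged pass.
    fn should_skip(&self, streams_revision: u64) -> bool {
        self.last_converged == Some(streams_revision)
    }

    /// Compares the local incarnation with the committed one for a single namespace.
    fn classify(&self, ns: Namespace, committed_created_revision: u64) -> Decision {
        match self.local.get(&ns) {
            None => Decision::Build,
            Some(&rev) if rev == committed_created_revision => Decision::Keep,
            Some(_) => Decision::Rebuild,
        }
    }
}
```

Comparing namespaces alone would report `Keep` for a recreated partition; the extra revision comparison is what surfaces the delete + recreate case.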

Failure handling. Per-cause exponential backoff (separate budgets for build vs delete) and recovery for a permanently-wedged tombstone after a failed disk delete.
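Per-cause backoff could look roughly like the following. This is a sketch under assumptions: `Cause`, `Backoff`, and the doubling schedule are illustrative, and the base and cap values are caller-supplied here rather than taken from the PR. The point it shows is that build and delete failures for the same partition are counted separately.

```rust
use std::collections::HashMap;
use std::time::Duration;

#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
enum Cause {
    Build,
    Delete,
}

/// Hypothetical tracker: one failure counter per (namespace, cause) pair.
struct Backoff {
    base: Duration,
    cap: Duration,
    failures: HashMap<(u64, Cause), u32>,
}

impl Backoff {
    fn new(base: Duration, cap: Duration) -> Self {
        Self { base, cap, failures: HashMap::new() }
    }

    /// Records a failure and returns the delay before the next attempt:
    /// base * 2^(failures - 1), saturating and clamped to `cap`.
    fn record_failure(&mut self, ns: u64, cause: Cause) -> Duration {
        let count = self.failures.entry((ns, cause)).or_insert(0);
        *count = count.saturating_add(1);
        let shift = (*count - 1).min(31); // keep the shift within u32 range
        let factor = 1u32 << shift;
        self.base.saturating_mul(factor).min(self.cap)
    }

    /// Clears the budget for this cause only; the other cause is untouched.
    fn record_success(&mut self, ns: u64, cause: Cause) {
        self.failures.remove(&(ns, cause));
    }
}
```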

@github-actions github-actions Bot added the S-waiting-on-review PR is waiting on a reviewer label Jun 8, 2026
@codecov

codecov Bot commented Jun 8, 2026

Codecov Report

❌ Patch coverage is 75.55824% with 405 lines in your changes missing coverage. Please review.
✅ Project coverage is 74.55%. Comparing base (7bf1a24) to head (7bbfc82).

| Files with missing lines | Patch % | Lines |
|---|---|---|
| core/server-ng/src/partition_helpers.rs | 61.03% | 109 Missing and 11 partials ⚠️ |
| core/server-ng/src/partition_reconciler.rs | 88.39% | 80 Missing and 6 partials ⚠️ |
| core/server-ng/src/bootstrap.rs | 0.00% | 82 Missing ⚠️ |
| core/metadata/src/stm/stream.rs | 86.97% | 28 Missing ⚠️ |
| core/shard/src/lib.rs | 75.78% | 21 Missing and 2 partials ⚠️ |
| core/partitions/src/iggy_partitions.rs | 69.84% | 16 Missing and 3 partials ⚠️ |
| core/shard/src/metrics.rs | 42.85% | 12 Missing ⚠️ |
| core/shard/src/router.rs | 0.00% | 12 Missing ⚠️ |
| core/shard/src/shards_table.rs | 54.54% | 10 Missing ⚠️ |
| core/configs/src/server_config/validators.rs | 55.55% | 4 Missing ⚠️ |
... and 5 more
Additional details and impacted files
@@             Coverage Diff              @@
##             master    #3436      +/-   ##
============================================
+ Coverage     74.33%   74.55%   +0.22%     
  Complexity      943      943              
============================================
  Files          1247     1249       +2     
  Lines        122232   123567    +1335     
  Branches      98504    99867    +1363     
============================================
+ Hits          90858    92130    +1272     
- Misses        28413    28415       +2     
- Partials       2961     3022      +61     
| Components | Coverage Δ |
|---|---|
| Rust Core | 75.73% <75.55%> (+0.28%) ⬆️ |
| Java SDK | 58.44% <ø> (ø) |
| C# SDK | 69.41% <ø> (-0.52%) ⬇️ |
| Python SDK | 81.06% <ø> (ø) |
| PHP SDK | 83.57% <ø> (ø) |
| Node SDK | 91.35% <ø> (+0.14%) ⬆️ |
| Go SDK | 40.25% <ø> (ø) |
| Files with missing lines | Coverage Δ |
|---|---|
| core/configs/src/server_config/sharding.rs | 85.44% <100.00%> (+0.37%) ⬆️ |
| core/message_bus/src/lib.rs | 89.19% <100.00%> (+0.78%) ⬆️ |
| core/metadata/src/impls/metadata.rs | 39.31% <100.00%> (+4.07%) ⬆️ |
| core/metadata/src/stm/snapshot.rs | 86.01% <100.00%> (+0.18%) ⬆️ |
| core/server/src/main.rs | 63.25% <100.00%> (ø) |
| core/server/src/shard/system/partitions.rs | 78.43% <100.00%> (ø) |
| .../server_common/src/segment_storage/index_writer.rs | 73.97% <100.00%> (+1.11%) ⬆️ |
| ...rver_common/src/segment_storage/messages_writer.rs | 71.05% <100.00%> (+1.18%) ⬆️ |
| ...e/server_common/src/sharding/partition_location.rs | 100.00% <100.00%> (ø) |
| core/simulator/src/replica.rs | 100.00% <100.00%> (ø) |
... and 15 more

... and 37 files with indirect coverage changes

@krishvishal krishvishal force-pushed the partition-reconciliation-loop branch 2 times, most recently from f62c61a to dfe5fb8 Compare June 9, 2026 04:30

@hubcio hubcio left a comment

Contributor

Overall, all these comments are just nits. Good job, Krishna: despite its size, the change is very reasonable.

Comment thread core/configs/src/server_config/validators.rs Outdated
Comment thread core/partitions/src/iggy_partition.rs Outdated
Comment thread core/server_common/src/sharding/partition_location.rs Outdated
Comment thread core/server-ng/src/partition_reconciler.rs
Comment thread core/server-ng/src/bootstrap.rs
Comment thread core/message_bus/src/lib.rs Outdated
Comment thread core/shard/Cargo.toml Outdated
@github-actions github-actions Bot added S-waiting-on-author PR is waiting on author response and removed S-waiting-on-review PR is waiting on a reviewer labels Jun 9, 2026
@krishvishal krishvishal force-pushed the partition-reconciliation-loop branch from 68fc78a to 7bbfc82 Compare June 11, 2026 11:10
@krishvishal

Contributor Author

/ready

@numinnex numinnex merged commit 5159221 into apache:master Jun 11, 2026
91 checks passed
@krishvishal krishvishal deleted the partition-reconciliation-loop branch June 11, 2026 13:06
@github-actions github-actions Bot removed the S-waiting-on-author PR is waiting on author response label Jun 11, 2026
3 participants