docs(design): propose scaling roadmap#954
Conversation
|
Warning Review limit reached
More reviews will be available in 20 minutes and 14 seconds. Learn how PR review limits work. Your organization has run out of usage credits. Purchase more credits in the billing tab to continue. ⌛ How to resolve this issue?After more reviews become available, a review can be triggered using the We recommend that you space out your commits to avoid hitting the rate limit. 🚦 How do rate limits work?CodeRabbit enforces hourly rate limits for each developer per organization. Our paid plans include higher PR review limits than trial, open-source, and free plans. In all cases, reviews become available again over time. During sustained high-volume PR review activity, CodeRabbit may temporarily slow when the next review becomes available. Please see our Fair Usage Limits Policy for further information. ℹ️ Review info⚙️ Run configurationConfiguration used: defaults Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (1)
✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
|
@claude review |
|
Claude finished @bootjp's task in 3m 55s —— View job Claude's Review of PR #954 — Scaling Roadmap Design DocTasks:
SummaryThis is a solid doc-only PR with well-structured analysis and honest gap identification. The overall framing (current envelope → dimension analysis → gap list → dependency-ordered sequencing) is clear and the design-doc conventions are followed. Most code-level claims check out, but there are two stale line-number citations, one missing companion document that the text promises exists, and one open question that could materially affect the sequencing. Issues Found1. Stale line-number citations in
|
There was a problem hiding this comment.
Code Review
This pull request introduces a comprehensive scaling roadmap document (2026_06_12_proposed_scaling_roadmap.md) that analyzes the current scaling limits of elastickv and proposes a prioritized sequence of design proposals to address identified gaps. The reviewer's feedback focuses on strengthening the architectural and operational aspects of the roadmap. Specifically, the feedback highlights the need to address live-migration and rolling-upgrade strategies, apply decoupling patterns (like interceptors) for follower/learner reads, use layered mitigations for upgrade risks during sequencing, and explicitly note the requirement for shared start timestamps in cross-group transactions to ensure MVCC consistency.
Important
The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.
|
|
||
| --- | ||
|
|
||
| ## 5. Open Questions |
There was a problem hiding this comment.
When designing a production-grade system migration (such as moving from single-node multi-group to multi-node multi-group), the plan must consider and detail strategies for live or zero-downtime cutovers (e.g., blue/green, dual-write proxy, logical replication) to avoid service interruption. Additionally, if there are significant operational risks like the inability to perform rolling upgrades, the document should detail potential mitigation strategies (such as implementing a temporary 'bridge' or 'proxy' mode). Consider adding an open question or a dedicated section to address these live-migration and rolling-upgrade strategies.
References
- When designing a production-grade system migration, the plan must consider and detail strategies for live or zero-downtime cutovers (e.g., blue/green, dual-write proxy, logical replication) to avoid service interruption.
- When a design document identifies a significant operational risk, such as the inability to perform rolling upgrades, it must also detail potential mitigation strategies, like implementing a temporary 'bridge' or 'proxy' mode.
| ### Gap 3 — Follower / learner reads | ||
| **Problem.** Reads are leader-only; the learner primitive exists but its | ||
| consumer (off-leader reads) has no design. Read throughput cannot scale past | ||
| one node's leader capacity. **Rough milestones:** (M1) leader-issued read-ts | ||
| pipeline + a replica apply-watermark so a follower/learner can serve a | ||
| snapshot read at a leader-vouched timestamp (CLAUDE.md HLC rule). (M2) lease | ||
| invalidation interaction + staleness bound + adapter read routing to replicas. | ||
| (M3) Jepsen stale-read bound workload. **Depends-on:** Gap 1 (multi-node | ||
| groups to have a remote replica); the learner primitive (already in-tree). |
There was a problem hiding this comment.
When designing the follower/learner reads architecture to decouple low-level generic packages (such as raftengine or store) from high-level domain packages (such as kv or distribution), we should use an interceptor or delegate pattern with interfaces defined in the low-level package. Additionally, ensure that any such interceptor is optional (nil-safe) so that builds or environments with the high-level feature disabled can run without modification.
References
- To prevent tight coupling between low-level generic packages and high-level domain packages, use an interceptor or delegate pattern with interfaces defined in the low-level package. Ensure the interceptor is optional (nil-safe) so that builds or environments with the high-level feature disabled can run without modification.
|
|
||
| --- | ||
|
|
||
| ## 4. Sequencing (dependency-ordered rollout) |
There was a problem hiding this comment.
When planning the sequencing and rollout of these major architectural changes, we must mitigate rolling-upgrade risks. It is recommended to use layered mitigations (such as operational guidelines and admin RPC capability gates) and defer complex coordination protocols (like cluster-version Raft entries) to separate design slices if they have a broader scope. This keeps the initial rollout slices focused and manageable.
References
- When mitigating rolling-upgrade risks in design documents, use layered mitigations (such as operational guidelines and admin RPC capability gates) and defer complex coordination protocols (like cluster-version Raft entries) to separate design slices if they have a broader scope.
| (TSO doc §6 "Guarantee"). Cross-group transactions (`kv/transaction.go`, | ||
| `kv/txn_codec.go`) that span groups led by different nodes need a single | ||
| ordering source for OCC commit-ts comparability. So: per-group renewal fix |
There was a problem hiding this comment.
When designing cross-group transactions at scale, we must ensure that all operations within a single transaction share the same start timestamp to guarantee atomicity and a consistent snapshot view across the involved groups. This is a core correctness requirement for our MVCC system and should be explicitly highlighted in the transaction scaling analysis.
References
- In an MVCC system, all operations within a single transaction must share the same start timestamp to ensure atomicity and a consistent snapshot view.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: a1abe3500e
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| `--raftJoinAsLearner` exists (`main.go:104`). The learner doc itself scopes | ||
| follower-served reads as an explicit non-goal (§2 "Non-goals", §8 OQ-5): | ||
| learners forward `LinearizableRead` to the leader today. So the read-replica |
There was a problem hiding this comment.
Correct the learner read behavior description
When an attached learner receives LinearizableRead today it does not forward the read to the leader; internal/raftengine/etcd.Engine.handleRead returns ErrNotLeader for any non-leader state, and docs/raft_learner_operations.md documents that the caller must forward to the leader. Leaving this as “learners forward” misstates the current primitive that Gap 3 is supposed to build on, so the follower-read roadmap may omit required forwarding/proxy or API work.
Useful? React with 👍 / 👎.
Issue 1: correct stale line anchors in kv/sharded_coordinator.go - keyviz sampler observeMutation: 1795-1824 -> 1841-1846 - RunHLCLeaseRenewal: 1914-1953 -> 1960-1985; defaultGroup access :1915 -> :1961 Issue 2: soften companion-doc references to in-flight PR #955 form, matching the #945/#951/#953 branch-reference style (3 sites + map line). Issue 3: ground OQ-1 in the actual commitTS logic (nextStartTS/ resolveTxnCommitTS/nextTxnTSAfter all from one c.clock) and annotate §4 step 11 as deferred-pending-OQ-1 with an explicit trigger condition. Inline: fix learner LinearizableRead behaviour (engine returns ErrNotLeader, caller forwards; engine.go:1583); note shared-startTS invariant; add OQ-7 for the single-node->multi-node live cutover / rolling-upgrade strategy.
|
@claude review Addressed in Issue 1 — stale line anchors (fixed). Re-verified against current
The other spot-checked anchors ( Issue 2 — companion doc reference (softened, Option B). The companion is PR #955 ( Issue 3 — OQ-1 / TSO sequencing (took the substantive route: sketched the commitTS logic AND annotated step 11). Grounded in code:
Inline findings:
|
|
Claude finished @bootjp's task in 1m 58s —— View job Re-review of PR #954 — Scaling Roadmap (post
|
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 09031487bc
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| is in-scope-soon and load-bearing; the dedicated TSO group is only justified | ||
| once (i) multi-node multi-group is real and (ii) cross-group transactions | ||
| are common enough that the batch-allocator amortization pays for the extra | ||
| Raft group. Until both hold, the per-group fix plus the shared-`*HLC` | ||
| property is adequate. The recommendation in §4 sequences the per-group fix |
There was a problem hiding this comment.
Align TSO sequencing with first cross-node txn
This paragraph still frames the dedicated TSO as a throughput/amortization decision and says the per-group fix is adequate until cross-group transactions are common. That conflicts with the later OQ-1 analysis in this same doc: once Gap 1 permits participating groups to be led by different nodes, even the first such cross-group transaction can draw commitTS from incomparable per-node HLCs, so deferring the TSO/interim single-oracle plan based on workload frequency leaves a correctness hole in the roadmap.
Useful? React with 👍 / 👎.
What
Adds
docs/design/2026_06_12_proposed_scaling_roadmap.md— a doc-only scaling roadmap that surveys what exists, what is proposed, and what is missing for elastickv to scale further (more data, more throughput, more nodes).This is a roadmap / implementation-plan doc, which
docs/design/README.mdexplicitly permits ("Concrete implementation plans"). No code changes.Structure
main.Verified-on-tree claims (selection)
RunHLCLeaseRenewalproposes the ceiling to the default group only (kv/sharded_coordinator.go:1915) — the TSO doc's near-term "all led groups" fix is not yet implemented.--raftBootstrapMembersis rejected for multi-group (main.go:746,len(groups) != 1); each group bootstraps single-member (multiraft_runtime.go:246-254) — the multi-node multi-group gap PR docs(design): propose leader balance scheduler #953 §1.1a (PR0) documents.store/lsm_store.go:273-279); shared-cache is a TODO comment (:117-120), not a design.kv/fsm.go:123-124).AddLearner/PromoteLearner, raftadmin CLI,--raftJoinAsLearner) even though the doc filename still readsproposed— but follower-served reads (its consumer) have no design.Notes
2026_06_12_proposed_multinode_multigroup_bootstrap.md, by name (it is the authoritative spec for Gap 1).docs/hotspot-split-m2-proposal, docs(design): propose hotspot split M3 — automation #951design/hotspot-split-m3-automation, docs(design): propose leader balance scheduler #953design/leader-balance-scheduler).