Skip to content

docs(design): propose scaling roadmap#954

Open
bootjp wants to merge 2 commits into
mainfrom
design/scaling-roadmap
Open

docs(design): propose scaling roadmap#954
bootjp wants to merge 2 commits into
mainfrom
design/scaling-roadmap

Conversation

@bootjp

@bootjp bootjp commented Jun 12, 2026

Copy link
Copy Markdown
Owner

What

Adds docs/design/2026_06_12_proposed_scaling_roadmap.md — a doc-only scaling roadmap that surveys what exists, what is proposed, and what is missing for elastickv to scale further (more data, more throughput, more nodes).

This is a roadmap / implementation-plan doc, which docs/design/README.md explicitly permits ("Concrete implementation plans"). No code changes.

Structure

  1. Current scaling envelope — the seven things that bound a deployment today, each anchored to a file:line on main.
  2. Dimension-by-dimension analysis — (a) data volume → split M2/M3 + blob offload + shared cache, with range MERGE flagged as a missing design; (b) write throughput → multi-group + leader balance (docs(design): propose leader balance scheduler #953) + the multi-node bootstrap gap; (c) read throughput → lease reads (done), follower-reads (missing design, learner primitive present); (d) cross-group txn → per-group HLC vs centralized TSO assessment; (e) cluster size / membership; (f) operational scaling.
  3. Gap list → recommended new designs ranked by leverage, each with depends-on edges.
  4. Sequencing — dependency-ordered rollout tying in docs(design): propose hotspot split M2 — migration plane #945 / docs(design): propose hotspot split M3 — automation #951 / docs(design): propose leader balance scheduler #953.
  5. Open Questions.

Verified-on-tree claims (selection)

  • RunHLCLeaseRenewal proposes the ceiling to the default group only (kv/sharded_coordinator.go:1915) — the TSO doc's near-term "all led groups" fix is not yet implemented.
  • --raftBootstrapMembers is rejected for multi-group (main.go:746, len(groups) != 1); each group bootstraps single-member (multiraft_runtime.go:246-254) — the multi-node multi-group gap PR docs(design): propose leader balance scheduler #953 §1.1a (PR0) documents.
  • Each Pebble store allocates a private 256 MiB block cache (store/lsm_store.go:273-279); shared-cache is a TODO comment (:117-120), not a design.
  • FSM apply is serial per group (kv/fsm.go:123-124).
  • The raft learner primitive is implemented in code (engine AddLearner/PromoteLearner, raftadmin CLI, --raftJoinAsLearner) even though the doc filename still reads proposed — but follower-served reads (its consumer) have no design.
  • Snapshot disk offload and lease reads are implemented; TSO and range merge are not.

Notes

@coderabbitai

coderabbitai Bot commented Jun 12, 2026

Copy link
Copy Markdown

Warning

Review limit reached

@bootjp, we couldn't start this review because you've reached your PR review rate limit.

More reviews will be available in 20 minutes and 14 seconds. Learn how PR review limits work.

Your organization has run out of usage credits. Purchase more credits in the billing tab to continue.

⌛ How to resolve this issue?

After more reviews become available, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans include higher PR review limits than trial, open-source, and free plans. In all cases, reviews become available again over time. During sustained high-volume PR review activity, CodeRabbit may temporarily slow when the next review becomes available.

Please see our Fair Usage Limits Policy for further information.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 1747aab4-4c7f-4c75-b092-520ca0bb4c6a

📥 Commits

Reviewing files that changed from the base of the PR and between 688ba8a and 0903148.

📒 Files selected for processing (1)
  • docs/design/2026_06_12_proposed_scaling_roadmap.md
✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch design/scaling-roadmap

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@bootjp

bootjp commented Jun 12, 2026

Copy link
Copy Markdown
Owner Author

@claude review

@claude

claude Bot commented Jun 12, 2026

Copy link
Copy Markdown

Claude finished @bootjp's task in 3m 55s —— View job


Claude's Review of PR #954 — Scaling Roadmap Design Doc

Tasks:

  • Load tools
  • Read the scaling roadmap design document
  • Verify code-level claims against the codebase
  • Check consistency with existing design docs
  • Post review findings

Summary

This is a solid doc-only PR with well-structured analysis and honest gap identification. The overall framing (current envelope → dimension analysis → gap list → dependency-ordered sequencing) is clear and the design-doc conventions are followed. Most code-level claims check out, but there are two stale line-number citations, one missing companion document that the text promises exists, and one open question that could materially affect the sequencing.


Issues Found

1. Stale line-number citations in kv/sharded_coordinator.go (correctness)

The two citations into kv/sharded_coordinator.go are off by roughly 46 lines, likely because PR #952 was merged between when the document was written and when it was submitted. The claims themselves are accurate; only the anchors are wrong.

Keyviz sampler (§2(f)):

kv/sharded_coordinator.go:1795-1824

Actual location of observeMutation (the per-route sampler function): lines 1841–1846. Lines 1795–1824 actually contain rawLogs / txnLogs.

RunHLCLeaseRenewal (§1.5 and §2(d)):

kv/sharded_coordinator.go:1914-1953, group, ok := c.groups[c.defaultGroup] at :1915

Actual: the function starts at line 1960, with group, ok := c.groups[c.defaultGroup] at line 1961. Lines 1914–1953 currently contain transaction prepare/commit log-building code.

Since the intro promises "Every claim about current behaviour below is anchored to a file and line on main at the propose date," both should be corrected before merging. Fix this →

2. Referenced companion doc doesn't exist yet (blocker risk)

§2(b) and §3 Gap 1 both say:

"A companion proposal, 2026_06_12_proposed_multinode_multigroup_bootstrap.md, is being written in parallel and is the authoritative spec for this gap; this roadmap defers to it."

That file is not present in docs/design/ on any reachable branch. Readers directed to "the authoritative spec" will find nothing. Two options:

  • Option A (preferred): Land the companion proposal in the same PR (or at least in the same batch) so the "defers to it" pointer resolves.
  • Option B: Soften the language to "a companion proposal will be written" and remove "is the authoritative spec" until it exists.

The companion is also listed as the root unblocking dependency in the sequencing table (step 2), so its absence here is not merely cosmetic.

3. Open Question 1 could pull TSO forward in the sequence (design concern)

§5 OQ-1 asks:

"Is the per-group renewal fix sufficient for cross-group OCC correctness in the multi-node topology, or does the first node-spanning cross-group transaction force the dedicated TSO group earlier than step 11?"

The sequencing in §4 places the dedicated TSO group last (step 11, after cross-group transactions are "common enough"). But kv/transaction.go uses commitTS values that are issued by whichever node leads each group involved in the transaction. Without a single ordering source, two concurrent cross-group transactions led by different nodes can produce non-comparable commit timestamps, breaking OCC serializability.

This means OQ-1 may not be answerable by a "focused correctness review" deferred until just before step 2 lands — the answer could require step 11 to move to step 3. It is worth at least sketching the commitTS comparison logic before accepting the sequencing as final, or explicitly noting in §4 step 11 that the TSO decision should be revisited after multi-node groups are real.


Observations (non-blocking)

Accurate code anchors (spot-checked):
All of the following checked out on main:

  • main.go:746len(groups) != 1 guard ✓
  • multiraft_runtime.go:246-254Peers: bootstrapServers (nil in multi-group), single-member bootstrap ✓
  • store/lsm_store.go:273-279pebble.NewCache(pebbleCacheBytes)
  • store/lsm_store.go:117-120 — shared-cache TODO ✓
  • kv/fsm.go:123-124 and :297 — "Raft applies are serial" comment + Apply definition ✓
  • main.go:426WithLeaseReadObserver wiring ✓
  • main.go:104--raftJoinAsLearner flag ✓
  • main.go:1606-1620 / :1613 — per-group gRPC listener ✓
  • shard_config.go:61-99parseRaftGroups
  • internal/raftengine/engine.go:233,245AddLearner / PromoteLearner interface ✓
  • internal/raftengine/etcd/engine.go:1268,1289,2550 — etcd backend implementations ✓
  • cmd/raftadmin/main.go:204-206add_learner / promote_learner CLI ✓
  • distribution/engine.go RecordAccess — confirmed only in test files, not called from any request path ✓

Gap 5 (range merge) analysis is the strongest new content. The characterization of why merge is harder than split — unifying two independent commit-timestamp streams, the !txn|… keyspace reconciliation hazard, the Composed-1 interaction — is accurate and not documented anywhere else. Worth ensuring this ends up in the eventual merge design.

Sequencing note for #953: The doc correctly distinguishes PR #953 PR1 (observe-only, can land without multi-node) from PR2–PR3 (blocked on Gap 1). This is consistent with #953's own framing.

Gap 2 / Gap 3 are independent of Gap 1 for correctness (shared cache and follower-read design can be written before multi-node bootstrap lands), even though their value depends on Gap 1. The sequencing correctly reflects this.


Verdict

Two issues should be addressed before merging: the stale line-number anchors (issue 1, quick fix) and the missing companion document (issue 2, either land it alongside this PR or soften the language). Issue 3 is a design concern worth discussing before the sequencing is treated as settled. Everything else is accurate.

@gemini-code-assist gemini-code-assist Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces a comprehensive scaling roadmap document (2026_06_12_proposed_scaling_roadmap.md) that analyzes the current scaling limits of elastickv and proposes a prioritized sequence of design proposals to address identified gaps. The reviewer's feedback focuses on strengthening the architectural and operational aspects of the roadmap. Specifically, the feedback highlights the need to address live-migration and rolling-upgrade strategies, apply decoupling patterns (like interceptors) for follower/learner reads, use layered mitigations for upgrade risks during sequencing, and explicitly note the requirement for shared start timestamps in cross-group transactions to ensure MVCC consistency.

Important

The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.


---

## 5. Open Questions

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

When designing a production-grade system migration (such as moving from single-node multi-group to multi-node multi-group), the plan must consider and detail strategies for live or zero-downtime cutovers (e.g., blue/green, dual-write proxy, logical replication) to avoid service interruption. Additionally, if there are significant operational risks like the inability to perform rolling upgrades, the document should detail potential mitigation strategies (such as implementing a temporary 'bridge' or 'proxy' mode). Consider adding an open question or a dedicated section to address these live-migration and rolling-upgrade strategies.

References
  1. When designing a production-grade system migration, the plan must consider and detail strategies for live or zero-downtime cutovers (e.g., blue/green, dual-write proxy, logical replication) to avoid service interruption.
  2. When a design document identifies a significant operational risk, such as the inability to perform rolling upgrades, it must also detail potential mitigation strategies, like implementing a temporary 'bridge' or 'proxy' mode.

Comment on lines +301 to +309
### Gap 3 — Follower / learner reads
**Problem.** Reads are leader-only; the learner primitive exists but its
consumer (off-leader reads) has no design. Read throughput cannot scale past
one node's leader capacity. **Rough milestones:** (M1) leader-issued read-ts
pipeline + a replica apply-watermark so a follower/learner can serve a
snapshot read at a leader-vouched timestamp (CLAUDE.md HLC rule). (M2) lease
invalidation interaction + staleness bound + adapter read routing to replicas.
(M3) Jepsen stale-read bound workload. **Depends-on:** Gap 1 (multi-node
groups to have a remote replica); the learner primitive (already in-tree).

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

When designing the follower/learner reads architecture to decouple low-level generic packages (such as raftengine or store) from high-level domain packages (such as kv or distribution), we should use an interceptor or delegate pattern with interfaces defined in the low-level package. Additionally, ensure that any such interceptor is optional (nil-safe) so that builds or environments with the high-level feature disabled can run without modification.

References
  1. To prevent tight coupling between low-level generic packages and high-level domain packages, use an interceptor or delegate pattern with interfaces defined in the low-level package. Ensure the interceptor is optional (nil-safe) so that builds or environments with the high-level feature disabled can run without modification.


---

## 4. Sequencing (dependency-ordered rollout)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

When planning the sequencing and rollout of these major architectural changes, we must mitigate rolling-upgrade risks. It is recommended to use layered mitigations (such as operational guidelines and admin RPC capability gates) and defer complex coordination protocols (like cluster-version Raft entries) to separate design slices if they have a broader scope. This keeps the initial rollout slices focused and manageable.

References
  1. When mitigating rolling-upgrade risks in design documents, use layered mitigations (such as operational guidelines and admin RPC capability gates) and defer complex coordination protocols (like cluster-version Raft entries) to separate design slices if they have a broader scope.

Comment on lines +211 to +213
(TSO doc §6 "Guarantee"). Cross-group transactions (`kv/transaction.go`,
`kv/txn_codec.go`) that span groups led by different nodes need a single
ordering source for OCC commit-ts comparability. So: per-group renewal fix

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

When designing cross-group transactions at scale, we must ensure that all operations within a single transaction share the same start timestamp to guarantee atomicity and a consistent snapshot view across the involved groups. This is a core correctness requirement for our MVCC system and should be explicitly highlighted in the transaction scaling analysis.

References
  1. In an MVCC system, all operations within a single transaction must share the same start timestamp to ensure atomicity and a consistent snapshot view.

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: a1abe3500e

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines +182 to +184
`--raftJoinAsLearner` exists (`main.go:104`). The learner doc itself scopes
follower-served reads as an explicit non-goal (§2 "Non-goals", §8 OQ-5):
learners forward `LinearizableRead` to the leader today. So the read-replica

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Correct the learner read behavior description

When an attached learner receives LinearizableRead today it does not forward the read to the leader; internal/raftengine/etcd.Engine.handleRead returns ErrNotLeader for any non-leader state, and docs/raft_learner_operations.md documents that the caller must forward to the leader. Leaving this as “learners forward” misstates the current primitive that Gap 3 is supposed to build on, so the follower-read roadmap may omit required forwarding/proxy or API work.

Useful? React with 👍 / 👎.

Issue 1: correct stale line anchors in kv/sharded_coordinator.go
  - keyviz sampler observeMutation: 1795-1824 -> 1841-1846
  - RunHLCLeaseRenewal: 1914-1953 -> 1960-1985; defaultGroup access :1915 -> :1961
Issue 2: soften companion-doc references to in-flight PR #955 form,
  matching the #945/#951/#953 branch-reference style (3 sites + map line).
Issue 3: ground OQ-1 in the actual commitTS logic (nextStartTS/
  resolveTxnCommitTS/nextTxnTSAfter all from one c.clock) and annotate
  §4 step 11 as deferred-pending-OQ-1 with an explicit trigger condition.
Inline: fix learner LinearizableRead behaviour (engine returns ErrNotLeader,
  caller forwards; engine.go:1583); note shared-startTS invariant; add OQ-7
  for the single-node->multi-node live cutover / rolling-upgrade strategy.
@bootjp

bootjp commented Jun 12, 2026

Copy link
Copy Markdown
Owner Author

@claude review

Addressed in 09031487 (doc-only, Status: Proposed unchanged).

Issue 1 — stale line anchors (fixed). Re-verified against current main:

  • keyviz sampler §2(f): observeMutation is at kv/sharded_coordinator.go:1841-1846 (was cited 1795-1824, which now holds rawLogs/txnLogs). Corrected, and added the function name for stability.
  • RunHLCLeaseRenewal §1.5: function is kv/sharded_coordinator.go:1960-1985, with group, ok := c.groups[c.defaultGroup] at :1961 (was 1914-1953 / :1915). Corrected both citations.

The other spot-checked anchors (main.go:746, multiraft_runtime.go:246-254, lsm_store.go, fsm.go, etc.) re-verified accurate; left unchanged.

Issue 2 — companion doc reference (softened, Option B). The companion is PR #955 (design/multinode-multigroup-bootstrap), open and claude-approved but not on main. Softened all three references — §2(b), §3 Gap 1, §4 step 2 plus the in-flight-PR map line — to the same "in review as PR #955 (<file>, branch <branch>)" form already used for #945/#951/#953. "Authoritative spec" is now conditioned on "once PR #955 lands."

Issue 3 — OQ-1 / TSO sequencing (took the substantive route: sketched the commitTS logic AND annotated step 11). Grounded in code:

  • A cross-group txn issues both startTS (nextStartTS, :1429) and commitTS (resolveTxnCommitTSnextTxnTSAfter, :1102/:1376) from a single *HLC (c.clock) on the coordinator's node. maxLatestCommitTS/Observe only floors on the keys the txn touches.
  • Today this is safe only because every group shares one process-wide *HLC. The per-group renewal fix (step 1) buys per-node monotonicity only — TSO doc §6 "Guarantee" is explicit that cross-node monotonicity is not covered.
  • So the moment step 2 makes the two groups of a cross-group txn coordinate on different nodes, two concurrent cross-group txns can draw commitTS from two independently-advanced clocks → non-comparable, breaking OCC. OQ-1 therefore cannot be answered affirmatively by a review deferred to "just before step 2."
  • §4 step 11 is re-labeled "deferred-pending-OQ-1-resolution, not settled-last," with the explicit trigger: the first cross-group transaction whose participating groups are led by different nodes. OQ-1 now also raises an interim option (pin cross-group ts allocation to one designated group's leader) to evaluate in the PR docs(design): propose multi-node multi-group bootstrap #955 design, short of the full TSO group.

Inline findings:

  • codex-connector (P2, learner reads, line 184) — fixed. The doc said "learners forward LinearizableRead to the leader," which is wrong: handleRead returns ErrNotLeader for any non-leader state (internal/raftengine/etcd/engine.go:1583) and the caller must forward (docs/raft_learner_operations.md). Corrected, and noted Gap 3's follower-read design must supply the forwarding/proxy/replica-read API the primitive omits.
  • gemini (line 213, shared start timestamp) — folded in. Added a note in §2(d) that the shared-snapshot invariant (one startTS per txn via nextStartTSreqs.StartTS) is already upheld; the gap is cross-node commitTS comparability, not startTS.
  • gemini (line 399, live cutover / rolling upgrade) — folded in. Added OQ-7 on the single-node→multi-node transition (live AddVoter expansion vs blue/green dual-write proxy vs fresh-cluster migration; version-skew/capability-gating), deferring the spec to PR docs(design): propose multi-node multi-group bootstrap #955 and pointing at the TSO doc §7 phased-cutover structure as the model.
  • gemini (line 309, interceptor/nil-safe pattern) and (line 356, layered mitigations / defer cluster-version Raft entries) — noted, not folded in. These are implementation-pattern suggestions for the eventual Gap 3 follower-read design and the bootstrap rollout respectively; the roadmap explicitly defers mechanism to those docs and says it "does not duplicate the detail of the docs it references." Captured here so the downstream designs pick them up.

@claude

claude Bot commented Jun 12, 2026

Copy link
Copy Markdown

Claude finished @bootjp's task in 1m 58s —— View job


Re-review of PR #954 — Scaling Roadmap (post 09031487)

Tasks:

  • Load tools
  • Read updated design document
  • Verify claimed line anchor fixes against codebase
  • Verify learner-read correction
  • Spot-check OQ-1/TSO annotation and OQ-7 addition
  • Post findings

Verdict: All prior issues resolved — approved to merge

Every item from the previous review has been addressed correctly. Spot-checks below.


Verified Fixes

Issue 1 — Line anchors (fixed, confirmed).

  • observeMutation at kv/sharded_coordinator.go:1841-1846: function body runs exactly lines 1841–1846 ✓
  • RunHLCLeaseRenewal at :1960, group, ok := c.groups[c.defaultGroup] at :1961: exact match ✓
  • New OQ-1 citations (nextStartTS at :1429, resolveTxnCommitTS at :1102, nextTxnTSAfter at :1376) all confirmed ✓

Issue 2 — Companion doc reference (softened, confirmed).
The three references in §2(b), §3 Gap 1, and §4 step 2 now use the same "in review as PR #955 (<file>, branch <branch>); once it lands it is the authoritative spec" form already used for #945/#951/#953. The language is no longer asserting the doc exists before it does. ✓

Issue 3 — OQ-1/TSO sequencing (substantive analysis added, confirmed).
The full code-grounded walkthrough of why the per-group fix is necessary but not sufficient in the cross-node case is accurate:

  • c.clock is the shared *HLC; nextStartTS (:1429) and resolveTxnCommitTS (:1102nextTxnTSAfter at :1376) both issue through it
  • TSO doc §6 "Guarantee" is correctly characterized: per-node monotonicity only, not cross-node
  • Step 11 is now labeled "deferred-pending-OQ-1-resolution, not settled-last" with a concrete trigger event

The interim option (pin cross-group ts allocation to the default group's leader until the full TSO group lands) is a sound suggestion and correctly deferred to the PR #955 design. ✓

Codex P2 — Learner-read description (fixed, confirmed).
The revised text correctly states handleRead returns ErrNotLeader for any non-leader state at internal/raftengine/etcd/engine.go:1583 (line 1583 is the if e.State() != raftengine.StateLeader check that precedes the return), and the docs/raft_learner_operations.md quote ("Serve linearizable reads from the learner. Today LinearizableRead on a learner returns ErrNotLeader and the caller must forward to the leader") is present verbatim in that file. ✓

Gemini shared startTS — folded in, confirmed. The clarification correctly distinguishes the already-upheld startTS invariant (one nextStartTS propagated via reqs.StartTS) from the actual gap (cross-node commitTS comparability). Accurate and clearly stated. ✓

Gemini OQ-7 (live cutover) — folded in, confirmed. The options surveyed (live AddVoter expansion, blue/green with proxy/dualwrite.go, fresh cluster + data migration) are all grounded in in-tree code; the pointer to TSO doc §7 as a structural model for the cutover design is appropriate. ✓

Gemini interceptor/nil-safe and layered mitigations — correctly deferred. These are rightly left to the downstream Gap 3 and bootstrap designs; noting them in the trigger comment is sufficient. ✓


One small observation (non-blocking)

The OQ-1 text at line 429–431 has a minor forward-reference inconsistency: it says nextStartTS allocates one startTS "for the whole txn and propagates it via reqs.StartTS to every participating group" — this is accurate for the RawKV/Transactional gRPC path, but worth a parenthetical caveat that the DynamoDB TransactWriteItems path generates its own timestamps (since the DynamoDB adapter is not behind ShardedCoordinator.nextStartTS). Not a bug in this doc since the doc scope is the coordinator path; mentioning in case it surfaces in a downstream TSO design review.

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 09031487bc

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines +225 to +229
is in-scope-soon and load-bearing; the dedicated TSO group is only justified
once (i) multi-node multi-group is real and (ii) cross-group transactions
are common enough that the batch-allocator amortization pays for the extra
Raft group. Until both hold, the per-group fix plus the shared-`*HLC`
property is adequate. The recommendation in §4 sequences the per-group fix

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Align TSO sequencing with first cross-node txn

This paragraph still frames the dedicated TSO as a throughput/amortization decision and says the per-group fix is adequate until cross-group transactions are common. That conflicts with the later OQ-1 analysis in this same doc: once Gap 1 permits participating groups to be led by different nodes, even the first such cross-group transaction can draw commitTS from incomparable per-node HLCs, so deferring the TSO/interim single-oracle plan based on workload frequency leaves a correctness hole in the roadmap.

Useful? React with 👍 / 👎.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant