docs(design): propose multi-node multi-group bootstrap by bootjp · Pull Request #955 · bootjp/elastickv

bootjp · 2026-06-12T06:32:09Z

Summary

Doc-only PR. Proposes multi-node multi-group bootstrap — the prerequisite for true distributed multi-Raft scaling. Today a real multi-node deployment can only run single-group: resolveBootstrapServers rejects --raftBootstrapMembers when len(groups) != 1 (main.go:744-748), groupSpec carries one address per group (shard_config.go:14-17), and buildRuntimeForGroup threads the same (nil-in-multi-group) bootstrapServers into every group (multiraft_runtime.go:234-254), so each group bootstraps single-member and the factory never even builds a transport (factory.go:49-52). This is exactly the PR0 / OQ-9 "option (a)" gap the leader-balance design (#953) identified.

New file: docs/design/2026_06_12_proposed_multinode_multigroup_bootstrap.md (Status: Proposed / Author: bootjp / Date: 2026-06-12). Every current-state claim is cited at file:line, verified against main.

Key design points

Flag surface: new companion --raftGroupPeers "1=n1@host1:5051,n2@host2:5051,...;2=..." (per-group raftID@host:port voter lists), lifting the len(groups)==1 guard. --raftGroups/--raftBootstrapMembers/--raftBootstrap and the cmd/server/demo.go demo are unchanged; mutually exclusive with --raftBootstrapMembers; full validation rules (local-node-present, addr-match, homogeneity).
Bootstrap semantics: all-nodes-same-list (the etcd model cmd/server/demo.go:204-219 already uses for single-group), NOT a designated proposer. Idempotent restart and divergent-config fail-fast come free from the existing persisted-peers + errClusterMismatch + raft-engine marker machinery (already per-group).
Transport/addressing: transport is already per-group and address-map driven; keep one listener per group (the 5005{1..6} convention). No transport change — only feeding per-group lists in.
Alternative considered: AddVoter-composition stays the live-expansion path, not the bootstrap path (per docs(design): propose leader balance scheduler #953 OQ-9).
Cross-doc impact: unblocks leader-balance PR2/PR3 (docs(design): propose leader balance scheduler #953) and hotspot-split M2 cross-group migration testing (docs(design): propose hotspot split M2 — migration plane #945).

Open questions

OQ-1 flag spelling (companion flag vs. extend --raftGroups); OQ-2 --raftBootstrap required vs. implied; OQ-3 per-group carrier (map vs. func); OQ-4 heterogeneous membership (hard error vs. warning in v1); OQ-5 learner-start interop; OQ-6 single multiplexed raft listener for very high M.

Test plan

Doc-only — no code, no tests run. The doc itself proposes the test strategy: unit (flag-parse table tests, per-group resolution, restart idempotency), integration (in-process 3-node × 2-group harness asserting multi-voter Configuration() + inter-node TransferLeadershipToServer), and a later Jepsen milestone (true multi-node multi-group runner).

coderabbitai · 2026-06-12T06:32:17Z

Warning

Review limit reached

@bootjp, we couldn't start this review because you've reached your PR review rate limit.

More reviews will be available in 47 minutes and 37 seconds. Learn how PR review limits work.

Your organization has run out of usage credits. Purchase more credits in the billing tab to continue.

⌛ How to resolve this issue?

After more reviews become available, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans include higher PR review limits than trial, open-source, and free plans. In all cases, reviews become available again over time. During sustained high-volume PR review activity, CodeRabbit may temporarily slow when the next review becomes available.

Please see our Fair Usage Limits Policy for further information.

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 744bda98-44ae-4f55-bfc7-93bf7024da8a

📥 Commits

Reviewing files that changed from the base of the PR and between 688ba8a and 252df56.

📒 Files selected for processing (1)

docs/design/2026_06_12_proposed_multinode_multigroup_bootstrap.md

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests
Commit unit tests in branch design/multinode-multigroup-bootstrap

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

bootjp · 2026-06-12T06:32:19Z

@claude review

claude · 2026-06-12T06:32:31Z

Claude finished @bootjp's task in 2m 37s —— View job

Code Review — `docs/design/2026_06_12_proposed_multinode_multigroup_bootstrap.md`

Gather context and understand request
Read design document
Cross-check source code references
Review design for correctness, completeness, and consistency

Overall: This is a high-quality design proposal. The background analysis is meticulous — every cited file:line was verified against main and is accurate. The problem decomposition (gap → primitives → flag surface → bootstrap semantics → transport → test strategy) is logical and complete. The PR breakdown (A/B/C/D) is well-scoped. The comments below are refinements, not blockers; the doc is ready to land as-is and the open questions are correctly identified.

Findings

1. Data-loss lens: `Bootstrap=true` passed to already-initialized dirs (§3.2)

The doc states "on restart the persisted-peers path takes over regardless, so the bootstrap flag is moot once a dir has state" — but this relies on an implicit invariant in Open() that is never spelled out in the proposal.

Looking at factory.go:43-47, the factory loads persisted peers and substitutes them for cfg.Peers, but still passes the Bootstrap=true flag through to Open() unchanged. The safety claim hinges on Open() detecting an existing WAL and silently ignoring Bootstrap — but that path is not cited. If Open() does not suppress re-bootstrap when a WAL exists and Bootstrap=true is set, a node restart with --raftGroupPeers still present could re-bootstrap over committed data (CLAUDE.md lens 1: data loss).

Recommendation: Add a sentence to §3.2 citing the specific Open() / WAL-detection path that makes Bootstrap=true a no-op on restart (e.g. the HardState/WAL existence check in engine.go). This needs to be verified at PR-B and called out in that PR's five-lens review. Fix this →

2. Implicit invariant: all nodes must have the same `--raftGroupPeers` (not validated at first bootstrap)

The doc's validation rules (§3.1) catch local-node-absent and addr-mismatch, and errClusterMismatch (§3.2) fires on restart. But on a first bootstrap with a fresh dir, if node B has a divergent list from nodes A and C for the same group, the errClusterMismatch guard never fires — each node bootstraps independently with its (mutually inconsistent) peer list, forming separate disconnected "clusters" that can never elect a shared leader or form quorum.

This is not a showstopper (it produces an obvious symptom — no leader elected, no progress), but the failure mode should be briefly acknowledged in §3.2's partial-bootstrap section alongside the other recovery paths. A sentence like "nodes bootstrapped with mismatched lists will form disjoint single-group clusters with no shared quorum; recovery is to wipe all fresh dirs and restart with consistent configs" rounds out the failure taxonomy. Optionally, §6.2's integration test assertions could add a negative case.

3. `buildShardGroups`'s single `bootstrap bool` parameter needs to become per-group (PR-B scope gap)

buildShardGroups takes a single process-wide bootstrap bool and threads it to every group (multiraft_runtime.go:234, main.go:776). The proposal calls for per-group peer lists, but §3.2's "implies bootstrap=true for all groups" decision also means the bootstrap flag needs to become per-group — a group not in --raftGroupPeers should get bootstrap=false (falling back to today's path), while a group that is in --raftGroupPeers gets bootstrap=true.

This is a PR-B scope item not called out explicitly. If PR-B keeps the single bootstrap bool, groups not configured via --raftGroupPeers would inherit the process-wide flag, which may silently bootstrap them as single-member even when --raftBootstrap is false. Recommendation: add an explicit bullet to §6.4 PR-B noting "the process-wide bootstrap bool in buildShardGroups/buildRuntimeForGroup becomes a per-group determination, derived from whether that group has a non-nil resolved peer list."

4. Minor: line reference `main.go:744-748` is slightly imprecise

The doc cites main.go:744-748 for the len(groups) != 1 guard. The guard itself is at lines 746–748; lines 743–745 are the preceding if bootstrapMembers == "" early return. The cited range includes lines that belong to a different condition. This is cosmetic, but for a doc whose stated goal is that "every current-state claim is cited at file:line, verified against main", tightening to main.go:746-748 matches the claim more precisely.

5. Operator experience: a concrete 3-node × 2-group flag example would strengthen §3.1

The grammar specification in §3.1 is clear, but the first thing an operator will want to do is grep for "what do I actually type on node 2 of a 3-node × 2-group cluster." A compact example block — showing the full --raftGroups, --raftGroupPeers, and --raftRedisMap flags for each of the 3 nodes — would make the flag surface immediately graspable without having to mentally assemble it from the grammar. The 5005{1..6} port convention is already documented but lives implicitly in §3.4; surfacing it in a concrete example in §3.1 would reduce the cognitive load when reviewing PR-A.

This is a polish suggestion, not a correctness issue.

Verified accurate

Claim	Verified
`groupSpec` is `id uint64; address string`, `shard_config.go:14-17`	✓
`resolveBootstrapServers` guard at `main.go:746-748`, error at `:736`	✓
`buildRuntimeForGroup` passes single `bootstrapServers` to every group, `multiraft_runtime.go:234`	✓
Transport built only `if len(peers) > 1`, `factory.go:49-52`	✓
`bootstrap = *raftBootstrap \|\| len(bootstrapServers) > 0`, `main.go:534`	✓
`errClusterMismatch` at `engine.go:116`	✓
persisted peers loaded at `factory.go:43-47`	✓
`demo.go:200-219` etcd all-same-list bootstrap + comment	✓
`parseSQSFifoPartitionMap` `;`-separator precedent, `shard_config.go:174-196`	✓
`parseRaftBootstrapMembers` duplicate-id check, `shard_config.go:373-375`	✓

Open questions: recommendations

OQ	Recommendation	Rationale
OQ-1 (companion flag)	Companion `--raftGroupPeers` ✓	Clean separation already well-argued in §3.1
OQ-2 (`--raftBootstrap` implied)	Implied ✓	Consistent with `main.go:534` — non-empty peers already implies bootstrap there
OQ-3 (map vs. func)	`map[uint64][]Server`	Simpler; eager validation at startup is better than deferred
OQ-4 (heterogeneous)	Hard error in v1 ✓	Forward-compatible (grammar already expresses it; relax later with a targeted consumer)
OQ-5 (learner interop)	Voter-only in v1 ✓	Learner bootstrap is a separate concern; mixing at bootstrap adds ordering complexity
OQ-6 (per-group vs. shared listener)	Per-group in v1 ✓	Zero transport change; port count only becomes a problem at M >> typical deployments

Verdict: Approve. The two substantive issues (data-loss-lens gap in §3.2, per-group bootstrap flag scope gap in PR-B) are addressable in implementation PRs and both are within the mandate of "PR carries the five-lens review." The cross-references are exceptionally well-grounded.

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 252df56017

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

chatgpt-codex-connector · 2026-06-12T06:33:23Z

+
+We do **not** invent a "lexicographically-smallest peer proposes, others wait-and-join" protocol. That single-proposer pattern is the *AddVoter-composition* path (§5), not the bootstrap path — and adopting it for bootstrap would mean the non-proposer nodes start with an empty conf and must be added one-by-one, which is fragile (ordering, the proposer must be up first and must be leader) and is exactly the "manual AddVoter dance in every test harness" #953 OQ-9 rejected. The all-nodes-same-list model has no designated-proposer ordering requirement: nodes can start in any order, and raft elects a leader once a quorum is up.
+
+**Idempotency on restart (decision: persisted-peers + marker dir already give this — per group).** On first open of a group dir, `Open` writes the normalized peer set to the persisted-peers file (`savePersistedPeers`, `engine.go:643`; format in `peer_metadata.go:205`). On restart, the factory **loads the persisted peers and uses them in preference to the flag-supplied list** (`factory.go:43-47`), and `Open` refuses to start if the configured cluster disagrees with what is persisted (`validateOpenPeers` → `errClusterMismatch`, `engine.go:632`, `:116`). So:


Fix restart mismatch detection before relying on it

For the restart scenario where an operator changes --raftGroupPeers after a group has already written etcd-raft-peers.bin, this design relies on errClusterMismatch to catch the divergent flag list, but the current code discards that list before validation: Factory.Create loads persisted peers and overwrites cfg.Peers, and normalizeOpenConfig does the same before validateOpenPeers runs. As a result, validateOpenPeers compares the snapshot to the persisted list, not the new flag-supplied list, so an implementation following this doc would silently ignore divergent --raftGroupPeers rather than fail fast unless the factory/open path is changed to validate the configured list separately.

Useful? React with 👍 / 👎.

gemini-code-assist

Code Review

This pull request proposes a design for a multi-node multi-group bootstrap mechanism in elastickv, introducing a new --raftGroupPeers flag to configure per-group peer lists at startup. The feedback suggests avoiding the use of a function provider for retrieving bootstrap servers, recommending a static map instead, as the group configuration is static at runtime.

Important

The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.

gemini-code-assist · 2026-06-12T06:33:38Z

+
+### 3.2 Bootstrap semantics
+
+The wiring change is small and local: instead of one process-wide `bootstrapServers` threaded into every group, **resolve a per-group `[]raftengine.Server` and pass each group its own list**. Concretely, `buildShardGroups` / `buildRuntimeForGroup` change from a single `bootstrapServers []raftengine.Server` parameter (`multiraft_runtime.go:234`, `main.go:777`) to a `bootstrapServersFor func(groupID uint64) []raftengine.Server` lookup (or a `map[uint64][]raftengine.Server`), built once from the parsed `--raftGroupPeers`. Everything downstream — the factory's `len(peers) > 1` transport gate (`factory.go:50`), `Open`'s peer normalize/validate/persist (`engine.go:620-643`), the marker dir, the per-group listener — already operates per group and needs no change.


To prevent designing for hypothetical future requirements, we should avoid introducing function providers when the value is static at runtime. Since the set of groups is fixed at startup, a static map[uint64][]raftengine.Server should be used instead of a function provider lookup like func(groupID uint64) []raftengine.Server.

Suggested change

The wiring change is small and local: instead of one process-wide `bootstrapServers` threaded into every group, **resolve a per-group `[]raftengine.Server` and pass each group its own list**. Concretely, `buildShardGroups` / `buildRuntimeForGroup` change from a single `bootstrapServers []raftengine.Server` parameter (`multiraft_runtime.go:234`, `main.go:777`) to a `bootstrapServersFor func(groupID uint64) []raftengine.Server` lookup (or a `map[uint64][]raftengine.Server`), built once from the parsed `--raftGroupPeers`. Everything downstream — the factory's `len(peers) > 1` transport gate (`factory.go:50`), `Open`'s peer normalize/validate/persist (`engine.go:620-643`), the marker dir, the per-group listener — already operates per group and needs no change.

The wiring change is small and local: instead of one process-wide bootstrapServers threaded into every group, resolve a per-group []raftengine.Server and pass each group its own list. Concretely, buildShardGroups / buildRuntimeForGroup change from a single bootstrapServers []raftengine.Server parameter (multiraft_runtime.go:234, main.go:777) to a map[uint64][]raftengine.Server lookup, built once from the parsed --raftGroupPeers. Everything downstream - the factory's len(peers) > 1 transport gate (factory.go:50), Open's peer normalize/validate/persist (engine.go:620-643), the marker dir, the per-group listener - already operates per group and needs no change.

References

Do not design for hypothetical future requirements by introducing function providers or atomic variables when the value is static at runtime. Use a static value instead, and upgrade to dynamic/thread-safe types only when a runtime update path is actually introduced.

gemini-code-assist · 2026-06-12T06:33:38Z

+
+1. **OQ-1 — Flag spelling: `--raftGroupPeers` companion flag vs. extending `--raftGroups` entries.** §3.1 recommends the companion flag (clean separation of "my listener" vs. "the voter set"; mirrors single-group `--raftBootstrapMembers`). Confirm before PR-A, since it fixes the operator-facing surface.
+2. **OQ-2 — Should `--raftBootstrap` be *required* alongside `--raftGroupPeers`, or implied?** §3.2 recommends implied (non-empty `--raftGroupPeers` ⇒ bootstrap=true per group, matching `bootstrap = *raftBootstrap || len(bootstrapServers) > 0`, `main.go:534`). Alternative: require `--raftBootstrap` explicitly for symmetry with single-group. Confirm before PR-B.
+3. **OQ-3 — Per-group bootstrap-server carrier: `func(groupID) []Server` vs. `map[uint64][]Server`.** Implementation detail for threading through `buildShardGroups`/`buildRuntimeForGroup` (`main.go:777`, `multiraft_runtime.go:234`). Map is simpler; func defers construction. Either is fine; pick at PR-B.


Following our architectural guidelines, we should not design for hypothetical future requirements by introducing function providers when the value is static at runtime. Since the group membership is static at runtime, we should resolve OQ-3 by choosing the static map[uint64][]Server approach and removing the function provider option.

Suggested change

3. **OQ-3 — Per-group bootstrap-server carrier: `func(groupID) []Server` vs. `map[uint64][]Server`.** Implementation detail for threading through `buildShardGroups`/`buildRuntimeForGroup` (`main.go:777`, `multiraft_runtime.go:234`). Map is simpler; func defers construction. Either is fine; pick at PR-B.

3. OQ-3 - Per-group bootstrap-server carrier: map[uint64][]Server. Implementation detail for threading through buildShardGroups/buildRuntimeForGroup (main.go:777, multiraft_runtime.go:234). A static map is simpler and avoids introducing a function provider for values that are static at runtime.

References

Do not design for hypothetical future requirements by introducing function providers or atomic variables when the value is static at runtime. Use a static value instead, and upgrade to dynamic/thread-safe types only when a runtime update path is actually introduced.

Issue 1: correct stale line anchors in kv/sharded_coordinator.go - keyviz sampler observeMutation: 1795-1824 -> 1841-1846 - RunHLCLeaseRenewal: 1914-1953 -> 1960-1985; defaultGroup access :1915 -> :1961 Issue 2: soften companion-doc references to in-flight PR #955 form, matching the #945/#951/#953 branch-reference style (3 sites + map line). Issue 3: ground OQ-1 in the actual commitTS logic (nextStartTS/ resolveTxnCommitTS/nextTxnTSAfter all from one c.clock) and annotate §4 step 11 as deferred-pending-OQ-1 with an explicit trigger condition. Inline: fix learner LinearizableRead behaviour (engine returns ErrNotLeader, caller forwards; engine.go:1583); note shared-startTS invariant; add OQ-7 for the single-node->multi-node live cutover / rolling-upgrade strategy.

docs(design): propose multi-node multi-group bootstrap

252df56

chatgpt-codex-connector Bot reviewed Jun 12, 2026

View reviewed changes

gemini-code-assist Bot reviewed Jun 12, 2026

View reviewed changes

bootjp mentioned this pull request Jun 12, 2026

docs(design): propose scaling roadmap #954

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

docs(design): propose multi-node multi-group bootstrap#955

docs(design): propose multi-node multi-group bootstrap#955
bootjp wants to merge 1 commit into
mainfrom
design/multinode-multigroup-bootstrap

bootjp commented Jun 12, 2026

Uh oh!

coderabbitai Bot commented Jun 12, 2026

Review limit reached

Uh oh!

bootjp commented Jun 12, 2026

Uh oh!

claude Bot commented Jun 12, 2026 •

edited

Loading

Uh oh!

chatgpt-codex-connector Bot left a comment

Uh oh!

chatgpt-codex-connector Bot Jun 12, 2026

Uh oh!

gemini-code-assist Bot left a comment

Uh oh!

gemini-code-assist Bot Jun 12, 2026

Uh oh!

gemini-code-assist Bot Jun 12, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant


		We do not invent a "lexicographically-smallest peer proposes, others wait-and-join" protocol. That single-proposer pattern is the AddVoter-composition path (§5), not the bootstrap path — and adopting it for bootstrap would mean the non-proposer nodes start with an empty conf and must be added one-by-one, which is fragile (ordering, the proposer must be up first and must be leader) and is exactly the "manual AddVoter dance in every test harness" #953 OQ-9 rejected. The all-nodes-same-list model has no designated-proposer ordering requirement: nodes can start in any order, and raft elects a leader once a quorum is up.

		Idempotency on restart (decision: persisted-peers + marker dir already give this — per group). On first open of a group dir, `Open` writes the normalized peer set to the persisted-peers file (`savePersistedPeers`, `engine.go:643`; format in `peer_metadata.go:205`). On restart, the factory loads the persisted peers and uses them in preference to the flag-supplied list (`factory.go:43-47`), and `Open` refuses to start if the configured cluster disagrees with what is persisted (`validateOpenPeers` → `errClusterMismatch`, `engine.go:632`, `:116`). So:


		### 3.2 Bootstrap semantics

		The wiring change is small and local: instead of one process-wide `bootstrapServers` threaded into every group, resolve a per-group `[]raftengine.Server` and pass each group its own list. Concretely, `buildShardGroups` / `buildRuntimeForGroup` change from a single `bootstrapServers []raftengine.Server` parameter (`multiraft_runtime.go:234`, `main.go:777`) to a `bootstrapServersFor func(groupID uint64) []raftengine.Server` lookup (or a `map[uint64][]raftengine.Server`), built once from the parsed `--raftGroupPeers`. Everything downstream — the factory's `len(peers) > 1` transport gate (`factory.go:50`), `Open`'s peer normalize/validate/persist (`engine.go:620-643`), the marker dir, the per-group listener — already operates per group and needs no change.

	3. OQ-3 — Per-group bootstrap-server carrier: `func(groupID) []Server` vs. `map[uint64][]Server`. Implementation detail for threading through `buildShardGroups`/`buildRuntimeForGroup` (`main.go:777`, `multiraft_runtime.go:234`). Map is simpler; func defers construction. Either is fine; pick at PR-B.
	3. OQ-3 - Per-group bootstrap-server carrier: map[uint64][]Server. Implementation detail for threading through buildShardGroups/buildRuntimeForGroup (main.go:777, multiraft_runtime.go:234). A static map is simpler and avoids introducing a function provider for values that are static at runtime.

Conversation

bootjp commented Jun 12, 2026

Summary

Key design points

Open questions

Test plan

Uh oh!

coderabbitai Bot commented Jun 12, 2026

Review limit reached

Uh oh!

bootjp commented Jun 12, 2026

Uh oh!

claude Bot commented Jun 12, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Code Review — docs/design/2026_06_12_proposed_multinode_multigroup_bootstrap.md

Findings

1. Data-loss lens: Bootstrap=true passed to already-initialized dirs (§3.2)

2. Implicit invariant: all nodes must have the same --raftGroupPeers (not validated at first bootstrap)

3. buildShardGroups's single bootstrap bool parameter needs to become per-group (PR-B scope gap)

4. Minor: line reference main.go:744-748 is slightly imprecise

5. Operator experience: a concrete 3-node × 2-group flag example would strengthen §3.1

Verified accurate

Open questions: recommendations

Uh oh!

chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

chatgpt-codex-connector Bot Jun 12, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

gemini-code-assist Bot Jun 12, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist Bot Jun 12, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

claude Bot commented Jun 12, 2026 •

edited

Loading

Code Review — `docs/design/2026_06_12_proposed_multinode_multigroup_bootstrap.md`

1. Data-loss lens: `Bootstrap=true` passed to already-initialized dirs (§3.2)

2. Implicit invariant: all nodes must have the same `--raftGroupPeers` (not validated at first bootstrap)

3. `buildShardGroups`'s single `bootstrap bool` parameter needs to become per-group (PR-B scope gap)

4. Minor: line reference `main.go:744-748` is slightly imprecise