Phase 1.5: Stage A federated master - leader-leased idempotent billing, join-as-master v0, snapshots, sweep #47
Open
ehsan6sha wants to merge 9 commits into
…c deposit credit, cron leader-lease
Federated masters (Phase 1.5, Stage A) groundwork. All flag-gated, default OFF - single-master behavior is byte-identical when dark:
- migration 018: partial UNIQUE index (user_id, reference_id) WHERE tx_type=hourly_deduction (CONCURRENTLY; pre-checks duplicates; .down.sql)
- deductionJob (BILLING_IDEMPOTENCY=true): deterministic reference_id hour:YYYY-MM-DDTHH (UTC) and the history INSERT becomes the dedup gate (ON CONFLICT DO NOTHING) BEFORE the balance update - N masters deduct exactly once per (user, hour)
- blockScanner (BILLING_IDEMPOTENCY=true): deposit insert + creditUserTx + claimed_at in ONE transaction - a crash can no longer strand a recorded-but-uncredited tx; creditService gains creditUserTx (caller-owned transaction; creditUser delegates, zero behavior change)
- leaderLease (CRON_LEADER_LEASE=true): Postgres session advisory lock on a dedicated client gates every cron tick; holder crash frees the lock so a standby master takes over on its next tick; SIGTERM releases explicitly
- tests: hour-bucket determinism/UTC/collision, fee formula unchanged, flag-off no-op gate (DB-free; multi-master paths covered by Phase 1.5 e2e)
Part of #46
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
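The dedup gate described in this commit can be illustrated with a minimal in-memory sketch. It is not the PR's deductionJob: the `Ledger` class, `hourReferenceId`, and the `Set` standing in for the partial UNIQUE index are all hypothetical names chosen for illustration. The point it shows is the ordering: the history insert decides who wins, and the balance changes only for the winner.

```typescript
// Hypothetical sketch: an in-memory stand-in for the (user_id, reference_id) unique index.

/** Deterministic UTC hour bucket, e.g. "hour:2025-03-09T14". */
function hourReferenceId(at: Date): string {
  // toISOString() is always UTC ("YYYY-MM-DDTHH:mm:ss.sssZ"); keep everything through the hour.
  return `hour:${at.toISOString().slice(0, 13)}`;
}

class Ledger {
  private history = new Set<string>(); // plays the role of the UNIQUE index
  private balances: Map<string, number>;

  constructor(initial: Array<[string, number]>) {
    this.balances = new Map(initial);
  }

  /** Mimics INSERT ... ON CONFLICT DO NOTHING: true only when the row is new. */
  private insertHistory(userId: string, referenceId: string): boolean {
    const key = `${userId}|${referenceId}`;
    if (this.history.has(key)) return false;
    this.history.add(key);
    return true;
  }

  /** The history insert is the gate; the balance is touched only if the insert won. */
  deductHourly(userId: string, fee: number, at: Date): boolean {
    if (!this.insertHistory(userId, hourReferenceId(at))) return false;
    this.balances.set(userId, (this.balances.get(userId) ?? 0) - fee);
    return true;
  }

  balanceOf(userId: string): number {
    return this.balances.get(userId) ?? 0;
  }
}

const ledger = new Ledger([["alice", 100]]);
const firstMaster = new Date("2025-03-09T14:05:00Z");
const secondMaster = new Date("2025-03-09T14:55:00Z"); // same UTC hour, different master
const nextHour = new Date("2025-03-09T15:00:00Z");
const results = [firstMaster, secondMaster, nextHour].map((t) =>
  ledger.deductHourly("alice", 5, t),
);
console.log(results, ledger.balanceOf("alice")); // [ true, false, true ] 90
```

In the real system the gate is the database constraint, so the same ordering holds across processes; the in-memory `Set` only demonstrates the logic within one process.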
… replication sweep
Phase 1.5 Stage A deliverables (#46):
- docker/master: compose stack (postgres-pinning, pinning OpenAPI from main_postgres.go, pinning-webui with cron family) - host networking + 127.0.0.1 binds like prod, healthchecks + restart policies + label-scoped watchtower; optional fula-gateway profile (auto-enabled when image exists)
- update-scripts/join-as-master.sh v0: capstone installer first cut - detect/adopt-or-halt (adopts the Phase-1 kubo+cluster writer; halts on a foreign postgres-pinning), ordered migrations with halt-on-error + marker, idempotent re-runs, params persisted to .env (phase-common pattern)
- update-scripts/pinset-snapshot.sh: signed (ed25519) authoritative pinset dumps + --verify/--restore/--install-cron (early FM-3 restore path)
- update-scripts/replication-sweep.sh: below-REPL_MIN detection + recover + alert log + --strict for drills (closes the S4 sweep gap)
- test seams: processUserDeduction exported; cron intervals env-overridable (SCANNER_INTERVAL_MS/DEDUCTION_INTERVAL_MS, defaults unchanged)
- tests: fm2-billing-integration (live-Postgres; skips cleanly without DB) - concurrent same-hour deduction races deduct once; replayed deposit credits once and leaves no recorded-but-uncredited state
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
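The below-REPL_MIN detection in the replication sweep can be sketched as a pure function. This is an illustration only: the real sweep is a shell script, and the `PinStatus` shape, the alert text, and the assumption that `--strict` means "exit non-zero when anything is under-replicated" are mine, not taken from the script. The recover step is omitted.

```typescript
// Hypothetical data shape: one entry per pinned CID with the peers currently holding it.
interface PinStatus {
  cid: string;
  pinnedOn: string[];
}

interface SweepResult {
  underReplicated: string[]; // CIDs below the replication floor
  alerts: string[];          // lines that would go to the alert log
  exitCode: number;          // assumed: non-zero only in strict mode with findings
}

function sweep(pins: PinStatus[], replMin: number, strict: boolean): SweepResult {
  const under = pins.filter((p) => p.pinnedOn.length < replMin);
  return {
    underReplicated: under.map((p) => p.cid),
    alerts: under.map((p) => `ALERT ${p.cid}: ${p.pinnedOn.length}/${replMin} replicas`),
    exitCode: strict && under.length > 0 ? 1 : 0,
  };
}

const pins: PinStatus[] = [
  { cid: "cid-a", pinnedOn: ["peer1", "peer2"] },
  { cid: "cid-b", pinnedOn: ["peer1"] }, // forced under-replication
];

const lenient = sweep(pins, 2, false);
const strictRun = sweep(pins, 2, true);
console.log(lenient.alerts, lenient.exitCode, strictRun.exitCode);
```

Separating detection from the exit policy keeps the scheduled run quiet (log and recover) while letting a drill fail loudly on the same findings.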
…ver, snapshots, sweep
- D1 stack health
- D2 migration-018 presence
- D3 two webui masters -> one leader + one standby + exactly one hourly_deduction per (user, hour)
- D4 kill -9 leader -> standby acquires lease, STILL one row (idempotency under failover)
- D5 live-Postgres vitest integration
- D6 snapshot take/verify/tamper-reject/unpin+restore
- D7 sweep clean -> forced under-replication detected (--strict) + alerted -> reconverged clean
Part of #46
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
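The leader/standby behavior exercised by D3 and D4 can be modeled with a small simulation of a session-scoped lock. This is a sketch under stated assumptions, not the PR's leaderLease: `AdvisoryLocks`, `LEASE_KEY`, and `tick` are hypothetical, and the class only imitates the property that matters here, namely that a lock held by a session disappears when that session ends.

```typescript
// Hypothetical in-memory model of session-scoped advisory locks.
class AdvisoryLocks {
  private holders = new Map<number, string>(); // lock key -> session id

  /** Non-blocking acquire: succeeds if the key is free or already held by this session. */
  tryLock(key: number, session: string): boolean {
    const holder = this.holders.get(key);
    if (holder === undefined) {
      this.holders.set(key, session);
      return true;
    }
    return holder === session;
  }

  /** A crashed or closed session drops every lock it held. */
  endSession(session: string): void {
    this.holders.forEach((holder, key) => {
      if (holder === session) this.holders.delete(key);
    });
  }
}

const LEASE_KEY = 4242; // arbitrary illustrative key
const locks = new AdvisoryLocks();
const log: string[] = [];

/** Each cron tick runs the job only if this master holds the lease. */
function tick(master: string): void {
  log.push(`${master}:${locks.tryLock(LEASE_KEY, master) ? "ran" : "skipped"}`);
}

tick("master-a"); // first to tick becomes leader
tick("master-b"); // standby skips
tick("master-a"); // leader keeps running
locks.endSession("master-a"); // kill -9: the session, and with it the lock, is gone
tick("master-b"); // standby takes over on its next tick
console.log(log);
```

The lease only decides who runs the tick; it does not by itself rule out a double run around the handover, which is why the drill still asserts a single row per (user, hour) after failover.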
…_PORT default) Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
… auto-generate + persist Found by the live installer run (webui FATALs without them; container crash-looped silently). join-as-master.sh now generates both once (openssl rand) and persists to .env; compose fails fast with a clear message if absent; drill webuis receive them too. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Found by the live Phase 1.5 e2e run:
- migration 019: user_wallets.wallet_address DROP NOT NULL - post-PII linkWallet stores hash-only (wallet_address=NULL) so EVERY fresh install rejected wallet links; 012 relaxed user_email but missed this column (guarded .down.sql). Real fresh-deploy bug, not test-only.
- compose: mount fula-gateway-state at /var/lib/fula-gateway - the gateway durable state paths are hardcoded there; without the volume the S2 pin queue silently degrades to fire-and-forget and the bucket registry resets on restart.
- drills: D5 runner no longer swallows vitest exit (pipefail + explicit 2-passed check); D6 takes the FIRST cid from the snapshot stream (was capturing a multiline list) and polls 60s for the restore.
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
join-as-master.sh now installs the pinset-snapshot (6h) and replication-sweep (30min) cron entries and takes the first snapshot immediately - the restore path exists from minute one, per the safeguards invariant (S4/S6 must be scheduled, not manual). Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
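Since the restore path depends on signed (ed25519) snapshots, a minimal sketch of sign, verify, and tamper-reject may help. It uses Node's built-in `node:crypto` module rather than whatever tooling pinset-snapshot.sh actually calls, and the newline-separated CID list is a hypothetical snapshot format chosen for illustration.

```typescript
import { generateKeyPairSync, sign, verify } from "node:crypto";

// Illustrative key pair; a real deployment would load a persisted operator key.
const { publicKey, privateKey } = generateKeyPairSync("ed25519");

// Hypothetical snapshot body: one CID per line.
const snapshot = Buffer.from(["cid-1", "cid-2"].join("\n"), "utf8");

// Ed25519 takes no separate digest algorithm, so the first argument is null.
const signature = sign(null, snapshot, privateKey);

// Verification succeeds on the exact bytes that were signed...
const intactOk = verify(null, snapshot, publicKey, signature);

// ...and fails if even one entry was altered after signing.
const tampered = Buffer.from(["cid-1", "cid-evil"].join("\n"), "utf8");
const tamperedOk = verify(null, tampered, publicKey, signature);

console.log({ intactOk, tamperedOk }); // { intactOk: true, tamperedOk: false }
```

A restore step would run the verification first and refuse to re-pin anything from a snapshot whose signature does not check out.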
…pass count The tests passed (2/2 on live Postgres) but the colored output broke the literal match - strip escapes, then assert. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
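The "strip escapes, then assert" fix can be sketched as follows. The sample line is a made-up colored summary, not captured vitest output, and `stripAnsi` and `passedCount` are hypothetical helper names; the regex covers only SGR color/style sequences.

```typescript
// Matches ANSI SGR sequences such as ESC[32m or ESC[1;31m.
const ANSI_SGR = /\u001b\[[0-9;]*m/g;

function stripAnsi(text: string): string {
  return text.replace(ANSI_SGR, "");
}

/** Returns the number in "<n> passed", or null when no such summary is present. */
function passedCount(output: string): number | null {
  const match = stripAnsi(output).match(/(\d+) passed/);
  return match ? Number(match[1]) : null;
}

// Hypothetical summary line with the count wrapped in bold green.
const colored = "Tests \u001b[1m\u001b[32m2\u001b[39m\u001b[22m passed (2)";

console.log(colored.includes("2 passed")); // false: escapes sit between "2" and " passed"
console.log(stripAnsi(colored));           // Tests 2 passed (2)
console.log(passedCount(colored));         // 2
```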
Phase 1.5 — Stage A federated master: full stack on box #2
Implements the Stage A milestone of the federated-master roadmap: a second full master (operator-run) with double-run-proof billing, the capstone installer v0, signed pinset snapshots, a replication sweep, and a fenced failover/resync runbook. Everything is additive and flag-gated (default OFF; single-master behavior is byte-identical when dark).
What's in
- Double-run-proof billing (flags BILLING_IDEMPOTENCY, CRON_LEADER_LEASE):
  - migration 018: partial UNIQUE index on (user_id, reference_id) WHERE tx_type='hourly_deduction' (CONCURRENTLY, dup pre-check, .down.sql);
  - hourly deductions use a deterministic hour:YYYY-MM-DDTHH key and the history INSERT is the dedup gate before any balance change;
  - deposit insert, credit, and claimed_at run in one transaction (creditUserTx); a crash can no longer strand a recorded-but-uncredited deposit.
- update-scripts/join-as-master.sh v0 (capstone installer first cut): detect-installed / adopt-or-halt (adopts the Phase-1 kubo+cluster writer; halts on a foreign postgres), ordered migrations with halt-on-error, dockerized stack (healthchecks + restart policies + label-scoped watchtower), gateway profile auto-enabled when the fula-gateway image exists (built via fula-api#30, "feat(docker): reproducible fula-gateway image for federated masters"), safeguard crons installed + first snapshot taken immediately, params persisted to .env, idempotent re-runs.
- pinset-snapshot.sh: signed (ed25519) authoritative pinset dumps, --verify/--restore/--install-cron (early FM-3 restore path).
- replication-sweep.sh: below-REPL_MIN detection + recover + alert log + --strict (closes the S4 "no automated sweep" gap).
- migration 019: linkWallet stores wallet_address=NULL but fresh schemas keep NOT NULL (012 missed it), so every fresh deployment rejected wallet links. Guarded .down.sql.
- E2E drill suite (tests/e2e/phase-1.5/).
E2E evidence (clean Ubuntu 24.04 box, real daemons, real Postgres)
Drill suite
60-master-drills.sh: final run all green (RESULT pass=18 fail=0):
- exactly one hourly_deduction row per (user, hour)
- --restore re-pins
Mixed-fleet/no-forced-upgrade invariant unaffected: all changes are master-side and dark by default; providers and existing data untouched (Phase 1 drills covered the provider side).
Cross-repo
- fula-gateway image (built + serving S3 on the box; state volume fix included)
- GET /masters federated master-list v0 (schema mirrors the future Base MasterRegistry)
Scoping note
The full FxFiles-flow upload/download fidelity suite runs with Phase 2 (which changes the upload path; this phase doesn't touch it). Stage A failover is operator-fenced per the runbook until FM-1 (bucket-root CAS, Phase 2.5) enables auto-failover.
Closes #46
🤖 Generated with Claude Code