fix(mempool): close same-node nonce TOCTOU via per-sender advisory lock (batch C PR 4)#887
Conversation
…ck (batch C PR 4)
PR 4 of the batch C sequence — closes the validate→addTransaction
window flagged on PR 3's Greptile review and originally promised in
PR 2's `assignNonce` docstring.
The window
`assignNonce` runs during `confirmTransaction`, which is a
separate RPC round-trip from the execute-side
`Mempool.addTransaction` call that actually inserts. Between the
two, two concurrent ingress paths for the same sender can both
pass validation (each sees the same stale `account.nonce +
pendingCount`) and reach `addTransaction` with duplicate nonces.
PR 3's consensus-side `expectedPrior` reject catches them at
block-apply time, but only after both occupy mempool slots and
burn cleanup churn.
The fix
Wrap the per-sender re-check + insert inside
`Mempool.addTransaction` in a Postgres transaction. Acquire a
`pg_advisory_xact_lock(hashtext('nonce:' || sender))` at the top
of the transaction. The lock serialises concurrent same-sender
admissions on this node; the in-lock re-query of `account.nonce
+ countPendingByAddress` catches the second duplicate-nonce
submission before it inserts. Released at commit/rollback.
Lock key: `hashtext('nonce:' || sender_lowercase_hex)`. Prefixing
the address with `nonce:` namespaces the keyspace so future
advisory-lock callers in the project (none today) cannot collide.
Fork-gated
Only runs when `nonceEnforcement` is active. Pre-fork chains
bit-identical (legacy `addTransaction` path preserved unchanged
below the gate). Genesis-tx / awardPoints / L2PS internal paths
that don't carry a native `tx.content.from` string skip the gate
via the `senderFrom` and `typeof txNonce === "number"` guards.
Cross-node note
Mempool-side advisory locks are local to this node's Postgres.
Cross-RPC replay across two different validators is NOT covered
by this lock — that remains the consensus-side `expectedPrior`
apply-time reject in `GCRNonceRoutines` (PR 3).
Docstring cleanup
`assignNonce` JSDoc rewritten to point at `Mempool.addTransaction`
as the actual TOCTOU mitigation site. The original wording
promised PR 3 would land the lock; PR 3 shipped without it and
Greptile flagged the discrepancy. Updated to match shipped reality.
Doc updates
- PRs frontmatter: PR 3 marked merged, PR 4 added.
- Risks table TOCTOU row: rewritten to describe the actual PR 4
fix shape (re-check + insert under advisory lock) and the
cross-node carve-out.
- PR breakdown: new PR 4 row.
Verification
- tsc --noEmit clean on both changed code files.
- Manual trace of concurrent same-sender submission:
1. Client A and B both validate with account.nonce=0. Both pass
(tx.content.nonce=1 in both).
2. Both reach Mempool.addTransaction concurrently. A acquires
advisory lock first.
3. A re-queries inside lock: account.nonce=0, pending=0,
expected=1. tx.content.nonce=1. Matches. Inserts. Releases.
4. B acquires lock. Re-queries: account.nonce=0, pending=1
(A just inserted), expected=2. tx.content.nonce=1. Mismatch.
Returns error without inserting.
- Pre-fork trace: senderFrom set, txNonce number, but
`isForkActive` returns false → skip the locked branch → legacy
insert path runs → bit-identical re-sync.
Qodo reviews are paused for this user.Troubleshooting steps vary by plan Learn more → On a Teams plan? Using GitHub Enterprise Server, GitLab Self-Managed, or Bitbucket Data Center? |
|
Warning Review limit reached
More reviews will be available in 20 minutes and 33 seconds. Learn how PR review limits work. Your organization has run out of usage credits. Purchase more in the billing tab. ⌛ How to resolve this issue?After more reviews become available, a review can be triggered using the We recommend that you space out your commits to avoid hitting the rate limit. 🚦 How do rate limits work?CodeRabbit enforces hourly rate limits for each developer per organization. Our paid plans include higher PR review limits than trial, open-source, and free plans. In all cases, reviews become available again over time. During sustained high-volume PR review activity, CodeRabbit may temporarily slow when the next review becomes available. Please see our Fair Usage Limits Policy for further information. ℹ️ Review info⚙️ Run configurationConfiguration used: Organization UI Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (3)
✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
Greptile SummaryThis PR closes the same-node TOCTOU window in
Confidence Score: 5/5Safe to merge — the fork gate ensures pre-fork chains are untouched, and the advisory-lock + READ COMMITTED re-check correctly serialises concurrent same-sender inserts on this node. All three issues flagged in previous review rounds are addressed in this iteration. The advisory-lock + READ COMMITTED design correctly serialises concurrent same-sender inserts. The fork gate ensures pre-fork chains are bit-identical. The only residual note is the hashtext() 32-bit collision space, a known throughput trade-off with no correctness impact. No files require special attention. The advisory-lock path in mempool.ts is the only non-trivial change and its invariants are well-documented. Important Files Changed
Sequence DiagramsequenceDiagram
participant ClientA as Client A
participant ClientB as Client B
participant Mempool as Mempool.addTransaction
participant PG as PostgreSQL
par Concurrent calls
ClientA->>Mempool: "addTransaction(tx_A, nonce=1)"
ClientB->>Mempool: "addTransaction(tx_B, nonce=1)"
end
Mempool->>PG: BEGIN READ COMMITTED
Mempool->>PG: pg_advisory_xact_lock('nonce:sender') - Client A wins
Note over ClientB,PG: Client B blocks
Mempool->>PG: "hashExists? No. account.nonce=0, pendingCount=0"
Note right of Mempool: expected=1 == nonce=1 ok
Mempool->>PG: INSERT tx_A + COMMIT
Note over ClientB,PG: Client B acquires lock
Mempool->>PG: BEGIN READ COMMITTED
Mempool->>PG: pg_advisory_xact_lock('nonce:sender') - Client B
Mempool->>PG: "account.nonce=0, pendingCount=1 (sees tx_A)"
Note right of Mempool: expected=2 != nonce=1 fail
Mempool->>PG: ROLLBACK
Mempool-->>ClientB: Nonce TOCTOU recheck failed
Reviews (3): Last reviewed commit: "fix(mempool): live block-height query in..." | Re-trigger Greptile |
…loop iter 1)
Two Greptile findings, both real.
P1 — isolation level silently breaks TOCTOU under non-default config
`this.repo.manager.transaction()` opens a Postgres transaction at
the server's default isolation level. Under `READ COMMITTED` (the
Postgres default) every statement inside the transaction sees the
latest committed data, so the in-lock re-query of `pendingCount`
correctly reflects any concurrent winner. Under `REPEATABLE READ`
or `SERIALIZABLE`, the transaction snapshot is taken at `BEGIN`
— BEFORE the advisory lock fires — so both racing threads see
the same stale `pendingCount`, both compute the same expected
nonce, both pass the recheck, both insert. The TOCTOU fix
silently collapses.
A DBA or pooler change to `REPEATABLE READ` (not uncommon in
read-heavy deployments) would defeat this PR without any
observable failure mode. Pin the isolation explicitly so the
lock + recheck semantics are independent of operator config.
Fix: `this.repo.manager.transaction("READ COMMITTED", async em
=> { ... })`. TypeORM accepts an optional isolation level as
the first argument.
P2 — duplicate-hash concurrent submit produces misleading error
The outer `Chain.checkTxExists` + `checkTransactionByHash` calls
run BEFORE the advisory lock. Two concurrent submissions of the
IDENTICAL transaction (same hash, same sender, same nonce) both
pass those checks. The first inserts. The second then sees
`pendingCount=1`, computes `expected=N+2`, and is rejected —
but the error string is `"Nonce TOCTOU recheck failed:
tx.content.nonce=N+1, expected=N+2"` which misleads the operator
into thinking the client sent a wrong nonce when the real cause
is duplicate broadcast.
Fix: add `mempoolRepo.exists({ where: { hash: transaction.hash
} })` as the FIRST step inside the lock. If true, return
`"Transaction already in mempool"` — the same error the outer
check returns — so observability stays consistent.
Verification
- tsc --noEmit clean.
- Manual trace under `REPEATABLE READ` after the pin:
1. Both threads call `transaction("READ COMMITTED", ...)` so
the txn opens at READ COMMITTED regardless of server
default.
2. A acquires advisory lock, B blocks.
3. A re-queries pending=0, expected=1, inserts, commits.
4. B acquires lock. Inside-lock hash recheck: A's row visible
(READ COMMITTED sees committed rows). Returns
"Transaction already in mempool" if hash matches, or
re-queries pending=1 → expected=2 → mismatch if hash
differs but nonce was duplicated.
- Pre-fork path unchanged.
…h assignNonce (PR #887 greploop iter 2) Greptile P1 — stale `blockHeight` produces cutoff mismatch `blockHeight` is captured from `getSharedState.lastBlockNumber` at the top of `addTransaction`, BEFORE any await points. The in-lock `cutoff = blockHeight - referenceBlockRoom` uses that snapshot. Meanwhile `assignNonce` (during the earlier `confirmTransaction` RPC) derives its cutoff via `Mempool.countPendingByAddress`, which calls `Chain.getLastBlockNumber()` — a live DB query. If a block is produced between the two RPCs, the two cutoffs diverge by 1. A pending mempool row whose `reference_block` equals the old cutoff is counted by `assignNonce` but excluded by the in-lock recheck (or vice versa). The recheck then rejects a still-valid tx for a phantom nonce mismatch, with no recovery path on the client side except retry. Fix: re-query `Chain.getLastBlockNumber()` inside the lock and derive the cutoff from it. Both paths now align on the same live block tip and the cutoff is identical to the one the earlier validation step used. Notes: - Adds one more DB round-trip inside the locked critical section. Acceptable: `Chain.getLastBlockNumber()` is a primary-key lookup on `chain` and runs in <1ms; the advisory lock holds for ~5-10ms in the steady state. The correctness gain dwarfs the latency cost. - The outer `blockHeight` snapshot is still used for the fork gate (`isForkActive("nonceEnforcement", blockHeight)`) which is harmless — fork-gating doesn't need apply-time precision, only "are we somewhere past activation height". Verification: - tsc --noEmit clean. - Manual trace of the block-transition race: 1. Client A submits confirmTx at block N. Validation uses cutoff = N - referenceBlockRoom. 2. Block N+1 lands. 3. Client A submits executeTx. `blockHeight` snapshot is N+1 (lastBlockNumber updated). Without the fix, in-lock cutoff is (N+1) - referenceBlockRoom, ONE HIGHER than what assignNonce used. Pending row at reference_block = cutoff_assignNonce is excluded by the in-lock count → expected nonce off by 1 → false rejection. 4. With the fix, in-lock re-queries Chain.getLastBlockNumber() which returns the same N+1 (or higher if more blocks land during the lock wait) and derives the cutoff from that. The compare is consistent and the tx admits if and only if the nonce is still valid against the current chain state.
…ork testing pass Single document covering everything surfaced during the testing pass that ran 2026-05-26 → 2026-05-31 against dev.node2. Organised by topic, not chronology, so the team has one place to look up what was found, what was fixed, and what is still pending — without scrolling Telegram or hopping between PRs. Sections: - L2PS encryption nonce reuse (SDK #87 + 3 follow-up security findings) - Governance hash mismatch on dev.node2 (still broken; real fix in DEM-727) - Test-coverage gap that allowed the governance break to slip through - Side finding — nonces were not checked or incremented per tx (Hovhannes's batch C #884–#887, batch D #888 in flight) - UX surfacing for L2PS (Demo #11) - Operational risks still open - Linear ticket map across DEM-722 epic - Source material cross-links Cross-links the raw battery output and serializer analysis already in flight via Node #876. Closes DEM-729. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Summary
PR 4 of the batch C nonce-replay-protection sequence specified in
docs/specs/audit-sweep-batch-c-nonce.md. Closes the same-node TOCTOU window that was flagged on PR #886 and originally promised in PR #885'sassignNoncedocstring (but never landed).3 files, +133 / -10.
The window
assignNonceruns duringconfirmTransaction(validate RPC).Mempool.addTransactionruns during a separate execute RPC. Between the two, two concurrent ingress paths for the same sender can both pass validation (each sees the same staleaccount.nonce + pendingCount) and reachaddTransactionwith duplicate nonces.PR #886's consensus-side
expectedPriorreject catches them at block-apply time, but only after both occupy mempool slots and burn cleanup churn.What this PR ships
Mempool.addTransactionWrap the per-sender re-check + insert in a Postgres transaction:
Lock key namespaced as
nonce:${sender_lowercase_hex}so future advisory-lock callers in the project can't collide. Released at commit/rollback.Fork-gated
Only runs when
nonceEnforcementis active and the tx carries a stringcontent.fromand a numericcontent.nonce. Pre-fork chains stay bit-identical via the legacy insert path below the gate. Genesis-tx / awardPoints / L2PS internal paths that don't carry a native sender skip the gate.assignNonceJSDocRewritten to point at
Mempool.addTransactionas the actual TOCTOU mitigation site. PR 2's original wording promised PR 3 would land the lock; PR 3 shipped without it and Greptile flagged the discrepancy. Now matches shipped reality.Spec doc
Cross-node note
This lock is local to this node's Postgres. Cross-RPC replay across two different validators is NOT covered. That remains the consensus-side
expectedPriorapply-time reject inGCRNonceRoutines(PR #886). PR 4 is the same-node companion to that cross-node safety net.Verification
tsc --noEmitclean on both changed code files.account.nonce=0. Both pass (tx.content.nonce=1in both).Mempool.addTransactionconcurrently. A acquires advisory lock first.account.nonce=0,pending=0,expected=1. Match. Inserts. Releases at commit.account.nonce=0,pending=1(A just inserted),expected=2.tx.content.nonce=1. Mismatch. Returns error without inserting.senderFromset,txNoncenumber, butisForkActivereturns false → skip the locked branch → legacy insert path runs → bit-identical re-sync.Test plan
bun installand confirm clean tsc.confirmTx + executeTxround-trips back-to-back with identicaltx.content.nonce. Expect first to land in mempool, second to fail withNonce TOCTOU recheck failederror.nonceEnforcement.activationHeight: null): confirm legacyaddTransactionpath is taken (no advisory lock, no recheck) — re-sync stays bit-identical.