Skip to content

fix(mempool): close same-node nonce TOCTOU via per-sender advisory lock (batch C PR 4)#887

Merged
tcsenpai merged 3 commits into
stabilisationfrom
bugfix/audit-sweep-batch-c-toctou-lock-2026-05-31
May 31, 2026
Merged

fix(mempool): close same-node nonce TOCTOU via per-sender advisory lock (batch C PR 4)#887
tcsenpai merged 3 commits into
stabilisationfrom
bugfix/audit-sweep-batch-c-toctou-lock-2026-05-31

Conversation

@tcsenpai
Copy link
Copy Markdown
Contributor

Summary

PR 4 of the batch C nonce-replay-protection sequence specified in docs/specs/audit-sweep-batch-c-nonce.md. Closes the same-node TOCTOU window that was flagged on PR #886 and originally promised in PR #885's assignNonce docstring (but never landed).

3 files, +133 / -10.

The window

assignNonce runs during confirmTransaction (validate RPC). Mempool.addTransaction runs during a separate execute RPC. Between the two, two concurrent ingress paths for the same sender can both pass validation (each sees the same stale account.nonce + pendingCount) and reach addTransaction with duplicate nonces.

PR #886's consensus-side expectedPrior reject catches them at block-apply time, but only after both occupy mempool slots and burn cleanup churn.

What this PR ships

Mempool.addTransaction

Wrap the per-sender re-check + insert in a Postgres transaction:

await em.query(
    "SELECT pg_advisory_xact_lock(hashtext($1))",
    [`nonce:${senderFrom}`],
)
// re-query account.nonce + countPendingByAddress INSIDE the lock
// re-check tx.content.nonce === expected
// insert (or reject)

Lock key namespaced as nonce:${sender_lowercase_hex} so future advisory-lock callers in the project can't collide. Released at commit/rollback.

Fork-gated

Only runs when nonceEnforcement is active and the tx carries a string content.from and a numeric content.nonce. Pre-fork chains stay bit-identical via the legacy insert path below the gate. Genesis-tx / awardPoints / L2PS internal paths that don't carry a native sender skip the gate.

assignNonce JSDoc

Rewritten to point at Mempool.addTransaction as the actual TOCTOU mitigation site. PR 2's original wording promised PR 3 would land the lock; PR 3 shipped without it and Greptile flagged the discrepancy. Now matches shipped reality.

Spec doc

  • PRs frontmatter: PR 3 marked merged, PR 4 added.
  • Risks table TOCTOU row rewritten to describe the actual PR 4 fix shape + cross-node carve-out.
  • PR breakdown: new PR 4 row (Low risk, fork-gated, single function).

Cross-node note

This lock is local to this node's Postgres. Cross-RPC replay across two different validators is NOT covered. That remains the consensus-side expectedPrior apply-time reject in GCRNonceRoutines (PR #886). PR 4 is the same-node companion to that cross-node safety net.

Verification

  • tsc --noEmit clean on both changed code files.
  • Manual trace of concurrent same-sender submission:
    1. Client A and B both validate with account.nonce=0. Both pass (tx.content.nonce=1 in both).
    2. Both reach Mempool.addTransaction concurrently. A acquires advisory lock first.
    3. A re-queries inside lock: account.nonce=0, pending=0, expected=1. Match. Inserts. Releases at commit.
    4. B acquires lock. Re-queries: account.nonce=0, pending=1 (A just inserted), expected=2. tx.content.nonce=1. Mismatch. Returns error without inserting.
  • Pre-fork trace: senderFrom set, txNonce number, but isForkActive returns false → skip the locked branch → legacy insert path runs → bit-identical re-sync.

Test plan

  • bun install and confirm clean tsc.
  • Boot devnet (post-fork from block 0).
  • Concurrent-submission test: fire 2 same-sender confirmTx + executeTx round-trips back-to-back with identical tx.content.nonce. Expect first to land in mempool, second to fail with Nonce TOCTOU recheck failed error.
  • Pre-fork chain (testnet checkout with nonceEnforcement.activationHeight: null): confirm legacy addTransaction path is taken (no advisory lock, no recheck) — re-sync stays bit-identical.

…ck (batch C PR 4)

PR 4 of the batch C sequence — closes the validate→addTransaction
window flagged on PR 3's Greptile review and originally promised in
PR 2's `assignNonce` docstring.

The window
  `assignNonce` runs during `confirmTransaction`, which is a
  separate RPC round-trip from the execute-side
  `Mempool.addTransaction` call that actually inserts. Between the
  two, two concurrent ingress paths for the same sender can both
  pass validation (each sees the same stale `account.nonce +
  pendingCount`) and reach `addTransaction` with duplicate nonces.
  PR 3's consensus-side `expectedPrior` reject catches them at
  block-apply time, but only after both occupy mempool slots and
  burn cleanup churn.

The fix
  Wrap the per-sender re-check + insert inside
  `Mempool.addTransaction` in a Postgres transaction. Acquire a
  `pg_advisory_xact_lock(hashtext('nonce:' || sender))` at the top
  of the transaction. The lock serialises concurrent same-sender
  admissions on this node; the in-lock re-query of `account.nonce
  + countPendingByAddress` catches the second duplicate-nonce
  submission before it inserts. Released at commit/rollback.

  Lock key: `hashtext('nonce:' || sender_lowercase_hex)`. Prefixing
  the address with `nonce:` namespaces the keyspace so future
  advisory-lock callers in the project (none today) cannot collide.

Fork-gated
  Only runs when `nonceEnforcement` is active. Pre-fork chains
  bit-identical (legacy `addTransaction` path preserved unchanged
  below the gate). Genesis-tx / awardPoints / L2PS internal paths
  that don't carry a native `tx.content.from` string skip the gate
  via the `senderFrom` and `typeof txNonce === "number"` guards.

Cross-node note
  Mempool-side advisory locks are local to this node's Postgres.
  Cross-RPC replay across two different validators is NOT covered
  by this lock — that remains the consensus-side `expectedPrior`
  apply-time reject in `GCRNonceRoutines` (PR 3).

Docstring cleanup
  `assignNonce` JSDoc rewritten to point at `Mempool.addTransaction`
  as the actual TOCTOU mitigation site. The original wording
  promised PR 3 would land the lock; PR 3 shipped without it and
  Greptile flagged the discrepancy. Updated to match shipped reality.

Doc updates
  - PRs frontmatter: PR 3 marked merged, PR 4 added.
  - Risks table TOCTOU row: rewritten to describe the actual PR 4
    fix shape (re-check + insert under advisory lock) and the
    cross-node carve-out.
  - PR breakdown: new PR 4 row.

Verification
  - tsc --noEmit clean on both changed code files.
  - Manual trace of concurrent same-sender submission:
    1. Client A and B both validate with account.nonce=0. Both pass
       (tx.content.nonce=1 in both).
    2. Both reach Mempool.addTransaction concurrently. A acquires
       advisory lock first.
    3. A re-queries inside lock: account.nonce=0, pending=0,
       expected=1. tx.content.nonce=1. Matches. Inserts. Releases.
    4. B acquires lock. Re-queries: account.nonce=0, pending=1
       (A just inserted), expected=2. tx.content.nonce=1. Mismatch.
       Returns error without inserting.
  - Pre-fork trace: senderFrom set, txNonce number, but
    `isForkActive` returns false → skip the locked branch → legacy
    insert path runs → bit-identical re-sync.
@qodo-code-review
Copy link
Copy Markdown
Contributor

Qodo reviews are paused for this user.

Troubleshooting steps vary by plan Learn more →

On a Teams plan?
Reviews resume once this user has a paid seat and their Git account is linked in Qodo.
Link Git account →

Using GitHub Enterprise Server, GitLab Self-Managed, or Bitbucket Data Center?
These require an Enterprise plan - Contact us
Contact us →

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented May 31, 2026

Warning

Review limit reached

@tcsenpai, we couldn't start this review because you've reached your PR review rate limit.

More reviews will be available in 20 minutes and 33 seconds. Learn how PR review limits work.

Your organization has run out of usage credits. Purchase more in the billing tab.

⌛ How to resolve this issue?

After more reviews become available, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans include higher PR review limits than trial, open-source, and free plans. In all cases, reviews become available again over time. During sustained high-volume PR review activity, CodeRabbit may temporarily slow when the next review becomes available.

Please see our Fair Usage Limits Policy for further information.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 99481827-db34-44cd-8367-ff7ad9f7b947

📥 Commits

Reviewing files that changed from the base of the PR and between 47be8cd and 2a0e79c.

📒 Files selected for processing (3)
  • docs/specs/audit-sweep-batch-c-nonce.md
  • src/libs/blockchain/mempool.ts
  • src/libs/blockchain/routines/validateTransaction.ts
✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch bugfix/audit-sweep-batch-c-toctou-lock-2026-05-31

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@greptile-apps
Copy link
Copy Markdown

greptile-apps Bot commented May 31, 2026

Greptile Summary

This PR closes the same-node TOCTOU window in Mempool.addTransaction by wrapping the per-sender re-check and insert in a Postgres transaction at READ COMMITTED isolation protected by pg_advisory_xact_lock(hashtext('nonce:' + sender)). All three prior review concerns (isolation level, stale block-height cutoff, and misleading duplicate-hash error) are addressed directly in this iteration.

  • mempool.ts: Adds a fork-gated locked path that acquires a per-sender advisory lock, re-queries both the account nonce and live mempool count inside the lock, checks for duplicate hash before the nonce comparison, and only then inserts — falling through to the legacy path when nonceEnforcement is inactive or the tx lacks a native sender/nonce.
  • validateTransaction.ts: JSDoc-only update to assignNonce pointing at Mempool.addTransaction as the actual TOCTOU mitigation site.
  • docs/specs/audit-sweep-batch-c-nonce.md: Spec updated with PR 4 row and corrected TOCTOU risk description.

Confidence Score: 5/5

Safe to merge — the fork gate ensures pre-fork chains are untouched, and the advisory-lock + READ COMMITTED re-check correctly serialises concurrent same-sender inserts on this node.

All three issues flagged in previous review rounds are addressed in this iteration. The advisory-lock + READ COMMITTED design correctly serialises concurrent same-sender inserts. The fork gate ensures pre-fork chains are bit-identical. The only residual note is the hashtext() 32-bit collision space, a known throughput trade-off with no correctness impact.

No files require special attention. The advisory-lock path in mempool.ts is the only non-trivial change and its invariants are well-documented.

Important Files Changed

Filename Overview
src/libs/blockchain/mempool.ts Core change: wraps the per-sender re-check + insert in a Postgres transaction with pg_advisory_xact_lock at READ COMMITTED isolation. Previous review issues (isolation level, stale blockHeight, duplicate-hash misleading error) are all addressed in this version.
src/libs/blockchain/routines/validateTransaction.ts JSDoc-only update to assignNonce concurrency caveat: rewrites the stale PR-3 promise to accurately describe PR 4's advisory-lock location. No logic changes.
docs/specs/audit-sweep-batch-c-nonce.md Spec frontmatter updated (PR 3 marked merged, PR 4 added). TOCTOU risks-table row rewritten to describe the actual PR 4 advisory-lock shape. No code impact.

Sequence Diagram

sequenceDiagram
    participant ClientA as Client A
    participant ClientB as Client B
    participant Mempool as Mempool.addTransaction
    participant PG as PostgreSQL

    par Concurrent calls
        ClientA->>Mempool: "addTransaction(tx_A, nonce=1)"
        ClientB->>Mempool: "addTransaction(tx_B, nonce=1)"
    end

    Mempool->>PG: BEGIN READ COMMITTED
    Mempool->>PG: pg_advisory_xact_lock('nonce:sender') - Client A wins
    Note over ClientB,PG: Client B blocks
    Mempool->>PG: "hashExists? No. account.nonce=0, pendingCount=0"
    Note right of Mempool: expected=1 == nonce=1 ok
    Mempool->>PG: INSERT tx_A + COMMIT
    Note over ClientB,PG: Client B acquires lock
    Mempool->>PG: BEGIN READ COMMITTED
    Mempool->>PG: pg_advisory_xact_lock('nonce:sender') - Client B
    Mempool->>PG: "account.nonce=0, pendingCount=1 (sees tx_A)"
    Note right of Mempool: expected=2 != nonce=1 fail
    Mempool->>PG: ROLLBACK
    Mempool-->>ClientB: Nonce TOCTOU recheck failed
Loading

Reviews (3): Last reviewed commit: "fix(mempool): live block-height query in..." | Re-trigger Greptile

Comment thread src/libs/blockchain/mempool.ts Outdated
Comment thread src/libs/blockchain/mempool.ts Outdated
…loop iter 1)

Two Greptile findings, both real.

P1 — isolation level silently breaks TOCTOU under non-default config
  `this.repo.manager.transaction()` opens a Postgres transaction at
  the server's default isolation level. Under `READ COMMITTED` (the
  Postgres default) every statement inside the transaction sees the
  latest committed data, so the in-lock re-query of `pendingCount`
  correctly reflects any concurrent winner. Under `REPEATABLE READ`
  or `SERIALIZABLE`, the transaction snapshot is taken at `BEGIN`
  — BEFORE the advisory lock fires — so both racing threads see
  the same stale `pendingCount`, both compute the same expected
  nonce, both pass the recheck, both insert. The TOCTOU fix
  silently collapses.

  A DBA or pooler change to `REPEATABLE READ` (not uncommon in
  read-heavy deployments) would defeat this PR without any
  observable failure mode. Pin the isolation explicitly so the
  lock + recheck semantics are independent of operator config.

  Fix: `this.repo.manager.transaction("READ COMMITTED", async em
  => { ... })`. TypeORM accepts an optional isolation level as
  the first argument.

P2 — duplicate-hash concurrent submit produces misleading error
  The outer `Chain.checkTxExists` + `checkTransactionByHash` calls
  run BEFORE the advisory lock. Two concurrent submissions of the
  IDENTICAL transaction (same hash, same sender, same nonce) both
  pass those checks. The first inserts. The second then sees
  `pendingCount=1`, computes `expected=N+2`, and is rejected —
  but the error string is `"Nonce TOCTOU recheck failed:
  tx.content.nonce=N+1, expected=N+2"` which misleads the operator
  into thinking the client sent a wrong nonce when the real cause
  is duplicate broadcast.

  Fix: add `mempoolRepo.exists({ where: { hash: transaction.hash
  } })` as the FIRST step inside the lock. If true, return
  `"Transaction already in mempool"` — the same error the outer
  check returns — so observability stays consistent.

Verification
  - tsc --noEmit clean.
  - Manual trace under `REPEATABLE READ` after the pin:
    1. Both threads call `transaction("READ COMMITTED", ...)` so
       the txn opens at READ COMMITTED regardless of server
       default.
    2. A acquires advisory lock, B blocks.
    3. A re-queries pending=0, expected=1, inserts, commits.
    4. B acquires lock. Inside-lock hash recheck: A's row visible
       (READ COMMITTED sees committed rows). Returns
       "Transaction already in mempool" if hash matches, or
       re-queries pending=1 → expected=2 → mismatch if hash
       differs but nonce was duplicated.
  - Pre-fork path unchanged.
Comment thread src/libs/blockchain/mempool.ts Outdated
…h assignNonce (PR #887 greploop iter 2)

Greptile P1 — stale `blockHeight` produces cutoff mismatch
  `blockHeight` is captured from `getSharedState.lastBlockNumber`
  at the top of `addTransaction`, BEFORE any await points. The
  in-lock `cutoff = blockHeight - referenceBlockRoom` uses that
  snapshot.

  Meanwhile `assignNonce` (during the earlier `confirmTransaction`
  RPC) derives its cutoff via `Mempool.countPendingByAddress`,
  which calls `Chain.getLastBlockNumber()` — a live DB query.

  If a block is produced between the two RPCs, the two cutoffs
  diverge by 1. A pending mempool row whose `reference_block`
  equals the old cutoff is counted by `assignNonce` but excluded
  by the in-lock recheck (or vice versa). The recheck then
  rejects a still-valid tx for a phantom nonce mismatch, with no
  recovery path on the client side except retry.

Fix: re-query `Chain.getLastBlockNumber()` inside the lock and
derive the cutoff from it. Both paths now align on the same
live block tip and the cutoff is identical to the one the
earlier validation step used.

Notes:
  - Adds one more DB round-trip inside the locked critical
    section. Acceptable: `Chain.getLastBlockNumber()` is a
    primary-key lookup on `chain` and runs in <1ms; the
    advisory lock holds for ~5-10ms in the steady state. The
    correctness gain dwarfs the latency cost.
  - The outer `blockHeight` snapshot is still used for the fork
    gate (`isForkActive("nonceEnforcement", blockHeight)`) which
    is harmless — fork-gating doesn't need apply-time precision,
    only "are we somewhere past activation height".

Verification:
  - tsc --noEmit clean.
  - Manual trace of the block-transition race:
    1. Client A submits confirmTx at block N. Validation uses
       cutoff = N - referenceBlockRoom.
    2. Block N+1 lands.
    3. Client A submits executeTx. `blockHeight` snapshot is N+1
       (lastBlockNumber updated). Without the fix, in-lock
       cutoff is (N+1) - referenceBlockRoom, ONE HIGHER than
       what assignNonce used. Pending row at reference_block =
       cutoff_assignNonce is excluded by the in-lock count →
       expected nonce off by 1 → false rejection.
    4. With the fix, in-lock re-queries Chain.getLastBlockNumber()
       which returns the same N+1 (or higher if more blocks land
       during the lock wait) and derives the cutoff from that.
       The compare is consistent and the tx admits if and only
       if the nonce is still valid against the current chain
       state.
@tcsenpai tcsenpai merged commit 924df40 into stabilisation May 31, 2026
4 checks passed
@Shitikyan Shitikyan mentioned this pull request Jun 1, 2026
Shitikyan added a commit that referenced this pull request Jun 1, 2026
…ork testing pass

Single document covering everything surfaced during the testing pass that
ran 2026-05-26 → 2026-05-31 against dev.node2. Organised by topic, not
chronology, so the team has one place to look up what was found, what
was fixed, and what is still pending — without scrolling Telegram or
hopping between PRs.

Sections:
- L2PS encryption nonce reuse (SDK #87 + 3 follow-up security findings)
- Governance hash mismatch on dev.node2 (still broken; real fix in DEM-727)
- Test-coverage gap that allowed the governance break to slip through
- Side finding — nonces were not checked or incremented per tx (Hovhannes's batch C #884#887, batch D #888 in flight)
- UX surfacing for L2PS (Demo #11)
- Operational risks still open
- Linear ticket map across DEM-722 epic
- Source material cross-links

Cross-links the raw battery output and serializer analysis already in
flight via Node #876. Closes DEM-729.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant