Skip to content

fix(evmigration): mempool-accept zero-signer migration txs#167

Merged
mateeullahmalik merged 5 commits into
masterfrom
matee/evmigration-mempool-signer
Jun 19, 2026
Merged

fix(evmigration): mempool-accept zero-signer migration txs#167
mateeullahmalik merged 5 commits into
masterfrom
matee/evmigration-mempool-signer

Conversation

@mateeullahmalik

Copy link
Copy Markdown
Contributor

fix(evmigration): mempool-accept zero-signer migration txs

TL;DR

submit-proof broadcasts a zero-signer MsgClaimLegacyAccount / MsgMigrateValidator. On v1.20.0-rc4, every such broadcast was rejected by the EVM mempool with:

tx must have at least one signer

Root cause: ExperimentalEVMMempool.cosmosPool uses the SDK's DefaultSignerExtractionAdapter, which calls tx.GetSignaturesV2() and rejects empty sig sets in PriorityNonceMempool.Insertbefore the migration-aware ante chain (app/evm/ante.go: migrationCosmosAnte) ever runs.

This PR teaches the mempool to recognize migration-only txs and synthesize a deterministic signer from legacy_address, so the ante chain gets a chance to run as designed.

Discovered during the v1.20.0-rc4 multisig migration rehearsal in #163 (migrate-batch.sh end-to-end). Reproduces with migrate-multisig.sh submit, migrate-account.sh, and a hand-built lumerad tx broadcast.

What's in the diff

File Change
app/evmigration_signer_extraction_adapter.go NEW. evmigrationSignerExtractionAdapter — synthesizes a SignerData{AccAddress(legacy_address), 0} for IsEVMigrationOnlyTx; delegates everything else to a configurable fallback.
app/evm_mempool.go Builds CosmosPoolConfig explicitly with our adapter; wraps the PrepareProposal signer extractor as EthSignerExtractionAdapter(migrationAdapter(default)). Replicates upstream's default priority function locally with a zero-fee short-circuit so the adapter override is the only behavior change.
app/evmigration_signer_extraction_adapter_test.go NEW. 7 unit tests pinning synthetic-signer derivation, fallback delegation for non-migration / mixed txs, empty-bech32 / invalid-bech32 rejection, nil-fallback safety.
app/evm_mempool_evmigration_test.go NEW. 2 integration tests on the real App + real ExperimentalEVMMempool. Asserts (a) Insert accepts a zero-signer MsgClaimLegacyAccount and CountTx() increments, and (b) the SDK default adapter still returns zero signers for the same tx (regression pin).
docs/evm-integration/user-guides/migration.md Zero-signer submit callout now points at the adapter file.
CHANGELOG.md Entry under v1.20.0.

Why it's safe

  • No behavior change for non-migration txs. The adapter explicitly delegates to the SDK default (and, on the proposal path, to the EVM-aware adapter). Mixed-message txs are not treated as migration-only — they hit the default path.
  • Synthetic signer is deterministic. legacy_address is the canonical on-chain identity of the source account and is the same field the keeper uses to keymigrate state. Using AccAddress(legacy_address) for nonce-mempool ordering preserves (sender, nonce) ordering and dedupe semantics for the natural unit of work (one migration per legacy address).
  • Sequence = 0 is fine because migration is a one-shot operation: the migration-record check in the keeper is the replay guard, and the mempool only needs (sender, sequence) to be stable within a block window.
  • Priority replication mirrors upstream. defaultCosmosPoolConfig mirrors cosmos/evm@v0.6.0 mempool.go:152 exactly; only SignerExtractor differs. The zero-fee short-circuit is a defensive addition that lets migration txs avoid an unnecessary EVMKeeper.GetEvmCoinInfo read (which the keeper currently panics on for fresh sdk.Context{} lacking a KVStore — observed in the integration test path).

Tests

go test -tags=test ./app/... -count=1
…
ok  github.com/LumeraProtocol/lumera/app  14.780s
ok  github.com/LumeraProtocol/lumera/app/evm  0.613s
ok  github.com/LumeraProtocol/lumera/app/openrpc  0.166s
ok  github.com/LumeraProtocol/lumera/app/upgrades  0.179s
ok  github.com/LumeraProtocol/lumera/app/upgrades/v1_11_0  0.158s
ok  github.com/LumeraProtocol/lumera/app/upgrades/v1_11_1  0.135s
ok  github.com/LumeraProtocol/lumera/app/upgrades/v1_12_0  0.150s
ok  github.com/LumeraProtocol/lumera/app/upgrades/v1_20_0  0.842s

The new mempool integration test in particular fails on master (with the exact tx must have at least one signer rejection) and passes on this branch — it is a literal regression pin.

Field findings (advisory for Alexey's manual MD)

Surfaced while running the script-side rehearsal against the same procedure. These are in Alexey's standalone "Multisig migration Lumera EVM.md", not in this repo's docs (the repo's migration.md and migration-scripts.md already document both correctly):

  1. migrate-multisig.sh sign must pass both --from AND --new-key for every co-signer who holds both halves of a key pair. The "--new-key-only" pattern in Alexey's MD produces one-sided partials that fail quorum at combine with "Matching-index threshold satisfied: no (0 < 2)". The repo docs already state this clearly (migration-scripts.md lines 586–608).
  2. lumerad keys add --multisig must pass --nosort when reconstructing the new EVM multisig. The default address-sort permutes the canonical pubkey order for eth_secp256k1 keys and breaks signer_indices alignment between legacy and new halves. The repo docs already require this (migration.md line 680, migration-scripts.md line 918). It affects 23 of the 28 foundation multisigs on mainnet — they'd silently produce wrong reconstructed addresses without --nosort.

Follow-up

The cosmos/evm ExperimentalEVMMempool routes non-EVM txs through a
PriorityNonceMempool that, by default, uses DefaultSignerExtractionAdapter.
That adapter calls tx.GetSignaturesV2() and refuses any tx with an empty
signature set, returning 'tx must have at least one signer' from
PriorityNonceMempool.Insert.

MsgClaimLegacyAccount and MsgMigrateValidator are zero-signer by design:
authorization lives in the proof bytes, fees are waived by
EVMigrationFeeDecorator, and the migration-aware ante chain
(app/evm/ante.go: migrationCosmosAnte) accepts the shape. That ante
chain, however, runs AFTER mempool insert. Without a migration-aware
signer extractor, every submit-proof broadcast is rejected at the
mempool layer before ante ever sees it -- including the canonical
combined-tx.json shape produced by the offline multisig flow.

This change:

* Adds app/evmigration_signer_extraction_adapter.go: a
  SignerExtractionAdapter that returns a synthetic SignerData built
  from the message's legacy_address for IsEVMigrationOnlyTx, and
  delegates everything else to a fallback (default for the Cosmos
  pool, EVM-aware for proposal building).
* Wires the adapter into ExperimentalEVMMempool.CosmosPoolConfig and
  into NewDefaultProposalHandler's signer extraction adapter so it
  applies on both Insert and PrepareProposal paths.
* Replicates upstream's default PriorityNonceMempoolConfig (priority
  by gas-price) locally so the adapter override is the only behavior
  change. Short-circuits priority calc for zero-fee/zero-gas txs so
  it doesn't touch EVM keeper state for migration txs.

Tests:

* app/evmigration_signer_extraction_adapter_test.go: 7 unit tests
  pinning synthetic-signer derivation, fallback delegation for
  non-migration and mixed txs, empty/invalid legacy_address rejection,
  and nil-fallback safety.
* app/evm_mempool_evmigration_test.go: 2 integration tests on the
  real App + real ExperimentalEVMMempool. One asserts Insert accepts
  a zero-signer MsgClaimLegacyAccount and CountTx() increments. The
  other pins the regression: the SDK default adapter still returns
  zero signers for the same tx, which is precisely what makes
  PriorityNonceMempool reject without this fix.

Docs:

* CHANGELOG.md entry under v1.20.0 explaining the fix.
* docs/evm-integration/user-guides/migration.md zero-signer-submit
  callout updated to point at the adapter file.

Discovered during v1.20.0-rc4 multisig migration rehearsal (PR #163
migrate-batch.sh end-to-end). Reproduces with migrate-multisig.sh
submit, migrate-account.sh, and hand-built lumerad tx broadcast.

Verified: go test -tags=test ./app/... green, app package tests pass
(14.8s).
@mateeullahmalik

Copy link
Copy Markdown
Contributor Author

⚠️ Hold for re-verification

Demoting to draft while I close a self-contradiction in the evidence:

  • Integration test on master reproduces the symptom: App.GetMempool().Insert(...) of a zero-signer migration tx returns tx must have at least one signer. ✅
  • Live broadcast on the same master binary (v1.20.0-rc4-bba07f1d) of a zero-signer MsgClaimLegacyAccount against lumera-devnet-1: the mempool accepts it; the tx reaches the ante and is rejected for a normal proof-validation reason (codespace evmigration, code 1110). The mempool layer did not reject.
  • Prior session evidence (2026-06-15 walkthrough on this same devnet): I successfully ran migrate-validator.sh against val3, broadcast landed in ~1 block, validator restarted with ~30s downtime. migrate-validator, claim-legacy-account, and submit-proof all share the same runMigrationTx code path. If the mempool rejected zero-signer txs in production, val3 migration would have failed too — it didn't.

That means the production rejection I attributed to this bug in the rehearsal logs (migrate_multisig_submit_failed in /tmp/run-seed.jsonl) was almost certainly a different error* — the JSONL only captured a failure tag, not the stderr message. The 'tx must have at least one signer' string I have only reproduced in the in-process test path.

Going to:

  1. Run a real multisig submit-proof against this devnet right now and capture the actual stderr.
  2. Either confirm the production bug with a real-binary repro — and unblock this PR with that evidence in the body — or close it as a fix for a phantom that only manifests in the in-process test harness.

Apologies for the noise. Will report back within the hour.

@mateeullahmalik

Copy link
Copy Markdown
Contributor Author

✅ Confirmed on live binaries — both sides proven

Resolved the contradiction. The bug is real, the fix does work, and I have end-to-end proof on the live devnet.

What I did

  1. Built a real combined multisig tx via the full migrate-multisig.sh ceremony (generate → sign × 2 → combine) against seed_sale_3 on this local devnet, producing canonical tx.json with signer_infos: [] and signatures: [].
  2. Submitted via lumerad tx evmigration submit-proof against validators running unmodified master (v1.20.0-rc4-bba07f1d):
gas estimate: 187071
code: 1
codespace: undefined
raw_log: tx must have at least one signer
  1. Rebuilt the chain binary on this PR's HEAD (v1.20.0-rc4-c14cf42a), redeployed to all 5 validators, restarted, confirmed they're producing blocks on the fix binary.
  2. Submitted the identical tx.json:
gas estimate: 187071
code: 0
codespace: ""
height: 7999
events: ... claim_legacy_account { legacy: lumera1q669...8ugzpszu2, new: lumera1ttwd...m0xpg, ... }
  1. Verified migration-record on chain:
record:
  legacy_address: lumera1q669u520hc3w4xuly08w9w97uzlt08ugzpszu2
  migration_height: "7999"
  new_address: lumera1ttwdmmlqf8xu5mkufrh5zcck8v8yn42a5m0xpg

Why earlier walkthroughs didn't catch this

Our remote devnet at 3.133.124.60 — where I previously verified validator migration in a June 15 walkthrough and where migration-stats reports total_migrated: 80 — runs with max-txs = -1 in app.toml. That value short-circuits app/evm_mempool.go: configureEVMMempool at the very top:

if cosmosPoolMaxTx < 0 {
    logger.Debug("app-side mempool is disabled, skipping EVM mempool configuration")
    return nil
}

With the mempool disabled, the buggy default-signer-extractor code path is never reached, and CheckTx admits zero-signer migration txs straight to the ante chain. Local devnet uses the default max-txs = 5000 (which is what mainnet operators will use), and the bug fires.

Demoting back to ready

Restoring this PR to ready-for-review. Test plan in the body still applies. Verified on live chain in addition to the in-process tests.

@mateeullahmalik mateeullahmalik marked this pull request as ready for review June 18, 2026 22:49
…urity pin

Addresses Kay's review feedback on PR #167 ("need more tests for all that").

Three new tests replace the prior thin mempool.Insert direct call:

1. TestEVMMempool_CheckTxAcceptsZeroSignerMigrationTx
   Drives a valid zero-signer migration tx through the SAME app.CheckTx
   entry point that 'lumerad tx evmigration submit-proof' hits on live
   mainnet. Asserts response code 0 and that the log NEVER contains
   'at least one signer'. This is the production regression pin.

2. TestEVMMempool_CheckTxRejectsZeroSignerNonMigrationTx
   Security pin for the worry that the SignerExtractionAdapter widens
   the hole: submits a zero-signer banktypes.MsgSend through the same
   CheckTx entry point and asserts it is REJECTED. Proves the adapter
   only synthesizes signers for migration-only txs and that all other
   message types still require envelope signatures.

3. TestEVMigrationSignerAdapter_DefaultExtractor_PinsFailureMode
   Documents the upstream SDK behavior that necessitates the custom
   adapter — default extractor returns empty []SignerData on a
   zero-signer migration tx. If this ever changes upstream, we can
   remove the workaround.

Regression-pin verified locally by temporarily reverting app/evm_mempool.go
to master: test #1 fails with the exact production error, test #2 still
passes (confirming no widening), then restored.
@mateeullahmalik

Copy link
Copy Markdown
Contributor Author

@andkay Pushed test coverage you asked for in 236b7aa. Three new tests:

1. TestEVMMempool_CheckTxAcceptsZeroSignerMigrationTx — full app.CheckTx(RequestCheckTx) end-to-end, same entry point lumerad tx evmigration submit-proof hits on mainnet. Asserts code 0 and log does NOT contain at least one signer. This is the live regression pin for the production bug.

2. TestEVMMempool_CheckTxRejectsZeroSignerNonMigrationTx — security pin: submits a zero-signer banktypes.MsgSend through the same CheckTx path and asserts it's still rejected. Proves the adapter does NOT widen the hole — only migration-only txs get a synthetic signer; everything else still requires envelope sigs.

3. TestEVMigrationSignerAdapter_DefaultExtractor_PinsFailureMode — documents the upstream SDK behavior (default extractor returns empty slice on zero-signer migration tx) that necessitates the workaround. Canary if upstream ever fixes this.

Regression-pin verified by reverting: locally swapped app/evm_mempool.go back to master, ran the suite — test #1 failed with the exact production error tx must have at least one signer, test #2 still passed (SDK rejects non-migration zero-signer txs independent of our adapter). Then restored.

All app tests green: go test -tags=test ./app/ -count=1ok 14.958s.

Bats coverage for the script-level submit-proof path against a real binary is a bigger change — happy to do it in a follow-up PR so we can land this fix in time for testnet/mainnet windows.

akobrin1 and others added 3 commits June 18, 2026 21:18
… + harden mempool tests

Admitting zero-signer migration txs to the app mempool (the signer-extraction
adapter) also opened a zero-fee spam vector: migration txs carry no fee and no
envelope signature, so anyone could flood the mempool/proposals with proof-valid
txs that only fail at message execution. Enforce the migration admission window
at the ante so these are rejected before mempool insertion.

- x/evmigration/keeper/ante.go: VerifyMigrationProofsForAnte now rejects with
  ErrMigrationDisabled / ErrMigrationWindowClosed when EnableMigration is off or
  MigrationEndTime has passed (mirrors preChecks steps 1-2). Single param read,
  no per-account state; no-op under default params (enabled, no deadline). On
  mainnet a concrete MigrationEndTime bounds the exposure to the migration
  window and closes it automatically. Message execution still re-checks.
- ante_test.go: TestVerifyMigrationProofsForAnte_AdmissionGate (disabled /
  window-closed / open-window).

Review hardening of the mempool test suite:
- Renamed TestEVMigrationMalformedLegacyAddressRejected* -> ...ByValidateBasic
  and documented that it pins the ante ValidateBasic layer, not the adapter.
- Made CheckTxRejectsZeroSignerNonMigrationTx assert the rejecting layer
  ("no signatures supplied") instead of just code != 0, and added
  InsertRejectsZeroSignerNonMigrationTx as the true adapter-layer pin (drives
  mempool.Insert directly, bypassing the ante).
- Documented the gas==0 div-by-zero hardening in defaultCosmosPoolConfig.
- Track the previously-untracked real-node integration test so the branch
  matches the docs/CHANGELOG references.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…talog

The adapter rows in unit-evmigration.md referenced names that no longer match
the actual test functions in app/evmigration_signer_extraction_adapter_test.go:
  _ClaimLegacyAccount            -> _MigrationOnlyTx_SyntheticSigner
  _MigrateValidator              -> _MigrationOnlyTx_MigrateValidator
  _NonMigration_DelegatesToFallback -> _NonMigrationTx_DelegatesToFallback
  _InvalidLegacyAddress_Rejected -> _InvalidBech32_Rejected
and the _NilFallback_FallsBackToDefault test was missing entirely. Names now
match the source 1:1.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…+ align tests/docs

Builds on the migration-window gate by adding the cheap state-plausibility
checks to the ante, so proof-valid-but-impossible zero-fee migration txs are
rejected before mempool admission rather than only at message execution. This
closes the in-window leg of the zero-fee mempool-spam vector: a fabricated
keypair has no on-chain legacy account, so it is now rejected at the ante.

Implementation (x/evmigration/keeper/ante.go):
- VerifyMigrationProofsForAnte now runs verifyMigrationAdmissionState after the
  enable/window gate and before proof verification. It mirrors the cheap subset
  of msgServer.preChecks: addresses-differ, source-not-already-migrated,
  new-address-not-a-migrated-legacy, destination-not-reused, legacy-account
  exists and is not a module account, and (for MsgMigrateValidator) the source
  is a validator operator. The per-block rate limit is intentionally omitted
  from the ante (it is block-stateful and belongs only at execution).
- Ordering matters: state checks run before proof verification, so an invalid
  proof on a nonexistent account surfaces the state error first. The keeper
  still re-checks everything at execution; the ante is a best-effort mempool
  filter.

Tests:
- x/evmigration/keeper/ante_test.go: TestVerifyMigrationProofsForAnte_AdmissionGate
  (disabled / window-closed / open-window) and _CheapStateAdmission (nonexistent
  legacy, already-migrated, reused destination, non-validator source), with mock
  expectations pinning the check ordering. Existing subtests seed the legacy
  account so they exercise the proof paths they intend to.
- app/evm_mempool_evmigration_test.go: seed the legacy account into the check-tx
  state (NewContext(true) — the state CheckTx reads; NewContext(false) targets
  finalizeBlockState and is invisible to CheckTx) using NewAccountWithAddress to
  assign a fresh account number; PrepareProposal test uses a genesis-seeded
  legacy account so the proposal-time ante verify passes.
- app/evm/ante_evmigration_fee_test.go: add seedLegacyAccountInCtx and seed the
  legacy account in the accept / invalid-proof / CheckTx cases so the state gate
  passes and the proof-rejection path is what is actually asserted.
- tests/integration/evm/mempool/evmigration_zero_signer_test.go: seed the legacy
  account into the node genesis before broadcasting (real-node path).

Docs:
- docs/.../tests/unit-evmigration.md: remap 27 stale signature/multisig test
  names to the current functions (TestVerifyCosmosSecp256k1_* /
  TestVerifyEthSecp256k1_* for signature verification, TestVerifyMigrationProof_
  NewSide_Multisig_* for the multisig verifier, NonSecp256k1SubKey,
  MigrationProof_ValidateBasic_Dispatch). Every referenced name now resolves to
  a real func.
- docs/.../tests.md: correct counters to actual values — EVMigration keeper
  118+ -> 124+, EVMigration integration "15+ core" -> "14 core + 4 mempool
  broadcast regressions" (18 rows), comparison row 117+/19 -> 150+/18, totals
  Unit ~401/Int ~151/Total ~564 -> ~407/~150/~569, headline ~560 -> ~570.
- CHANGELOG.md, bugs.md, integration-{evmigration,mempool}.md: describe the
  state-plausibility checks and the new negative tests.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Enables Lumera’s app-side EVM mempool (ExperimentalEVMMempool) to accept intentionally zero-signer EVM migration transactions (MsgClaimLegacyAccount, MsgMigrateValidator) by synthesizing deterministic signer data from legacy_address, so the migration-aware ante chain can run as designed.

Changes:

  • Add an evmigration-aware SignerExtractionAdapter and wire it into the Cosmos-side mempool config and proposal signer extractor path.
  • Harden x/evmigration ante admission to bound zero-fee/zero-sig migration traffic to the configured migration window and reject obviously-impossible migrations early.
  • Add unit + integration regression coverage (including real-node broadcast_tx_sync) and update migration/testing docs + changelog.

Reviewed changes

Copilot reviewed 19 out of 19 changed files in this pull request and generated 1 comment.

Show a summary per file
File Description
x/evmigration/keeper/ante.go Adds migration window gating + cheap state admission checks in ante for unsigned, zero-fee migration txs.
x/evmigration/keeper/ante_test.go Extends keeper ante tests to cover the new admission gate and state plausibility checks.
tests/integration/evm/mempool/evmigration_zero_signer_test.go Real-node integration coverage ensuring zero-signer migration broadcasts pass CheckTx while non-migration zero-signer txs still fail.
docs/evm-integration/user-guides/migration.md Documents the zero-signer mempool requirement and points to the adapter wiring.
docs/evm-integration/testing/tests/unit-evmigration.md Updates/expands test catalog to include new mempool signer-adapter and ante-admission tests.
docs/evm-integration/testing/tests/unit-app-wiring.md Adds a test entry documenting mempool-disabled wiring behavior.
docs/evm-integration/testing/tests/integration-mempool.md Registers the new integration test file and its scenarios.
docs/evm-integration/testing/tests/integration-evmigration.md Cross-links the new broadcast-path coverage in the mempool suite.
docs/evm-integration/testing/tests.md Updates testing totals and coverage matrix to reflect new tests.
docs/evm-integration/testing/bugs.md Records the bug, root cause, fix, hardening, and test coverage references.
docs/evm-integration/architecture/rollout.md Updates approximate test counts used in rollout readiness documentation.
CHANGELOG.md Adds release notes for the mempool signer adapter and ante hardening changes.
app/test_helpers.go Introduces setupWithAppOptionOverrides to allow app option overrides in tests.
app/evmigration_signer_extraction_adapter.go New adapter: synthesizes a signer from legacy_address for migration-only txs, delegates otherwise.
app/evmigration_signer_extraction_adapter_test.go Unit tests for synthetic signer derivation, delegation, and invalid legacy address cases.
app/evm/ante_evmigration_fee_test.go Updates ante tests to seed legacy accounts so new ante admission checks don’t mask proof verification assertions.
app/evm_mempool.go Wires the migration-aware signer adapter into CosmosPoolConfig and proposal signer extraction; replicates upstream priority logic with a zero-fee short-circuit.
app/evm_mempool_test.go Adds coverage ensuring mempool.max-txs = -1 leaves app-side mempool disabled (BaseApp stays NoOp).
app/evm_mempool_evmigration_test.go App-level regression tests asserting CheckTx/mempool/proposal paths accept valid zero-signer migration txs and preserve rejection of non-migration zero-signer txs.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread app/evmigration_signer_extraction_adapter.go
@mateeullahmalik mateeullahmalik merged commit 4352f7d into master Jun 19, 2026
25 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants