scripts: add migrate-batch.sh — batch driver for EVM migration#163
mateeullahmalik wants to merge 5 commits into
A single-process driver for migrating many legacy accounts to their
EVM-compatible counterparts. Wraps the existing migrate-multisig.sh and
migrate-account.sh scripts with a lifecycle manager that:
1. parses an operator-supplied mnemonics file (legacy + multisig topology),
2. probes each target's on-chain state (migrated / ready / needs-pubkey /
needs-funding),
3. sets up a per-target ephemeral keyring (mode 0700, trap-cleanup on EXIT),
4. imports each signer's mnemonic as both legacy (118/secp256k1) and new
(60/eth_secp256k1) keyring entries,
5. reconstructs the legacy multisig in the ephemeral keyring and asserts
its derived address equals the address in the mnemonics file,
6. tops up the legacy account from --funder if balance==0,
7. self-sends to publish the multisig pubkey on chain (if missing),
8. delegates the migration ceremony to migrate-multisig.sh / migrate-account.sh,
9. verifies via evmigration migration-record.
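The state probe in step 2 can be sketched as a pure decision function. This is a minimal illustration that follows the rules listed under the B2 fix later in this message; `classify_target` and its five pre-computed inputs are hypothetical, and the order in which the real `_mb_classify_target` checks things is an assumption.

```shell
#!/usr/bin/env bash
# Hypothetical sketch, not the driver's actual classifier.
# Arguments are probe results gathered elsewhere:
#   $1 migrated    1 if a migration record exists for the target
#   $2 bank_ok     1 if the bank balances query itself succeeded
#   $3 on_auth     1 if the account exists on auth
#   $4 has_pubkey  1 if the auth account carries a pubkey
#   $5 balance     integer balance in the base denom
classify_target() {
  local migrated=$1 bank_ok=$2 on_auth=$3 has_pubkey=$4 balance=$5

  # A failed bank query is the only signal treated as a real RPC problem.
  if (( ! bank_ok )); then echo unknown; return 0; fi
  if (( migrated )); then echo migrated; return 0; fi
  if (( on_auth && has_pubkey )); then echo ready; return 0; fi
  # Known to auth or funded, but no pubkey published yet.
  if (( on_auth || balance > 0 )); then echo needs-pubkey; return 0; fi
  # Never seen by auth and empty: a normal state for a fresh account.
  echo needs-funding
}

classify_target 0 1 0 0 0
```

The point of the shape is that "account not found on auth" never reaches the `unknown` branch; only a failed liveness probe does.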
Signer order in each multisig's public_keys array is significant and is
matched by pubkey equality, never by name suffix — non-sequential signer
orderings are common and silent ordering mistakes would derive the wrong
multisig address.
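To make the ordering rule concrete, here is a small sketch of matching signers by pubkey equality. The pubkeys, signer names, and the `ordered_signers` helper are placeholders invented for illustration; the real driver reads these from the mnemonics file. It needs bash 4+ for associative arrays.

```shell
#!/usr/bin/env bash
# Placeholder data: signer names deliberately do NOT follow pubkey order.
declare -A name_by_pubkey=(
  ["PUBKEY_C"]="signer_1"
  ["PUBKEY_A"]="signer_2"
  ["PUBKEY_B"]="signer_3"
)

# Canonical order, as it would appear in the multisig's public_keys array.
public_keys=("PUBKEY_A" "PUBKEY_B" "PUBKEY_C")

# Emit signer names in public_keys order; fail if a pubkey has no signer.
ordered_signers() {
  local pk
  for pk in "${public_keys[@]}"; do
    if [[ -z "${name_by_pubkey[$pk]+set}" ]]; then
      echo "unknown signer pubkey: $pk" >&2
      return 9
    fi
    printf '%s\n' "${name_by_pubkey[$pk]}"
  done
}

mapfile -t ordered < <(ordered_signers)
signer_order=$(IFS=,; echo "${ordered[*]}")
echo "$signer_order"
```

Sorting these names by suffix would give `signer_1,signer_2,signer_3`, which is a different order from the one the pubkey lookup produces.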
Subcommands:
report — offline classification, no chain calls
status — read-only per-target on-chain state probe
execute — full migration lifecycle, with --dry-run, --target, --funder,
--top-up-amount, --continue-on-error
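A minimal sketch of how `--continue-on-error` can be made to hold under `set -e`. `run_ceremony` and `execute_one` are stand-ins, not the driver's functions. One bash detail worth knowing: when a function is called as an `if` condition, `set -e` is ignored inside its body, so each delegated call needs its own explicit guard either way.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Stand-in for a delegated sub-script; fails for exactly one target.
run_ceremony() { [[ "$1" != "target_b" ]]; }

execute_one() {
  local target=$1
  # Explicit guard: report and return 1 rather than relying on set -e.
  if ! run_ceremony "$target"; then
    echo "FAILED: $target" >&2
    return 1
  fi
  echo "migrated: $target"
}

continue_on_error=1
succeeded=0
failed=0
for t in target_a target_b target_c; do
  if execute_one "$t"; then
    succeeded=$((succeeded + 1))
  else
    failed=$((failed + 1))
    # Without the flag, stop at the first failure but still report counts.
    (( continue_on_error )) || break
  fi
done
echo "succeeded=$succeeded failed=$failed"
```

Because the loop owns the failure handling, the operator always gets a final count, whether or not the batch ran to the end.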
Reuses helpers from scripts/evmigration-common.sh (import_from_mnemonic,
auth_pubkey_type, lumerad_q, wait_for_tx, assert_broadcast_accepted,
require_multisig_binary, resolve_chain_id) — no duplication of migration
logic.
Mnemonics never enter the operator's main keyring; per-signer mnemonics are
written to mode-0600 temp files inside the ephemeral keyring dir and removed
immediately after import_from_mnemonic consumes them.
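The handling described above can be sketched roughly as follows. All function names here are hypothetical; the `migrate-batch-keyring-` directory prefix is taken from the PR description, and the import step is replaced by a comment because it needs the real helper.

```shell
#!/usr/bin/env bash
set -euo pipefail

EPHEMERAL_DIR=""

# Remove the ephemeral dir, but only if the path looks like one we created.
cleanup_ephemeral() {
  case "$EPHEMERAL_DIR" in
    "") ;;
    */migrate-batch-keyring-*) rm -rf -- "$EPHEMERAL_DIR" ;;
    *)
      echo "refusing to remove unexpected path: $EPHEMERAL_DIR" >&2
      return 1
      ;;
  esac
}

setup_ephemeral() {
  EPHEMERAL_DIR=$(mktemp -d "${TMPDIR:-/tmp}/migrate-batch-keyring-XXXXXX")
  chmod 700 "$EPHEMERAL_DIR"
}

# Write one mnemonic to a private temp file and print the file's path.
write_mnemonic_file() {
  local mnemonic=$1 path
  path=$(umask 077; mktemp "$EPHEMERAL_DIR/mnemonic-XXXXXX")
  printf '%s\n' "$mnemonic" >"$path"
  printf '%s\n' "$path"
}

trap cleanup_ephemeral EXIT
setup_ephemeral
mfile=$(write_mnemonic_file "placeholder words only")
# ... the real driver would hand "$mfile" to import_from_mnemonic here ...
rm -f -- "$mfile"
```

The prefix check is cheap insurance: a bug that leaves `EPHEMERAL_DIR` pointing somewhere else turns into a refusal instead of an `rm -rf` on the wrong directory.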
See scripts/migrate-batch.md for the full operator workflow.
Six concrete fixes from a mainnet-gate review of the initial commit:
SEV-1 — would have broken first real run on testnet:
* B1: every delegated sub-script call inside _mb_execute_one is now wrapped
with an explicit `if ! ... ; then return 1; fi`. Previously the unguarded
calls to migrate-multisig.sh {generate,sign,combine,submit} and
migrate-account.sh would let `set -e` kill the whole batch on the first
non-zero exit, silently defeating --continue-on-error and giving the
operator no idea how many targets had been migrated.
* B2: _mb_classify_target no longer routes 'account not found on auth' to
the 'unknown' bucket. The previous implementation called auth_pubkey_type,
which hard-exits (exit 2) when the auth account doesn't exist — a state
that is COMPLETELY NORMAL for a fresh foundation account that has never
transacted. Every such target was being misclassified as 'unknown' (= RPC
failure), so the operator would have seen 'RPC down' for the entire
foundation set on first run. New implementation probes auth/bank directly,
uses bank-balances as the RPC liveness probe, and distinguishes:
- acct on auth, has pubkey → ready
- acct on auth or balance > 0, no pubkey → needs-pubkey
- acct not on auth and balance = 0 → needs-funding
- bank query itself failed → unknown (real RPC issue)
* B3: _mb_send_with_funder used `${VAR:+--flag "$VAR"}` parameter
expansion to optionally inject --keyring-dir / --home. The script runs
with IFS=$'\n\t' set at the top, under which that pattern does NOT
word-split on spaces and produces a single mangled argv element like
'--keyring-dir /home/foo' which lumerad rejects. Reworked to use an
explicit array (funder_extra=()) so word boundaries are preserved
regardless of IFS. Comment in source explains why this MUST NOT be
collapsed back to parameter expansion.
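The B3 failure mode is easy to reproduce in isolation. The sketch below counts the arguments a command would receive under each form; `count_args` is a throwaway helper and the path is a placeholder.

```shell
#!/usr/bin/env bash
set -euo pipefail
IFS=$'\n\t'

# Print how many arguments were received.
count_args() { printf '%s\n' "$#"; }

keyring_dir="/home/foo"

# Parameter-expansion form: the space between the flag and its value is not
# in IFS, so no word splitting happens and the command sees one argument.
# shellcheck disable=SC2086
broken_count=$(count_args ${keyring_dir:+--keyring-dir "$keyring_dir"})

# Array form: word boundaries are fixed when the array is built.
funder_extra=()
if [[ -n "$keyring_dir" ]]; then
  funder_extra+=(--keyring-dir "$keyring_dir")
fi
fixed_count=$(count_args "${funder_extra[@]}")

echo "parameter expansion: $broken_count arg(s); array: $fixed_count arg(s)"
```

If the array may legitimately be empty and the script runs with `set -u` on bash older than 4.4, expanding it as `${funder_extra[@]+"${funder_extra[@]}"}` avoids an unbound-variable error.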
SEV-2 — mainnet-gate items:
* B4: mainnet chain-id allowlist. execute now refuses chain-ids matching
lumera-mainnet* or lumera-1 unless the operator explicitly sets
LUMERA_BATCH_MAINNET_OK=i-understand. Adds a log_warn on the override
path so the safety bypass shows up in any captured log.
* B5: confirmation prompt now prints the full numbered target list (kind,
name, address) before asking for confirmation, so the operator can
sanity-check WHICH targets — not just how many — are about to be touched.
* B6: dropped `2>&1` from the three remaining tx-broadcast captures (funder
send, multisig self-send broadcast, single-sig self-send broadcast).
Merging stderr into the JSON-capture variable could feed garbage into
assert_broadcast_accepted if lumerad emits any progress/warning text
before the JSON. Matches the discipline already in evmigration-common.sh's
lumerad_tx helper, which deliberately doesn't merge.
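The B6 hazard can be shown with a stand-in command that writes a warning to stderr and JSON to stdout. `fake_broadcast` and its output are invented for the demonstration; no real lumerad output is implied.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Stand-in for a broadcast: progress text on stderr, JSON on stdout.
fake_broadcast() {
  echo "WARN: gas estimate may be inaccurate" >&2
  echo '{"code":0,"txhash":"ABC123"}'
}

# Merged capture: the warning lands in front of the JSON.
merged=$(fake_broadcast 2>&1)

# Unmerged capture: stderr goes to the terminal, the variable holds JSON only.
clean=$(fake_broadcast)

printf 'merged first line: %s\n' "${merged%%$'\n'*}"
printf 'clean capture:     %s\n' "$clean"
```

Anything that later parses the captured variable as JSON only sees a parseable document in the unmerged case.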
SEV-3 doc nit:
* B8: README claimed --funder-keyring-dir default was '~/.lumera'. It's
actually empty (lumerad picks the default). Fixed.
Follow-ups intentionally left for a separate PR:
- persistent JSONL run log (--log-file) for audit trail on long batches
- bats unit tests for the report subcommand
- per-target confirmation prompt (`--ask`) on top of the batch confirm
Verified:
- bash -n + shellcheck -x -e SC1091,SC2034 clean
- report against the real foundation file still matches Python dry-run
(28 multisigs, 3 standalones, 31 targets, zero unresolved)
- mainnet guard correctly refuses lumera-mainnet-1 with exit 1, and
correctly proceeds with WARN under LUMERA_BATCH_MAINNET_OK=i-understand
- dry-run shows the numbered target list before the confirm prompt
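A sketch of the mainnet guard described in B4, using the chain-id patterns and the override variable named in this message. The function name `mainnet_guard` and the message texts are illustrative, not the script's own.

```shell
#!/usr/bin/env bash
# Hypothetical helper: returns 1 for mainnet chain-ids unless overridden.
mainnet_guard() {
  local chain_id=$1
  case "$chain_id" in
    lumera-mainnet*|lumera-1)
      if [[ "${LUMERA_BATCH_MAINNET_OK:-}" == "i-understand" ]]; then
        # Keep the bypass visible in any captured log.
        echo "WARN: mainnet override active for $chain_id" >&2
        return 0
      fi
      echo "ERROR: refusing to run against $chain_id" >&2
      return 1
      ;;
  esac
  return 0
}

mainnet_guard lumera-testnet-2 && echo "testnet allowed"
```

Requiring an exact, awkward override value makes it unlikely that the variable is set by accident in an operator's shell profile.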
Mainnet-gate review round 1: applied fixes
Pushed
SEV-1 (would have broken first real run on testnet)
SEV-2 (mainnet-gate items)
SEV-3 doc nit
Verified
Deferred to follow-up PRs (intentionally not bundled, to keep this PR scoped)
Will file these as separate issues once this PR lands.
Append-only JSONL log of every lifecycle milestone, correlated by a
per-run batch_id. Useful for:
- post-mortem of long batches (which targets ran, which succeeded,
which failed, why)
- mainnet audit trail (the script broadcasts real txs; operators
should be able to grep what happened weeks later)
- resuming after partial failure (re-run is idempotent via
chain-state classification; the log lets the operator confirm
which were already migrated)
Events emitted (each is one JSON object on its own line):
batch_start ts, batch_id, chain_id, node, target_count, funder,
top_up_amount, dry_run
target_start ts, batch_id, target, kind, address
classify status (migrated/ready/needs-pubkey/needs-funding/unknown),
balance
keyring_setup ephemeral_dir
reconstructed legacy_address (+ new_address for multisig)
funding_start funder, amount
funding_done
self_send_start mode (multisig/single-sig)
self_send_done
ceremony_start path (multisig/single_sig), threshold (for multisig)
target_done outcome (success/skipped_already_migrated/dry_run_complete/
failed), plus reason on failure (rpc_unknown,
needs_funder_not_provided, legacy_multisig_address_mismatch,
funding_failed, *_self_send_*, migrate_multisig_*_failed,
migrate_account_failed, post_check_*); plus new_address on
success
batch_done succeeded, failed, remaining
Implementation notes:
* _mb_log_event is a no-op when --log-file is empty. Existing call sites
pay nothing if the operator does not pass the flag.
* Uses jq -nc to build each record so operator-supplied strings
(addresses, error reasons) are properly JSON-escaped. Do NOT replace
this with hand-rolled printf: an earlier printf-based draft broke the
line when an error reason contained a quote.
* batch_id is 8 bytes of /dev/urandom (16 hex chars), with a $$/timestamp
fallback. Two batches starting in the same second still get distinct
IDs, so grep-by-batch is unambiguous on the same log file.
* --log-file path is resolved to absolute at execute() entry so a later
cd by lumerad cannot redirect appends elsewhere.
* File is created mode 0600 via umask 0177 — addresses + tx hashes are
not secret but mnemonic-handling discipline says default-narrow.
* Log writes are best-effort: jq write failure logs a warning and
continues. We do NOT abort the batch on log IO failure — losing one
audit line is preferable to losing a broadcast in progress.
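Several of the notes above (no-op when disabled, absolute path, mode 0600, best-effort writes) fit in one small sketch. `init_log`, `log_event`, and `LOG_FILE` are hypothetical names, and the record is passed in as an already-built JSON line, since building it is jq's job in the real script.

```shell
#!/usr/bin/env bash
set -euo pipefail

LOG_FILE=""   # empty means the operator did not ask for a log

init_log() {
  local path=$1
  [[ -n "$path" ]] || return 0
  # Resolve to an absolute path now so a later cd cannot redirect appends.
  case "$path" in
    /*) ;;
    *) path="$PWD/$path" ;;
  esac
  # Create the file with mode 0600; the subshell keeps the umask change local.
  ( umask 0177; : >>"$path" )
  LOG_FILE=$path
}

log_event() {
  local line=$1
  [[ -n "$LOG_FILE" ]] || return 0   # no-op when logging is disabled
  # Best-effort: warn and carry on, never abort the batch over a log write.
  if ! printf '%s\n' "$line" >>"$LOG_FILE"; then
    echo "WARN: could not append to $LOG_FILE" >&2
  fi
  return 0
}

log_event '{"event":"ignored_because_logging_is_off"}'
```

A typical call sequence would be `init_log "$log_file_flag"` once at startup, then `log_event` at each milestone.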
README updated to document the flag.
Verified:
- bash -n + shellcheck -x -e SC1091,SC2034 clean
- end-to-end dry-run smoke produces well-formed JSONL with 5 events
(batch_start, target_start, classify, target_done, batch_done) and
all events share the same batch_id
- file is mode 0600 on create
10 test cases covering the offline classification path. `report` does
not touch the chain so these tests need no lumerad stub — only `jq`.
Coverage:
- simple 1-multisig fixture: totals (multis=1, standalones=0, targets=1)
- signer order matched by pubkey, NOT by name suffix
This is the single most important correctness invariant in the
whole driver. The fixture deliberately reorders public_keys so
that name-suffix-based sorting would produce the WRONG order, and
the test asserts the driver produces the canonical pubkey order.
23 of 28 real foundation multisigs have non-sequential signer
orderings; any regression here would silently derive the wrong
multisig address.
- unreferenced local entry → standalone single-sig migration target
- rejects non-object top-level JSON (exit 9)
- rejects entry with unknown type field (exit 9)
- rejects entry missing address field (exit 9)
- rejects multisig with unknown signer pubkey (exit 9), and the
error message names BOTH the offending multisig and the missing
pubkey so the operator can fix the file directly
- missing --mnemonics is exit 1 (usage)
- --mnemonics path that does not exist is exit 1
- --plan-out produces a parseable JSON object with a targets[] array
Verified:
bats tests/scripts/migrate-batch.bats → 10/10 green
bats tests/scripts/chain-helper.bats → unchanged (no regression)
Mainnet-gate round 2 —
| event | extra fields |
|---|---|
| `batch_start` | chain_id, node, target_count, funder, top_up_amount, dry_run |
| `target_start` | target, kind, address |
| `classify` | status, balance |
| `keyring_setup` | ephemeral_dir |
| `reconstructed` | legacy_address (+ new_address for multisig) |
| `funding_start` / `funding_done` | funder, amount |
| `self_send_start` / `self_send_done` | mode |
| `ceremony_start` | path, threshold |
| `target_done` | outcome (success / skipped_already_migrated / dry_run_complete / failed); reason and other fields on failure |
| `batch_done` | succeeded, failed, remaining |
Implementation notes:
- `_mb_log_event` is a no-op when `--log-file` is empty: zero cost when the operator does not pass the flag
- Uses `jq -nc` to build each record so operator-supplied strings (addresses, error reasons) are properly JSON-escaped
- `batch_id` is 8 bytes of `/dev/urandom` (16 hex chars), with a `$$`/timestamp fallback; two batches starting in the same second still get distinct IDs
- Path resolved to absolute at execute() entry so a later `cd` cannot redirect appends
- File created mode 0600 via `umask 0177`
- Log writes are best-effort: a write failure logs a warning and continues. We do NOT abort the batch on log IO failure; losing one audit line is preferable to losing a broadcast in progress.
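A sketch of the `batch_id` scheme from the notes above. The `/dev/urandom` source and 16-hex-char length come from the description; the `new_batch_id` name and the exact fallback format are assumptions.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print a 16-hex-char identifier for one run.
new_batch_id() {
  local id=""
  if [[ -r /dev/urandom ]]; then
    # 8 random bytes, printed as hex with the separators stripped.
    id=$(od -An -N8 -tx1 /dev/urandom 2>/dev/null | tr -d ' \n')
  fi
  if [[ ${#id} -ne 16 ]]; then
    # Fallback (assumed format): PID and epoch seconds, each as 8 hex digits.
    id=$(printf '%08x%08x' "$$" "$(date +%s)")
  fi
  printf '%s\n' "$id"
}

batch_id=$(new_batch_id)
echo "batch_id=$batch_id"
```

Every record of a run then carries the same value, so `grep -F "$batch_id" audit.jsonl` isolates one batch in a shared log file.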
Smoke test (one target, dry-run, RPC absent so classify returns unknown):
{"ts":"...","batch_id":"42b0e4bfbaf8072d","event":"batch_start","chain_id":"lumera-testnet-2","node":"tcp://localhost:26657","target_count":"1","funder":"","top_up_amount":"100000ulume","dry_run":"1"}
{"ts":"...","batch_id":"42b0e4bfbaf8072d","event":"target_start","target":"seed_sale_1","kind":"multisig","address":"lumera1t7akg..."}
{"ts":"...","batch_id":"42b0e4bfbaf8072d","event":"classify","target":"seed_sale_1","status":"unknown","balance":"0"}
{"ts":"...","batch_id":"42b0e4bfbaf8072d","event":"target_done","target":"seed_sale_1","outcome":"failed","reason":"rpc_unknown"}
{"ts":"...","batch_id":"42b0e4bfbaf8072d","event":"batch_done","succeeded":"0","failed":"1","remaining":"0"}
2. bats coverage for report (bba07f1d)
10 test cases at tests/scripts/migrate-batch.bats. report doesn't touch the chain so the tests need no lumerad stub — only jq.
Coverage:
- Simple 1-multisig fixture totals
- Signer order matched by pubkey, NOT by name suffix: the most important correctness invariant in the driver. Fixture deliberately permutes `public_keys` so name-suffix sorting would produce the wrong order; test asserts canonical pubkey-order is preserved. 23 of 28 real foundation multisigs have non-sequential signer orderings; any regression here would silently derive the wrong multisig address.
- Unreferenced local entry → standalone single-sig target
- Rejects non-object top-level JSON (exit 9)
- Rejects entry with unknown type field (exit 9)
- Rejects entry missing address field (exit 9)
- Rejects multisig with unknown signer pubkey (exit 9), error message names BOTH the offending multisig and the missing pubkey
- Missing `--mnemonics` is exit 1 (usage)
- `--mnemonics` path that does not exist is exit 1
- `--plan-out` produces a parseable JSON object with a `targets[]` array
$ bats tests/scripts/migrate-batch.bats
1..10
ok 1..10
tests/scripts/chain-helper.bats still passes (no regression).
Remaining deferred item from previous comment:
3. --ask per-target confirmation prompt — will file as a follow-up issue.
…te-batch.sh execute live test)
Both bugs were caught during a full devnet bring-up + execute pass with
the real foundation file. Pre-PR review could never have caught them
because they only surface against the actual lumerad CLI.
B-DEVNET-1: `tx bank send` wants a KEY NAME, not an address.
_mb_multisig_self_send called:
"$BIN" tx bank send "$multi_addr" "$multi_addr" ...
but the first positional is parsed by lumerad as 'name OR address of
signing key from keyring', and an address that is NOT a registered key
is rejected:
no key name or address provided; have you forgotten the --from flag?
EOF: tx parse error
The fix is to pass "$multi_name" (the keyring entry name we create in
_mb_add_multisig), not "$multi_addr". Inline comment added.
B-DEVNET-2: `keys add --multisig` re-orders sub-keys by ADDRESS by default.
lumerad CLI documents this:
"The keys are sorted by address, unless the flag --nosort is set."
Without --nosort, _mb_add_multisig silently produces a multisig whose
public_keys[] are NOT in the operator-supplied order. Two consequences:
1. The reconstructed LEGACY multisig address may not match the file
address (caught by our existing assertion — false-negative on
targets where CLI input order happens to coincide with
address-sort, false-positive otherwise).
2. Worse: the NEW multisig's public_keys[] end up in a different
order than the legacy multisig's public_keys[], because the
address-sort key is different for secp256k1 (legacy) vs
eth_secp256k1 (new) pubkeys. `migrate-multisig.sh sign` then
refuses with:
legacy key 'legacy-X' is signer index 0, but new key 'new-X'
is signer index 1; multisig migration requires the same signer
position to approve both halves
The fix: pass --nosort on every `keys add --multisig` call. The
output.json file's public_keys[] order is canonical; we must preserve
it on BOTH the legacy and the new reconstruction. Long-form comment
added to _mb_add_multisig explaining why this MUST NOT be removed.
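As an illustration of the fix, the sketch below builds (and only prints) an argv for reconstructing a multisig with the signer order preserved. `build_multisig_args`, the key names, the path, and the `test` keyring backend are assumptions; `--multisig`, `--multisig-threshold`, and `--nosort` are the standard Cosmos SDK `keys add` flags.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: fills MULTISIG_ARGS without running anything.
build_multisig_args() {
  local name=$1 threshold=$2 keyring_dir=$3
  shift 3
  local signers
  # Join the remaining arguments with commas, in exactly the order given.
  signers=$(IFS=,; echo "$*")
  # --nosort keeps the operator-supplied order instead of sorting by address.
  MULTISIG_ARGS=(
    keys add "$name"
    --multisig "$signers"
    --multisig-threshold "$threshold"
    --nosort
    --keyring-backend test
    --keyring-dir "$keyring_dir"
  )
}

build_multisig_args legacy-multi 2 /tmp/ephemeral legacy-c legacy-a legacy-b
printf '%q ' lumerad "${MULTISIG_ARGS[@]}"
echo
```

The same call shape would be used for the new-side multisig with the `new-*` key names in the same positions, so both halves agree on signer indices.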
This bug also exists in the operator MD file Alexey shared; the team
should patch that doc too. Mainnet operators following the doc by
hand will hit it.
Devnet verification of the fixes (chain: lumera-devnet-1, lumerad
v1.20.0-rc4):
- status correctly classifies pre-funded targets as 'needs-pubkey'
and unfunded targets as 'needs-funding' (B2 fix from earlier review
rounds, re-verified live)
- execute --dry-run reconstructs seed_sale_1 multisig and asserts
address equality (canonical pubkey-order matching path)
- execute on pre-funded seed_sale_1: self-send to publish pubkey
succeeds; balance drops 500k -> 495k as expected
- execute on unfunded seed_sale_3: funder send succeeds (B3 funder
IFS array-form fix re-verified live), self-send succeeds, generate
+ sign x K + combine all succeed end-to-end
- One known-failure remaining is on the CHAIN, not the script:
`tx evmigration submit-proof` is rejected at CheckTx with
'tx must have at least one signer' (code=1). The submit-proof CLI
has no --from flag (by design — the migration auth lives in the
proof bytes), but the SDK ante still requires a signer. Filed as
follow-up — out of scope for this script.
Findings will be posted to PR #163 with full reproduction.
Devnet validation: end-to-end live test, two real bugs caught, one chain-side finding
Spun up a local devnet at
Test matrix
Bugs found and fixed in
Root-cause update —
| Layer | Status |
|---|---|
| `migrate-batch.sh` script logic | ✅ Devnet-validated through combine. All 6 SEV-1/2 bugs from this PR's review rounds + 2 devnet-found bugs (B-DEVNET-1, B-DEVNET-2) fixed. |
| `migrate-multisig.sh` ceremony | ✅ Generates, signs, combines correctly. Produces the same canonical zero-signer shape as the repo's own combined-tx.json fixture. |
| Chain (CheckTx / mempool) | ❌ Chain bug. EVM mempool rejects zero-signer migration txs before the migration-aware ante runs. |
| MD operator doc | sign step needs --from in addition to --new-key, otherwise partials are one-sided and combine won't meet quorum. Also needs --nosort on keys add --multisig (B-DEVNET-2). |
Will hold this PR in draft until the chain-side mempool fix lands. The script side is unblocked the moment a node with the mempool patch is reachable.
Opened the chain-side fix in #167. With that merged + an rc5 cut, this batch driver can be re-validated end-to-end on devnet and flipped to ready-for-review. #167 includes:
Pull request overview
Adds an operator-facing batch driver for Lumera EVM migration that orchestrates many single-sig and multisig migrations in one run, reusing existing migrate-multisig.sh / migrate-account.sh logic and common helpers.
Changes:
- Introduces `scripts/migrate-batch.sh` with `report`, `status`, and `execute` subcommands plus optional JSONL audit logging.
- Adds an operator README (`scripts/migrate-batch.md`) describing format, workflow, and safety properties.
- Adds Bats coverage for the offline `report` classifier (`tests/scripts/migrate-batch.bats`).
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 12 comments.
| File | Description |
|---|---|
| `scripts/migrate-batch.sh` | New batch driver implementing plan parsing/classification, per-target status probing, and full execution lifecycle using an ephemeral keyring. |
| `scripts/migrate-batch.md` | Operator documentation for the batch driver workflow, flags, and safety guardrails. |
| `tests/scripts/migrate-batch.bats` | Unit tests for the offline report plan builder/classifier (no chain calls). |
    local bad
    bad=$(jq -r '
      to_entries
      | map(select(
          (.value | type) != "object"
          or (.value.type // "" | (. != "local" and . != "multi"))
          or (.value.address // "" | startswith("lumera1") | not)
          or ((.value.pubkey // "") == "")
        ))
      | map(.key) | join(", ")
    ' "$mfile")
    if [[ -n "$bad" ]]; then
      log_error "structurally invalid entries: $bad"
      exit 9
    fi

    if [[ -z "$mnemonics_file" ]]; then
      log_error "status: --mnemonics is required"; exit 1
    fi
    require_jq
    # shellcheck disable=SC2034

    if [[ -z "$mnemonics_file" ]]; then
      log_error "execute: --mnemonics is required"; exit 1
    fi
    require_jq
    # shellcheck disable=SC2034

    local yes=0
    _MB_FUNDER=""
    _MB_TOP_UP_AMOUNT="100000ulume"
    _MB_FUNDER_KEYRING_BACKEND="test"
    _MB_FUNDER_KEYRING_DIR=""
    _MB_FUNDER_HOME=""

                              with zero balance. Lives in the OPERATOR's main
                              keyring (NOT the ephemeral one).
      --top-up-amount <coins> Amount to send to a zero-balance target before
                              self-send. Default: 100000ulume.
      --funder-keyring-*      How to reach the funder key (defaults: backend=test,
                              dir=$HOME default).
      --log-file <path>       Append JSONL audit records (one event per line)

| `--target <name>` | Process only the named target. Use this for the first run. |
| `--funder <key>` | Operator-keyring key that pays fees for zero-balance targets. |
| `--top-up-amount <coins>` | How much to send to a zero-balance target. Default `100000ulume`. |
| `--funder-keyring-{backend,dir,home}` | How to reach the funder key. Defaults: `test` backend, lumerad's default home / keyring dir. |
| `--log-file <path>` | Append one JSONL audit record per lifecycle milestone (batch_start, target_start, classify, keyring_setup, reconstructed, funding_*, self_send_*, ceremony_start, target_done, batch_done). Mode 0600 on create, append-only, correlated by per-run `batch_id`. Operator handles rotation. |

    ./scripts/migrate-batch.sh execute \
      --mnemonics output.json \
      --target seed_sale_1 \
      --chain-id lumera-testnet-2 \
      --funder ops-funder --top-up-amount 100000ulume \
      --dry-run

    @test "report: rejects entry with unknown type (exit 9)" {
      local fix="$TMPDIR/fix.json"
      cat >"$fix" <<'JSON'
    {
      "broken": {
        "address": "lumera1broken",
        "mnemonic": "",
        "pubkey": "{}",
        "type": "weird-thing"
      }
    }
    JSON

      run "$MIGRATE_BATCH" report --mnemonics "$fix"
      [ "$status" -eq 9 ]
    }

    # 4. Execute one target for real, then expand.
    ./scripts/migrate-batch.sh execute \
      --mnemonics output.json \
      --target seed_sale_1 \
      --chain-id lumera-testnet-2 \
      --funder ops-funder --top-up-amount 100000ulume

    # 5. Full batch (with confirmation prompt).
    ./scripts/migrate-batch.sh execute \
      --mnemonics output.json \
      --chain-id lumera-testnet-2 \
      --funder ops-funder --top-up-amount 100000ulume

* fix(evmigration): mempool-accept zero-signer migration txs

The cosmos/evm ExperimentalEVMMempool routes non-EVM txs through a PriorityNonceMempool that, by default, uses DefaultSignerExtractionAdapter. That adapter calls tx.GetSignaturesV2() and refuses any tx with an empty signature set, returning 'tx must have at least one signer' from PriorityNonceMempool.Insert.

MsgClaimLegacyAccount and MsgMigrateValidator are zero-signer by design: authorization lives in the proof bytes, fees are waived by EVMigrationFeeDecorator, and the migration-aware ante chain (app/evm/ante.go: migrationCosmosAnte) accepts the shape. That ante chain, however, runs AFTER mempool insert. Without a migration-aware signer extractor, every submit-proof broadcast is rejected at the mempool layer before ante ever sees it -- including the canonical combined-tx.json shape produced by the offline multisig flow.

This change:
* Adds app/evmigration_signer_extraction_adapter.go: a SignerExtractionAdapter that returns a synthetic SignerData built from the message's legacy_address for IsEVMigrationOnlyTx, and delegates everything else to a fallback (default for the Cosmos pool, EVM-aware for proposal building).
* Wires the adapter into ExperimentalEVMMempool.CosmosPoolConfig and into NewDefaultProposalHandler's signer extraction adapter so it applies on both Insert and PrepareProposal paths.
* Replicates upstream's default PriorityNonceMempoolConfig (priority by gas-price) locally so the adapter override is the only behavior change. Short-circuits priority calc for zero-fee/zero-gas txs so it doesn't touch EVM keeper state for migration txs.

Tests:
* app/evmigration_signer_extraction_adapter_test.go: 7 unit tests pinning synthetic-signer derivation, fallback delegation for non-migration and mixed txs, empty/invalid legacy_address rejection, and nil-fallback safety.
* app/evm_mempool_evmigration_test.go: 2 integration tests on the real App + real ExperimentalEVMMempool. One asserts Insert accepts a zero-signer MsgClaimLegacyAccount and CountTx() increments. The other pins the regression: the SDK default adapter still returns zero signers for the same tx, which is precisely what makes PriorityNonceMempool reject without this fix.

Docs:
* CHANGELOG.md entry under v1.20.0 explaining the fix.
* docs/evm-integration/user-guides/migration.md zero-signer-submit callout updated to point at the adapter file.

Discovered during v1.20.0-rc4 multisig migration rehearsal (PR #163 migrate-batch.sh end-to-end). Reproduces with migrate-multisig.sh submit, migrate-account.sh, and hand-built lumerad tx broadcast.

Verified: go test -tags=test ./app/... green, app package tests pass (14.8s).

* test(evmigration): strengthen mempool tests - full CheckTx path + security pin

Addresses Kay's review feedback on PR #167 ("need more tests for all that"). Three new tests replace the prior thin mempool.Insert direct call:

1. TestEVMMempool_CheckTxAcceptsZeroSignerMigrationTx
   Drives a valid zero-signer migration tx through the SAME app.CheckTx entry point that 'lumerad tx evmigration submit-proof' hits on live mainnet. Asserts response code 0 and that the log NEVER contains 'at least one signer'. This is the production regression pin.
2. TestEVMMempool_CheckTxRejectsZeroSignerNonMigrationTx
   Security pin for the worry that the SignerExtractionAdapter widens the hole: submits a zero-signer banktypes.MsgSend through the same CheckTx entry point and asserts it is REJECTED. Proves the adapter only synthesizes signers for migration-only txs and that all other message types still require envelope signatures.
3. TestEVMigrationSignerAdapter_DefaultExtractor_PinsFailureMode
   Documents the upstream SDK behavior that necessitates the custom adapter: the default extractor returns empty []SignerData on a zero-signer migration tx. If this ever changes upstream, we can remove the workaround.

Regression-pin verified locally by temporarily reverting app/evm_mempool.go to master: test #1 fails with the exact production error, test #2 still passes (confirming no widening), then restored.

* fix(evmigration): gate zero-fee migration txs to the admission window + harden mempool tests

Admitting zero-signer migration txs to the app mempool (the signer-extraction adapter) also opened a zero-fee spam vector: migration txs carry no fee and no envelope signature, so anyone could flood the mempool/proposals with proof-valid txs that only fail at message execution. Enforce the migration admission window at the ante so these are rejected before mempool insertion.

- x/evmigration/keeper/ante.go: VerifyMigrationProofsForAnte now rejects with ErrMigrationDisabled / ErrMigrationWindowClosed when EnableMigration is off or MigrationEndTime has passed (mirrors preChecks steps 1-2). Single param read, no per-account state; no-op under default params (enabled, no deadline). On mainnet a concrete MigrationEndTime bounds the exposure to the migration window and closes it automatically. Message execution still re-checks.
- ante_test.go: TestVerifyMigrationProofsForAnte_AdmissionGate (disabled / window-closed / open-window).

Review hardening of the mempool test suite:
- Renamed TestEVMigrationMalformedLegacyAddressRejected* -> ...ByValidateBasic and documented that it pins the ante ValidateBasic layer, not the adapter.
- Made CheckTxRejectsZeroSignerNonMigrationTx assert the rejecting layer ("no signatures supplied") instead of just code != 0, and added InsertRejectsZeroSignerNonMigrationTx as the true adapter-layer pin (drives mempool.Insert directly, bypassing the ante).
- Documented the gas==0 div-by-zero hardening in defaultCosmosPoolConfig.
- Track the previously-untracked real-node integration test so the branch matches the docs/CHANGELOG references.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* docs(evmigration): correct stale signer-adapter test names in test catalog

The adapter rows in unit-evmigration.md referenced names that no longer match the actual test functions in app/evmigration_signer_extraction_adapter_test.go:

  _ClaimLegacyAccount -> _MigrationOnlyTx_SyntheticSigner
  _MigrateValidator -> _MigrationOnlyTx_MigrateValidator
  _NonMigration_DelegatesToFallback -> _NonMigrationTx_DelegatesToFallback
  _InvalidLegacyAddress_Rejected -> _InvalidBech32_Rejected

and the _NilFallback_FallsBackToDefault test was missing entirely. Names now match the source 1:1.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* fix(evmigration): extend ante admission gate with cheap state checks + align tests/docs

Builds on the migration-window gate by adding the cheap state-plausibility checks to the ante, so proof-valid-but-impossible zero-fee migration txs are rejected before mempool admission rather than only at message execution. This closes the in-window leg of the zero-fee mempool-spam vector: a fabricated keypair has no on-chain legacy account, so it is now rejected at the ante.

Implementation (x/evmigration/keeper/ante.go):
- VerifyMigrationProofsForAnte now runs verifyMigrationAdmissionState after the enable/window gate and before proof verification. It mirrors the cheap subset of msgServer.preChecks: addresses-differ, source-not-already-migrated, new-address-not-a-migrated-legacy, destination-not-reused, legacy-account exists and is not a module account, and (for MsgMigrateValidator) the source is a validator operator. The per-block rate limit is intentionally omitted from the ante (it is block-stateful and belongs only at execution).
- Ordering matters: state checks run before proof verification, so an invalid proof on a nonexistent account surfaces the state error first. The keeper still re-checks everything at execution; the ante is a best-effort mempool filter.

Tests:
- x/evmigration/keeper/ante_test.go: TestVerifyMigrationProofsForAnte_AdmissionGate (disabled / window-closed / open-window) and _CheapStateAdmission (nonexistent legacy, already-migrated, reused destination, non-validator source), with mock expectations pinning the check ordering. Existing subtests seed the legacy account so they exercise the proof paths they intend to.
- app/evm_mempool_evmigration_test.go: seed the legacy account into the check-tx state (NewContext(true) is the state CheckTx reads; NewContext(false) targets finalizeBlockState and is invisible to CheckTx) using NewAccountWithAddress to assign a fresh account number; PrepareProposal test uses a genesis-seeded legacy account so the proposal-time ante verify passes.
- app/evm/ante_evmigration_fee_test.go: add seedLegacyAccountInCtx and seed the legacy account in the accept / invalid-proof / CheckTx cases so the state gate passes and the proof-rejection path is what is actually asserted.
- tests/integration/evm/mempool/evmigration_zero_signer_test.go: seed the legacy account into the node genesis before broadcasting (real-node path).

Docs:
- docs/.../tests/unit-evmigration.md: remap 27 stale signature/multisig test names to the current functions (TestVerifyCosmosSecp256k1_* / TestVerifyEthSecp256k1_* for signature verification, TestVerifyMigrationProof_NewSide_Multisig_* for the multisig verifier, NonSecp256k1SubKey, MigrationProof_ValidateBasic_Dispatch). Every referenced name now resolves to a real func.
- docs/.../tests.md: correct counters to actual values: EVMigration keeper 118+ -> 124+, EVMigration integration "15+ core" -> "14 core + 4 mempool broadcast regressions" (18 rows), comparison row 117+/19 -> 150+/18, totals Unit ~401/Int ~151/Total ~564 -> ~407/~150/~569, headline ~560 -> ~570.
- CHANGELOG.md, bugs.md, integration-{evmigration,mempool}.md: describe the state-plausibility checks and the new negative tests.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Andrey Kobrin <andrey.kobrin@gmail.com>
Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
scripts: batch driver for EVM migration (`migrate-batch.sh`)
Adds `scripts/migrate-batch.sh`, a thin lifecycle manager around the existing `migrate-multisig.sh` / `migrate-account.sh` scripts, for operators who need to migrate many legacy accounts in a single run (think foundation pools post-upgrade).
This is a draft PR for early review and discussion. It is not intended for
mainnet use yet — initial scope is local devnet + testnet validation.
Why
Manually migrating ~30 multisig foundation accounts is ~20 steps each
(legacy + new key imports × N signers, reconstruct multisigs, fund, self-send
to publish pubkey, then generate / sign × K / combine / submit). Per the recent
team call, doing that 30× by hand is error-prone enough that we want a driver.
This PR is that driver.
What it does
Per target:
1. Probe on-chain state (`evmigration migration-record`, `auth account`, `bank balances`) → one of: `migrated`, `ready`, `needs-pubkey`, `needs-funding`, `unknown`.
2. Set up a per-target ephemeral keyring (mode 0700, cleaned up via `trap`). Operator mnemonics never enter the operator's main keyring.
3. Import each signer's mnemonic as both legacy (coin-type 118, secp256k1) and new (coin-type 60, eth_secp256k1) variants, via the existing `import_from_mnemonic` helper.
4. Reconstruct the legacy multisig using the signer order from the file's `public_keys[]` array (matched by pubkey equality, never by name suffix). Assert the reconstructed address equals the address in the mnemonics file; abort the target otherwise.
5. Top up the legacy account from `--funder` if balance is zero (operator's main keyring; never imported into the ephemeral one).
6. Self-send to publish the pubkey on chain if missing.
7. Delegate the ceremony: `migrate-multisig.sh generate → sign × K → combine → submit` (or `migrate-account.sh` for standalone single-sig).
8. Verify via `evmigration migration-record` and tear down the ephemeral keyring.
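Subcommands
- `report`: offline classification, no chain calls
- `status`: read-only per-target on-chain state probe
- `execute`: full migration lifecycle

`execute` supports `--target <name>` (run one at a time), `--dry-run` (stop before any broadcast), `--continue-on-error`, and `--yes`.
Safety properties
- Per-signer mnemonics are written to mode-0600 temp files inside the ephemeral keyring dir, consumed by `import_from_mnemonic`, deleted immediately.
- The ephemeral keyring is removed via `trap _mb_cleanup_ephemeral EXIT` (cleanup is path-prefix-checked before `rm -rf`; it refuses anything not under `*/migrate-batch-keyring-*`).
- The reconstructed multisig address is asserted against the mnemonics file before any tx is broadcast. Catches signer-order or threshold bugs early.
- `--funder` lives in the operator's main keyring (separate `--funder-*` flags). It is never imported into the ephemeral keyring.
- Reuses helpers from `scripts/evmigration-common.sh` (`lumerad_q`, `auth_pubkey_type`, `wait_for_tx`, `assert_broadcast_accepted`, etc.). Zero duplication of migration logic.
- Re-runs are idempotent: the driver skips already-migrated targets and re-classifies the rest from chain state.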
What this PR does NOT do
- Mainnet readiness: validate on devnet and testnet first, iterate on the surface → mainnet conversation as a separate PR with an operator-runbook update. Not for mainnet use as it stands.
- Distributed signing: the driver assumes a single operator host holds all signer mnemonics. For the K-of-N co-signer ceremony across hosts, use `migrate-multisig.sh sign` per signer directly.
Test plan
- `bash -n` + `shellcheck -x -e SC1091,SC2034` clean
- `report` matches manual classification against a real mnemonics file (28 multisigs, 3 standalones, 31 targets; non-sequential signer orders correctly detected)
- `status` against a fresh devnet with one pre-funded multisig in each state (`migrated`, `ready`, `needs-pubkey`, `needs-funding`)
- `execute --target <one> --dry-run` (address reconstruction + assertion path) for one multisig and one single-sig
- `execute --target <one>` end-to-end for one multisig and one single-sig; verify on-chain `migration-record`
- Testnet validation (after the migration PRs land and the testnet release binary is cut)
Files
- `scripts/migrate-batch.sh` (947 lines)
- `scripts/migrate-batch.md` (operator README)
Open questions for reviewers
- `migrate-batch.sh` is generic: it handles both multisig and single-sig targets. Open to `migrate-multisig-batch.sh` if the multi-only framing is preferred, but it would be misleading given the standalone single-sig path.
- `--top-up-amount`. Currently `100000ulume`, intended to cover self-send value (`100000ulume`) + fees (`5000ulume`) with headroom. Right magnitude?
- `--funder` semantics. Currently `bank send` from the funder, wait for inclusion, then proceed. Should we require an explicit per-target confirmation (`--ask`) on top of the batch-level confirm?