fix: DEVOPS-364 governance proposal creation 504 + stuck publish spinner #777
Merged
getLiquidity downloaded the entire gZIL ZRC2 balances map (~5MB) on every proposal just to read one address. From the cluster this took ~30s, exceeding the gateway's 30s backend_timeout and returning 504.
- Fetch only the submitter's entry (index [ownerKey]) instead of the full map.
- Normalise the key to lowercase 0x (Scilla state key format); this also fixes a latent checksum mismatch for bech32/ZilPay addresses.
- Seed the submitter's key so ZilSwap/XCAD LP balances are still credited even when the submitter has no direct balance.
- Guard null RPC results; default missing balances to '0' (fail-closed gate).
- Add framework-free regression tests.
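A minimal sketch of the key normalisation and fail-closed gate described above, assuming addresses arrive as base16 and balances are decimal strings as in Scilla state. normaliseKey, readBalance, compareUint and meetsMinBalance are hypothetical names, not the functions in custom-fetch.ts, and bech32 conversion is deliberately left out.

```typescript
// Sketch of the MIN_BALANCE gate lookup (hypothetical helper names).
// Assumption: addresses arrive as base16 ("0x" + 40 hex chars); bech32
// ("zil1...") addresses would need converting with a Zilliqa SDK helper first.

type BalancesMap = Record<string, string>;

/** Normalise a base16 address to the lowercase 0x form used as Scilla map keys. */
function normaliseKey(address: string): string {
  const hex = address.trim().replace(/^0x/i, "");
  if (!/^[0-9a-fA-F]{40}$/.test(hex)) {
    throw new Error(`expected a base16 address, got: ${address}`);
  }
  return "0x" + hex.toLowerCase();
}

/** Read one holder's balance; a null map or a missing entry counts as "0". */
function readBalance(balances: BalancesMap | null | undefined, address: string): string {
  const value = balances ? balances[normaliseKey(address)] : undefined;
  return value ?? "0";
}

/** Compare two non-negative decimal strings without Number (uint128 would lose precision). */
function compareUint(a: string, b: string): number {
  if (!/^\d+$/.test(a) || !/^\d+$/.test(b)) {
    throw new Error("expected non-negative decimal strings");
  }
  // Drop leading zeros but keep a single "0".
  const strip = (s: string) => s.replace(/^0+(?=\d)/, "");
  const x = strip(a);
  const y = strip(b);
  if (x.length !== y.length) return x.length < y.length ? -1 : 1;
  return x === y ? 0 : x < y ? -1 : 1;
}

/** Fail-closed gate: anything missing reads as "0" and is rejected. */
function meetsMinBalance(
  balances: BalancesMap | null | undefined,
  address: string,
  minBalance: string,
): boolean {
  return compareUint(readBalance(balances, address), minBalance) >= 0;
}
```

Comparing decimal strings avoids both Number precision loss on large token amounts and a BigInt dependency on older TypeScript targets; on an ES2020+ target, BigInt(a) >= BigInt(b) does the same job.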
client.request left its promise pending when an error response was not JSON (a 504 gateway HTML page) or the request was CORS-blocked, so the publish spinner stayed stuck forever.
- Rewrite with async/await; always resolve or reject.
- AbortController with a 45s timeout.
- Parse error bodies defensively; reject with {code,error_description} or a fallback.
- Add framework-free regression tests.
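The rewrite described in this commit can be sketched as follows, assuming the global fetch, Response and AbortController of Node 18+ or a browser. request, toRequestError and the injectable fetchImpl parameter are illustrative, not the actual client.ts signature; the 95 s default follows the later review commit (this commit used 45 s).

```typescript
// Sketch of a request helper that always settles. Assumptions: global fetch,
// Response and AbortController (Node 18+ or a browser); the error shape
// { code, error_description } follows the PR; names are illustrative.

interface RequestError {
  code: number;
  error_description: string;
}

type FetchLike = (input: string, init?: RequestInit) => Promise<Response>;

function isRequestError(value: unknown): value is RequestError {
  return typeof value === "object" && value !== null && "error_description" in value;
}

/** Build a structured error from an error body, tolerating non-JSON (e.g. a gateway HTML page). */
function toRequestError(status: number, bodyText: string): RequestError {
  try {
    const parsed = JSON.parse(bodyText);
    if (parsed && typeof parsed.error_description === "string") {
      return {
        code: typeof parsed.code === "number" ? parsed.code : status,
        error_description: parsed.error_description,
      };
    }
  } catch {
    // Not JSON: fall through to the fallback below.
  }
  return { code: status, error_description: `Request failed with HTTP ${status}` };
}

async function request<T>(
  url: string,
  init: RequestInit = {},
  timeoutMs = 95_000,
  fetchImpl: FetchLike = fetch,
): Promise<T | null> {
  const controller = new AbortController();
  const timer = setTimeout(() => controller.abort(), timeoutMs);
  try {
    const res = await fetchImpl(url, { ...init, signal: controller.signal });
    const text = await res.text();
    if (!res.ok) throw toRequestError(res.status, text);
    // A successful empty body resolves to null instead of failing JSON.parse.
    return text ? (JSON.parse(text) as T) : null;
  } catch (err) {
    if (isRequestError(err)) throw err;
    if (controller.signal.aborted) {
      throw { code: 0, error_description: `Request timed out after ${timeoutMs} ms` };
    }
    // fetch rejects with a TypeError on network failures, including CORS blocks.
    throw { code: 0, error_description: err instanceof Error ? err.message : String(err) };
  } finally {
    clearTimeout(timer);
  }
}
```

Because every path either returns or throws inside one async function, the returned promise cannot stay pending, and the finally block clears the timer on every path.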
…timeout PR #777 review (C1): scoping the balances fetch to the submitter also shrank the IPFS-pinned snapshot that the frontend uses as the whole-electorate voter-scoring oracle (get-scores.ts reads proposal.balances[voter] before any live fallback), so proposals created after deploy would have counted only the submitter's vote.
- custom-fetch.ts: keep fetching the FULL holder map (index []) for the pinned snapshot; retain the lowercase-key gate lookup, null guards and LP seeding.
- backendpolicy.yaml (staging + production): GCPBackendPolicy timeoutSec=90 so the ~30s full-map fetch is not killed by GKE's 30s default (the actual 504 fix).
- client.ts: raise the request timeout to 95s (above the gateway's 90s) so a real 504 surfaces instead of the client aborting first.
- test: fullMapFetchTest now asserts that the FULL map is fetched and the electorate retained (guards C1); wire 'npm test' in both packages (M1).
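A sketch of what "keep fetching the FULL holder map" means at the RPC level, assuming the request follows Zilliqa's GetSmartContractSubState JSON-RPC shape. The helper names are hypothetical; the point is that an empty index list returns the whole balances map, so every holder survives into the pinned snapshot, while a null result degrades to an empty map instead of throwing.

```typescript
// Sketch of the full-map balances fetch that feeds the IPFS snapshot, with the
// null guard and key seeding. Assumptions: the JSON-RPC shape follows Zilliqa's
// GetSmartContractSubState; helper names are hypothetical, not custom-fetch.ts.

type BalancesMap = Record<string, string>;

interface JsonRpcRequest {
  id: string;
  jsonrpc: "2.0";
  method: "GetSmartContractSubState";
  params: [string, string, string[]];
}

/** Request the WHOLE balances map: an empty index list means "no key filter". */
function buildBalancesRequest(contractAddress: string): JsonRpcRequest {
  return {
    id: "1",
    jsonrpc: "2.0",
    method: "GetSmartContractSubState",
    // Assumption: bare base16 address (no 0x), as in Zilliqa's API docs examples.
    params: [contractAddress.replace(/^0x/i, "").toLowerCase(), "balances", []],
  };
}

/** Copy the electorate out of an RPC result; a null result yields an empty map. */
function snapshotFromResult(result: { balances?: BalancesMap } | null | undefined): BalancesMap {
  return { ...((result && result.balances) || {}) };
}

/** Ensure the submitter has a key ("0") so a later LP-crediting pass can add to it. */
function seedOwnerKey(snapshot: BalancesMap, ownerKey: string): BalancesMap {
  return ownerKey in snapshot ? snapshot : { ...snapshot, [ownerKey]: "0" };
}
```

Scoping params[2] to [ownerKey] would make the same call cheap, but the resulting snapshot would then contain only the submitter, which is exactly the C1 regression this commit reverts.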
Problem
Users could not create proposals on the governance portal. Two distinct bugs, found by reproducing on staging and tracing the GCP logs:
Backend 504.
POST /api/message calls getLiquidity(), which downloads the entire gZIL ZRC2 balances map (~5 MB) from api.zilliqa.com on every proposal. From the GKE cluster this takes ~30 s, exceeding the gateway's 30 s backend_timeout -> 504. Verified on staging:
Frontend infinite spinner.
client.request never settled its promise when the error body was not JSON (a 504 gateway HTML page) or the request was CORS-blocked: e.json().then(json => reject(json)) never calls reject if parsing fails, so the outer promise stayed pending. The publish spinner spun forever with no error shown.
Changes
governance-api-lib/zilliqa/custom-fetch.ts, lib/routes/message.ts, cd/overlays/{staging,production}/backendpolicy.yaml
- The balances map is fetched in full on purpose (GetSmartContractSubState(..., "balances", [])): it is pinned to IPFS as the whole-electorate voter-scoring snapshot that governance-snapshot reads via get-scores.ts (proposal.balances[voter]).
- GCPBackendPolicy.spec.default.timeoutSec is raised to 90 s (staging + production) so the ~30 s fetch is no longer killed by GKE's 30 s default.
- The gate key is normalised to lowercase 0x... (Scilla state key format) for the MIN_BALANCE gate; this also fixes a latent checksum mismatch for bech32/ZilPay addresses that previously made the gate read undefined.
- Null RPC results are guarded and missing balances default to "0" (fail-closed gate).
governance-snapshot-src/helpers/client.ts
- request() now always settles: AbortController (95 s timeout, just above the gateway's 90 s so a real 504 surfaces instead of the client aborting first), defensive JSON parsing, and structured rejects ({code, error_description} or a fallback). A 504 or non-JSON response now shows an error toast instead of hanging.
Testing
Framework-free node:assert regression tests (the packages have no test runner), wired as npm test in both packages:
- custom-fetch.test.ts: fullMapFetchTest asserts that the balances RPC fetches the full map (params[2] == []) and that the electorate snapshot is retained (a second holder survives), guarding C1; the submitter's balance is read back via the normalised lowercase key; an LP-only holder is credited via the seeded key; a null result yields "0" without throwing.
- client.test.ts: rejects (no hang) on a non-JSON 504; resolves on success; rejects with a timeout on abort; preserves the server's JSON error; resolves on a successful empty body.
Both suites pass (npm test in each package), and both packages typecheck clean.
Verify on staging
201); proposal creation completes within the 90 s backend timeout (no 504).
400 MIN_BALANCE with a visible toast (no infinite spinner).
Follow-ups (intentionally out of scope)
- proposal()/vote() are called without await and their early-return responses are discarded (message.ts:229); malformed proposals still hit the slow path and can double-send the response.
- 0x addresses hold no gZIL on Zilliqa, so MetaMask proposals on the gZIL space always hit MIN_BALANCE; decide on address mapping vs. restricting to ZilPay.
- level is not mapped to Cloud Logging severity; 404s are logged at error level.
- The client.ts 95 s ceiling is also global; a per-call timeout would keep fast-fail for metadata GETs.
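For the logging follow-up, a hypothetical sketch of mapping logger levels to Cloud Logging severity and choosing the level from the HTTP status, assuming the API writes one JSON object per line to stdout and uses pino/winston-style level names. Treating 4xx as warnings is one possible policy, not something this PR decides.

```typescript
// Hypothetical sketch for the logging follow-up. Assumptions: the API logs one
// JSON object per line to stdout, which GKE's logging agent ingests as
// structured logs, and uses pino/winston-style level names. Cloud Logging
// reads the special "severity" and "message" fields of such entries.

type Severity = "DEFAULT" | "DEBUG" | "INFO" | "WARNING" | "ERROR" | "CRITICAL";

const LEVEL_TO_SEVERITY: Record<string, Severity> = {
  trace: "DEBUG",
  debug: "DEBUG",
  info: "INFO",
  warn: "WARNING",
  error: "ERROR",
  fatal: "CRITICAL",
};

/** Map a logger level to a Cloud Logging severity; unknown levels become DEFAULT. */
function toSeverity(level: string): Severity {
  const key = level.toLowerCase();
  // hasOwnProperty avoids matching inherited keys such as "constructor".
  return Object.prototype.hasOwnProperty.call(LEVEL_TO_SEVERITY, key)
    ? LEVEL_TO_SEVERITY[key]
    : "DEFAULT";
}

/** Choose a level from the HTTP status so client errors such as 404 are not logged as errors. */
function levelForStatus(status: number): string {
  if (status >= 500) return "error";
  if (status >= 400) return "warn";
  return "info";
}

/** Serialise one structured log line with the severity field Cloud Logging reads. */
function formatEntry(level: string, message: string, fields: Record<string, unknown> = {}): string {
  return JSON.stringify({ ...fields, severity: toSeverity(level), message });
}
```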