[SLOP(claude-opus-4-8)] feat(pegboard-envoy): rate-limit envoy ws ingress and cap get_pages, trim unused metrics#5327
Conversation
…ress and cap get_pages, trim unused metrics
PR ReviewThis PR adds a leaky-bucket rate limiter to the pegboard-envoy WebSocket ingress path (mirroring the existing gateway rate limiter), caps the number of pages a single get_pages request may ask for (8192), and removes the prepopulate() metric-initialization functions that were pre-seeding zero-valued time series with empty labels. Findingsengine/packages/pegboard-envoy/src/ws_to_tunnel_task.rs:486 — Abort and term signals cannot interrupt the envoy rate-limit sleep ws_to_tunnel_abort_rx and term_signal are handled inside recv_msg, which only runs after acquire() returns. While the leaky bucket is sleeping, those signals cannot interrupt. The gateway version has abort as an outer select! branch and cancels acquire() immediately when abort fires. With the default drip_rate_us = 200 (0.2 ms) this is negligible in practice, but the inconsistency means an operator setting a higher drip rate (e.g. 50 ms to throttle an overloaded runner more aggressively) would see a corresponding delay in graceful-shutdown responsiveness for every envoy connection. The fix is to add the abort as an outer branch in the task_inner loop, the same way it is in pegboard-gateway2. engine/packages/pegboard-envoy/src/ws_to_tunnel_task.rs:1436 — Oversized get_pages returns a soft error rather than terminating the connection CLAUDE.md states that the envoy/pegboard-envoy channel is untrusted. Returning SqliteErrorResponse and leaving the connection open means a misbehaving runner can keep sending oversized requests within the rate-limit burst. Because this is a validation failure at a trust boundary, closing the connection would be more principled. That said, CLAUDE.md also explicitly says to return SqliteErrorResponse for validation failures rather than bubble them through the shared connection task, so this follows the stated guidance. The rate limiter already bounds total message volume; the cap adds defense-in-depth regardless. engine/packages/pegboard-gateway2/src/ws_to_tunnel_task.rs:29 — Minor: ctx.config().pegboard() now called twice The removed pegboard_config local binding made it obvious the same config snapshot was used for both rate-limit fields. The new code calls ctx.config().pegboard() on two separate lines. Not a bug, but a small readability regression. What looks good
Generated with Claude Code |
No description provided.