feat(metrics): event-bus by-type breakdown + webhook delivery counters #70
Merged
Adds two metric families that operators need in order to see what the
bus and the webhook are doing. The existing
pilot_event_bus_publish_total / pilot_webhook_configured only tell you
"something happened" / "URL is set", without surfacing what is flowing
or whether deliveries are succeeding.
## Event bus
`pilot_event_bus_publish_by_type_total{event="source.type"}`
Each Publish() now invokes a second hook OnPublishEvent(Event), wired
from server_lifecycle.go to increment a CounterVec labeled by the
publisher's Source + "." + Type. So operators see membership.changed
vs trust.created vs server.audit.entry side by side instead of one
aggregate number.
The bus interface gains SetOnPublishEvent alongside the existing
SetOnPublish; old callers stay working.
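
A minimal sketch of the two-hook arrangement described above, under stated assumptions: the `Event` fields, the `Bus` internals, and the `typeCounter` stand-in for a Prometheus CounterVec are hypothetical and only illustrate the idea; the real bus and the wiring in server_lifecycle.go may differ.

```go
package main

import (
	"fmt"
	"sort"
	"sync"
)

// Event is a hypothetical stand-in for the bus event type; only the
// Source and Type fields are taken from the description above.
type Event struct {
	Source string
	Type   string
}

// Bus is an illustrative bus with both hooks, not the project's implementation.
type Bus struct {
	mu             sync.RWMutex
	onPublish      func()      // existing no-arg hook
	onPublishEvent func(Event) // new per-event hook
}

func (b *Bus) SetOnPublish(fn func()) {
	b.mu.Lock()
	defer b.mu.Unlock()
	b.onPublish = fn
}

func (b *Bus) SetOnPublishEvent(fn func(Event)) {
	b.mu.Lock()
	defer b.mu.Unlock()
	b.onPublishEvent = fn
}

// Publish invokes whichever hooks are set; callers that only ever
// registered the no-arg hook keep working unchanged.
func (b *Bus) Publish(ev Event) {
	b.mu.RLock()
	onPublish, onEvent := b.onPublish, b.onPublishEvent
	b.mu.RUnlock()
	if onPublish != nil {
		onPublish()
	}
	if onEvent != nil {
		onEvent(ev)
	}
}

// typeCounter stands in for a CounterVec keyed by "source.type" (assumption).
type typeCounter struct {
	mu     sync.Mutex
	counts map[string]uint64
}

func newTypeCounter() *typeCounter {
	return &typeCounter{counts: make(map[string]uint64)}
}

func (c *typeCounter) Inc(label string) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.counts[label]++
}

// Render returns one exposition-style line per label, sorted for stable output.
func (c *typeCounter) Render() []string {
	c.mu.Lock()
	defer c.mu.Unlock()
	lines := make([]string, 0, len(c.counts))
	for label, n := range c.counts {
		lines = append(lines, fmt.Sprintf("pilot_event_bus_publish_by_type_total{event=%q} %d", label, n))
	}
	sort.Strings(lines)
	return lines
}

func main() {
	bus := &Bus{}
	byType := newTypeCounter()
	bus.SetOnPublishEvent(func(ev Event) { byType.Inc(ev.Source + "." + ev.Type) })

	bus.Publish(Event{Source: "membership", Type: "changed"})
	bus.Publish(Event{Source: "membership", Type: "changed"})
	bus.Publish(Event{Source: "trust", Type: "created"})

	for _, line := range byType.Render() {
		fmt.Println(line)
	}
}
```

Keeping the per-event hook separate from the no-arg one is what lets the aggregate counter and the by-type counter be wired independently.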
## Webhook
`pilot_webhook_deliveries_total{result="ok|error|dropped"}`
`pilot_webhook_last_delivery_unix_seconds`
Webhook dispatcher already maintained atomic delivered/failed/dropped
counters. Added accessor Store.Stats() + a lastAttemptUnix stamp set
in post() so /metrics has a freshness signal. The metrics layer polls
via WebhookStatsFn at scrape time (same pattern as BeaconStatsFn) and
omits the block when no URL is configured (ok=false) so a registry
without a webhook doesn't lie about zero deliveries.
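
The scrape-time gating can be sketched roughly as follows. This is an illustration under assumptions, not the merged code: the `Stats` field names, the `post`/`drop` signatures, the `StatsFn` helper, and `renderWebhook` are all hypothetical; only `Store.Stats()`, `lastAttemptUnix`, `WebhookStatsFn`, and the ok=false behaviour come from the description above.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
	"sync/atomic"
	"time"
)

// Stats is a hypothetical snapshot shape for the dispatcher counters.
type Stats struct {
	Delivered, Failed, Dropped uint64
	LastAttemptUnix            int64
}

// WebhookStatsFn is polled at scrape time; ok=false means no URL is configured.
type WebhookStatsFn func() (st Stats, ok bool)

// Store is an illustrative dispatcher holding atomic counters.
type Store struct {
	url                        string
	delivered, failed, dropped atomic.Uint64
	lastAttemptUnix            atomic.Int64
}

// post stamps the attempt time, then records the outcome of send.
// The send callback replaces a real HTTP call (assumption).
func (s *Store) post(send func() error, nowUnix int64) {
	s.lastAttemptUnix.Store(nowUnix)
	if err := send(); err != nil {
		s.failed.Add(1)
		return
	}
	s.delivered.Add(1)
}

// drop records an event discarded before any delivery attempt.
func (s *Store) drop() { s.dropped.Add(1) }

// Stats returns a point-in-time copy of the counters.
func (s *Store) Stats() Stats {
	return Stats{
		Delivered:       s.delivered.Load(),
		Failed:          s.failed.Load(),
		Dropped:         s.dropped.Load(),
		LastAttemptUnix: s.lastAttemptUnix.Load(),
	}
}

// StatsFn adapts the store to the scrape-time callback.
func (s *Store) StatsFn() WebhookStatsFn {
	return func() (Stats, bool) {
		if s.url == "" {
			return Stats{}, false
		}
		return s.Stats(), true
	}
}

// renderWebhook emits the webhook block, or nothing when ok=false, so an
// unconfigured webhook is not reported as "zero deliveries".
func renderWebhook(fn WebhookStatsFn) string {
	if fn == nil {
		return ""
	}
	st, ok := fn()
	if !ok {
		return ""
	}
	var sb strings.Builder
	fmt.Fprintf(&sb, "pilot_webhook_deliveries_total{result=%q} %d\n", "ok", st.Delivered)
	fmt.Fprintf(&sb, "pilot_webhook_deliveries_total{result=%q} %d\n", "error", st.Failed)
	fmt.Fprintf(&sb, "pilot_webhook_deliveries_total{result=%q} %d\n", "dropped", st.Dropped)
	fmt.Fprintf(&sb, "pilot_webhook_last_delivery_unix_seconds %d\n", st.LastAttemptUnix)
	return sb.String()
}

var errSimulated = errors.New("simulated delivery failure")

func main() {
	s := &Store{url: "https://example.invalid/hook"} // placeholder URL
	s.post(func() error { return nil }, time.Now().Unix())
	s.post(func() error { return errSimulated }, time.Now().Unix())
	s.drop()
	fmt.Print(renderWebhook(s.StatsFn()))
}
```

Returning an explicit `ok` flag, rather than a zero-valued snapshot, is what lets the metrics layer distinguish "no webhook configured" from "configured but idle".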
## Tests
Two new in metrics/zz_metrics_test.go:
- by-type renders one labeled line per source.type pair
- webhook stats fn gating: ok=false → omit; ok=true → all three result
labels + the gauge render
Codecov Report: All modified and coverable lines are covered by tests.
Operators looking at the N-events and N-webhook panels on the dashboard had nothing actionable — the existing counters told them "something happened" / "URL is configured" without surfacing what was actually flowing.
## Event bus
Adds `pilot_event_bus_publish_by_type_total{event="source.type"}` so the publish total breaks down per publisher.event-name (membership.changed, trust.created, server.audit.entry, ...). The bus interface gains `SetOnPublishEvent(func(Event))` alongside the existing no-arg `SetOnPublish`; old callers stay working.
## Webhook
Adds `pilot_webhook_deliveries_total{result="ok|error|dropped"}` + `pilot_webhook_last_delivery_unix_seconds`. Webhook dispatcher already had atomic counters internally; surfaces them via Store.Stats() + a new lastAttemptUnix stamp updated in post(). Polled at scrape time via WebhookStatsFn (BeaconStatsFn pattern); ok=false → block omitted when no URL is configured.
## Tests
`go build ./...` clean. `go test ./events ./webhook ./metrics` green.