
docs(flaky-tests): add "Alert When a Test Escalates" webhook recipe #249

Merged
samgutentag merged 8 commits into main from sam-gutentag/flaky-tests-escalation-alerts on Jun 15, 2026

Conversation

@samgutentag

@samgutentag samgutentag commented Jun 12, 2026

Member

What

Seeds a new Flaky Tests → Recipes nav group with its first entry: Alert When a Test Escalates, documenting how to send Slack alerts when a test gets worse — not just the first time it's flagged.

This is a process/pattern doc, not a connector reference, so it reads as a recipe rather than living under Webhooks (which keeps a cross-link card to it). Planned next entries for the group: the in-flight quarantine-recipes (#59) and monitor-tuning (#53) pages.

Driven by the connectors Slack thread (Tyler Jang's Reply 9): document the P2 user story for Slack notifications when a test starts flaking beyond the first detection level, with the classify-as-broken vs apply-a-label fork.

The core distinction (verified against trunk-io/trunk2)

  • v2.test_case.status_changed fires only on overall health transitions (HEALTHY/FLAKY/BROKEN). A second monitor piling onto an already-FLAKY test changes nothing, so no event is sent.
  • test_case.monitor_status_changed fires per individual monitor activation/resolution — the "more than just the first detection" coverage. This is why monitor_status_changed was added to the Slack connector in the thread.
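To make the granularity gap concrete, here is a minimal simulation of the fan-out. It is a hypothetical model, not the trunk2 code: it assumes each active monitor classifies a test as FLAKY or BROKEN, that BROKEN outranks FLAKY, and that a test with no active monitors is HEALTHY. The names `simulate`, `overallStatus`, and `MonitorChange` are illustrative only.

```typescript
// Hypothetical model of the event fan-out; not the trunk2 source.
// Assumptions: each active monitor classifies a test as FLAKY or BROKEN,
// BROKEN outranks FLAKY, and no active monitors means HEALTHY.
type HealthStatus = "HEALTHY" | "FLAKY" | "BROKEN";
type Classification = "FLAKY" | "BROKEN";

interface MonitorChange {
  monitorId: string;
  classification: Classification;
  status: "active" | "inactive";
}

const PRIORITY: Record<HealthStatus, number> = { HEALTHY: 0, FLAKY: 1, BROKEN: 2 };

// Combined health status = highest-priority classification among active monitors.
function overallStatus(active: Record<string, Classification>): HealthStatus {
  let result: HealthStatus = "HEALTHY";
  for (const id of Object.keys(active)) {
    if (PRIORITY[active[id]] > PRIORITY[result]) result = active[id];
  }
  return result;
}

// For each monitor change, list the webhook event types that would be emitted.
function simulate(changes: MonitorChange[]): string[][] {
  const active: Record<string, Classification> = {};
  const emitted: string[][] = [];
  for (const change of changes) {
    const before = overallStatus(active);
    if (change.status === "active") {
      active[change.monitorId] = change.classification;
    } else {
      delete active[change.monitorId];
    }
    const after = overallStatus(active);
    // Every activation or resolution produces a per-monitor event...
    const events = ["test_case.monitor_status_changed"];
    // ...but the health event fires only when the combined status moves.
    if (before !== after) events.push("v2.test_case.status_changed");
    emitted.push(events);
  }
  return emitted;
}

const log = simulate([
  { monitorId: "A", classification: "FLAKY", status: "active" },  // HEALTHY -> FLAKY
  { monitorId: "B", classification: "FLAKY", status: "active" },  // still FLAKY
  { monitorId: "C", classification: "BROKEN", status: "active" }, // FLAKY -> BROKEN
]);
console.log(log);
```

The second step is the case the recipe is about: Monitor B's activation produces only `test_case.monitor_status_changed`, because the combined status stays FLAKY.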

Verified the schemas and fan-out behavior in ts/packages/flake-detection/src/types.ts, ts/apps/flake-detection-side-effects-handler/src/webhook.ts, ts/apps/detection-engine-webhook-event-handler/src/enrich.ts, and ts/packages/tools/svix-publish-schemas/src/events/.

Changes

  • New page flaky-tests/recipes/alert-on-test-escalation.mdx — event-picker table, classify-as-broken transform snippet (gate on new_status), apply-a-label transform snippet (route on monitor.type from monitor_status_changed).
  • New Recipes nav group in docs.json, after Webhooks.
  • webhooks/slack-integration.mdx — "Alert only when a test gets worse" cross-link section.
  • webhooks/index.mdx — card for the new recipe.
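The classify-as-broken fork can be sketched roughly as follows. This is a hypothetical illustration, not the snippet on the page: it assumes a Svix-style transform whose handler may set `cancel = true` to suppress delivery, and it assumes `previous_status`, `new_status`, and `test_case` sit at the top level of the payload. The gate compares severities, in line with the SEVERITY map described in the verification comments; the real recipe builds the Slack message with the Slack guide's `summarizeTestCase` helper rather than the placeholder text used here.

```typescript
// Hypothetical sketch of a classify-as-broken gate in a Svix-style transform.
// Payload field placement is an assumption; check the v2.test_case.status_changed schema.
type HealthStatus = "HEALTHY" | "FLAKY" | "BROKEN";

interface StatusChangedPayload {
  previous_status: HealthStatus;
  new_status: HealthStatus;
  test_case: { name: string; html_url: string };
}

interface Webhook {
  payload: unknown;
  cancel?: boolean;
}

// Status values arrive uppercase, so these keys must be uppercase too.
const SEVERITY: Record<string, number> = { HEALTHY: 0, FLAKY: 1, BROKEN: 2 };

function handler(webhook: Webhook): Webhook {
  const p = webhook.payload as StatusChangedPayload;
  // Cancel anything that is not a step up in severity, so recoveries
  // (BROKEN -> FLAKY, FLAKY -> HEALTHY) are never sent.
  if (SEVERITY[p.new_status] <= SEVERITY[p.previous_status]) {
    webhook.cancel = true;
    return webhook;
  }
  // Placeholder Slack body; the recipe uses summarizeTestCase instead.
  webhook.payload = {
    text: `${p.test_case.name}: ${p.previous_status} -> ${p.new_status} (${p.test_case.html_url})`,
  };
  return webhook;
}
```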

All examples stay on the v2 event schema. The legacy test_case.status_changed (v1) schema is intentionally not documented.

Scope note

The product-side half of the thread — connectors (GitHub Issues, Linear, Jira, Slack) accepting both v1 and v2 events and the updated suggested transforms — was handled by Tyler Beebe. This PR is the docs-side user story only.

Engineering authors

For technical-accuracy review:

  • @TylerJang27 — requested the user story; owns the event-type direction
  • @acatxnamedvirtue (Tyler Beebe) — updated the connectors + suggested transforms in the thread

Notes

  • The two ../webhooks/index links in the event table are anchor-less: Mintlify's slug for a backtick-and-dot heading like ### v2.test_case.status_changed is unpredictable. If we want deep links, confirm the real anchors against a Mintlify build.
  • Pre-existing (not in this diff): webhooks/index.mdx documents monitor.status as "active or resolved", but the source enum is active/inactive (types.ts:173). Worth a separate fix.
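Because the source enum is `active`/`inactive`, a label-path filter on `test_case.monitor_status_changed` should compare against `"active"`, never `"resolved"`. The sketch below is hypothetical: it assumes the same Svix-style `cancel` mechanism and a payload with `monitor.type`, `monitor.status`, and `test_case` at the top level. It tags the message with `monitor.type`; actual routing would branch on whatever type values your monitors use.

```typescript
// Hypothetical sketch of the apply-a-label path; payload shape is assumed.
interface MonitorStatusChangedPayload {
  monitor: { type: string; status: "active" | "inactive" };
  test_case: { name: string; html_url: string };
}

interface Webhook {
  payload: unknown;
  cancel?: boolean;
}

function handler(webhook: Webhook): Webhook {
  const p = webhook.payload as MonitorStatusChangedPayload;
  // The source enum is "active"/"inactive", so alert on activations only.
  if (p.monitor.status !== "active") {
    webhook.cancel = true;
    return webhook;
  }
  // Tag the message with the monitor type so the channel can tell escalations apart.
  webhook.payload = {
    text: `[${p.monitor.type}] monitor activated on ${p.test_case.name}: ${p.test_case.html_url}`,
  };
  return webhook;
}
```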

Documents the test-escalation user story from the connectors Slack
thread: how to get Slack alerts when a test gets worse, not just on
first detection.

- New recipe page covering the v2.test_case.status_changed (overall
  health transitions) vs test_case.monitor_status_changed (per-monitor
  activations) distinction, with transform snippets for the
  classify-as-broken and apply-a-label forks.
- Cross-link section in the Slack integration guide.
- Card on the webhooks index + nav entry in docs.json.

All examples stay on the v2 event schema; legacy v1 event not documented.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@samgutentag

Member Author

Verification status (2026-06-12): live

Verified: customers can use this. Ready to publish.

  • Flag state: none. This is a documentation recipe over existing webhook events; no feature flag gates them. (LaunchDarkly not consulted, no flag to read.)
  • Eng PR: none referenced. The page documents already-shipped webhook events, not new eng work.
  • Flag: none
  • Signals:
    • v2.test_case.status_changed with the BROKEN status value is GA, shipped in the published changelog dated 2026-03-10 (Detect Consistently Failing Tests as Broken).
    • Webhooks for test status changes GA since the published changelog dated 2024-12-16.
    • test_case.monitor_status_changed is in the live webhook event catalog (flaky-tests/webhooks/index.mdx).
    • The connectors Slack thread (2026-06-11) shows these events actively configured on production connectors.
    • Event schemas and fan-out behavior confirmed against trunk-io/trunk2 source (types.ts, webhook.ts, enrich.ts, svix-publish-schemas).

No rollout to wait on. Engineering authors (@TylerJang27, @acatxnamedvirtue) are requested as reviewers for technical-accuracy sign-off before publish.

@samgutentag added the "ready to merge" label (Verify docs PR: customers can use this. Ready to publish.) Jun 12, 2026
@samgutentag

samgutentag commented Jun 12, 2026

Member Author

Code verification (2026-06-15): 8 confirmed / 0 contradicted / 0 ambiguous / 0 unverifiable

All factual claims in the recipe verified against trunk-io/trunk2. The transform snippets were also executed against the real event payloads in a local Node harness (16/16: send/cancel behavior correct; casing experiment confirms the handlers require uppercase status values, which the payload provides).

| Claim | Verdict | Source |
|---|---|---|
| Event v2.test_case.status_changed fires on overall health transitions | confirmed | v2-test-case-status-changed.ts:6-9 |
| Event test_case.monitor_status_changed fires per monitor activate/resolve | confirmed | test-case-monitor-status-changed.ts:7 |
| Health statuses are HEALTHY / FLAKY / BROKEN (uppercase) | confirmed | types.ts:8 |
| status_changed carries previous_status / new_status | confirmed | types.ts:285-288 |
| monitor_status_changed carries monitor.status / monitor.type (filter on "active") | confirmed | types.ts:173 |
| test_case.name / test_case.html_url available on both events | confirmed | enriched types.ts:16-23 |
| BROKEN reached only via a Broken-type failure rate / failure count monitor | confirmed | types.ts:151 |
| A broken classification un-quarantines an auto-quarantined test (auto-quarantine is FLAKY-only) | confirmed | broken-state-product-spec.md:98-100 |

No contradictions. The new <Warning> about the broken-vs-label quarantine side effect is accurate.


Source #1 — status_changed fires on overall health transitions (confirmed)

File: trunk-io/trunk2/.../events/v2-test-case-status-changed.ts#L6-L9

description:
  "Emitted when the health status of a test case changes. Test status can transition between HEALTHY, FLAKY, and BROKEN. ...",

Reasoning: Fires on health status changes. fanOutTestCaseStatusChanges (webhook.ts#L49-L53) emits one event per reconcile status change, so a second monitor that does not move the combined status produces no event. Basis for the "first detection vs escalation" distinction.

Source #2 — status enum and monitor.status value, uppercase (confirmed)

File: trunk-io/trunk2/ts/packages/flake-detection/src/types.ts#L8 and #L173

export const TestCaseStatusSchema = z.enum(["HEALTHY", "FLAKY", "BROKEN"]);
...
export const MonitorStatusSchema = z.enum(["active", "inactive"]);

Reasoning: Status values are uppercase. The Node harness confirmed the practical trap: lowercasing the comparisons in the SEVERITY map makes every lookup undefined, undefined <= undefined is false, so the transform stops cancelling and alerts on recoveries too. The snippets are correct because they match the payload's uppercase values. MonitorClassificationSchema = z.enum(["FLAKY", "BROKEN"]) (line 151) confirms BROKEN is reachable only via a classifying monitor.
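The casing trap described above can be reproduced in a few lines. `shouldCancel` mirrors the cancel-unless-worse gate; `safeShouldCancel` is a hypothetical defensive variant (not part of the recipe) that treats any unrecognized status value as "do not alert", so a casing mistake fails closed instead of alerting on recoveries.

```typescript
// Status values in the payload are uppercase (HEALTHY / FLAKY / BROKEN).
const SEVERITY: Record<string, number> = { HEALTHY: 0, FLAKY: 1, BROKEN: 2 };
// The trap: the same map with lowercase keys, so every uppercase lookup misses.
const LOWERCASE_SEVERITY: Record<string, number> = { healthy: 0, flaky: 1, broken: 2 };

// Mirrors the gate: cancel unless the new status is more severe than the old one.
function shouldCancel(map: Record<string, number>, previous: string, next: string): boolean {
  // With lowercase keys both lookups are undefined at runtime, and
  // undefined <= undefined is false, so nothing gets cancelled.
  return map[next] <= map[previous];
}

// Hypothetical defensive variant: an unknown status value cancels the alert.
function safeShouldCancel(map: Record<string, number>, previous: string, next: string): boolean {
  const before = map[previous];
  const after = map[next];
  if (typeof before !== "number" || typeof after !== "number") return true;
  return after <= before;
}

console.log(shouldCancel(SEVERITY, "BROKEN", "HEALTHY"));               // true: recovery cancelled
console.log(shouldCancel(LOWERCASE_SEVERITY, "BROKEN", "HEALTHY"));     // false: recovery alerts (the trap)
console.log(safeShouldCancel(LOWERCASE_SEVERITY, "BROKEN", "HEALTHY")); // true: fails closed
```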

Source #3 — broken classification un-quarantines an auto-quarantined test (confirmed)

File: trunk-io/trunk2/docs/prd/broken-state-product-spec.md#L98-L100

`BROKEN` tests are **not quarantine candidates**. Quarantining is intended for flaky tests that can be safely skipped...

The existing quarantine logic checks for `FLAKY` status only and is not affected by the addition of `BROKEN`.

Reasoning: The eng spec states the existing quarantine logic gates on FLAKY status only, so a test reclassified to BROKEN (which wins over FLAKY by status priority) drops out of the auto-quarantine set and its failures block CI again. The eligibility check is server-side (the analytics CLI just fetches the bulk quarantine result), and this spec documents that existing behavior; the published Quarantining docs corroborate it. The <Warning> correctly scopes this to auto-quarantine; manually quarantined tests are quarantined regardless of status.
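The quarantine trade-off reduces to a small predicate. This is a simplified, hypothetical model of the behavior the spec describes, not the server-side eligibility check: the `TestState` fields, including the auto-quarantine setting, are illustrative names.

```typescript
// Simplified, hypothetical model; field names are illustrative.
type HealthStatus = "HEALTHY" | "FLAKY" | "BROKEN";

interface TestState {
  status: HealthStatus;
  manuallyQuarantined: boolean;
  autoQuarantineEnabled: boolean;
}

function isQuarantined(t: TestState): boolean {
  // Manual quarantine applies regardless of health status.
  if (t.manuallyQuarantined) return true;
  // Auto-quarantine considers FLAKY tests only; BROKEN tests are not candidates.
  return t.autoQuarantineEnabled && t.status === "FLAKY";
}

const flaky: TestState = { status: "FLAKY", manuallyQuarantined: false, autoQuarantineEnabled: true };
const reclassified: TestState = { ...flaky, status: "BROKEN" };
const manual: TestState = { ...reclassified, manuallyQuarantined: true };

console.log(isQuarantined(flaky));        // true
console.log(isQuarantined(reclassified)); // false: failures block CI again
console.log(isQuarantined(manual));       // true
```

Under this model, a label-only monitor never changes `status`, which is why the apply-a-label path leaves auto-quarantine untouched.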

@samgutentag added the "code-verified" label (verify-docs-against-code: all factual claims confirmed in source.) Jun 12, 2026
@mintlify

mintlify Bot commented Jun 12, 2026

Contributor

Preview deployment for your docs. Learn more about Mintlify Previews.

| Project | Status | Preview | Updated (UTC) |
|---|---|---|---|
| trunk | 🟢 Ready | View Preview | Jun 12, 2026, 5:48 PM |

Seed a Flaky Tests > Recipes nav group with the escalation-alert page as
its first entry. It's a process/pattern doc, not a connector reference,
so it reads better as a recipe than under Webhooks. Webhooks keeps a
cross-link card. Quarantine-recipes (#59) and monitor-tuning (#53) are
the planned next entries.

- git mv flaky-tests/webhooks/ -> flaky-tests/recipes/
- new Recipes group in docs.json after Webhooks
- fixed relative links (./slack-integration, ./index -> ../webhooks/...)

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Member Author

Verification status (June 13, 2026): live

Verified: customers can use this. Ready to publish (currently draft — author's choice).

  • Flag state: LaunchDarkly not consulted (no feature flag identified; documents existing product behavior)
  • Eng PR links: none (documents existing behavior)
  • Flag: none
  • Signals checked: prior sweep chain (consistent live verdict); no regression signals June 13, 2026
  • Suggested next action: ready to merge when author marks ready for review

Verified by Daily Docs Sweep · June 13, 2026


Generated by Claude Code

@TylerJang27

TylerJang27 commented Jun 13, 2026


This is awesome! Two notes:

  • Can you review the transformation code in the suggestion just to check its validity closely?
  • Can you note in the warnings, and in the discussion of whether to classify as broken or apply a label, that broken changes a classification and causes an auto-quarantined test to no longer be quarantined? The link you have to the docs page on composite state is good otherwise.

…ky tests

Addresses Tyler's PR #249 review:
- Clarify the transform snippets are drop-in replacements for the Slack
  guide's handler and depend on its summarizeTestCase helper staying in
  the transformation.
- Add a Warning that classifying a test as broken changes its health
  status, dropping a flaky+auto-quarantined test out of auto-quarantine
  (broken tests aren't quarantine candidates) so it blocks CI again.
  Labeling monitors avoid this; manually quarantined tests are unaffected.
- Tie the label Tip to the quarantine tradeoff.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Resolves the remaining part of Tyler's PR #249 review (transform validity):
- Inline comment in both status snippets noting summarizeTestCase() lives
  in the Slack integration guide, so a single-block copy-paste doesn't
  silently ReferenceError.
- Comment on the SEVERITY map noting status values are uppercase.

Validated with a local Node harness against the real v2 + monitor payloads
(16/16): handlers send/cancel correctly, and the casing experiment confirms
lowercasing the comparisons silently breaks gating.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
- Voice/clarity pass on the escalation recipe, remove all em dashes.
- Add two standalone animated SVGs (CSS keyframes, reduced-motion safe):
  - event-granularity-gap: HEALTHY->FLAKY->FLAKY across three columns,
    showing status_changed stays silent on the second monitor while
    monitor_status_changed fires on both.
  - broken-classification-quarantine: a broken classification drops a
    flaky auto-quarantined test out of quarantine and re-blocks CI.
- Embed both via <Frame> in the recipe.

Transforms validated end to end: trunk2 source, a Node harness (16/16),
and Svix Run Test on a play.svix.com test endpoint.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Apply CONTRIBUTING admonition rules to the escalation recipe:
- The quarantine side effect was a two-paragraph <Warning> wrapping what
  is really core content. The guide forbids wrapping a section in a
  callout, and a reversible, by-design behavior is not a Warning-grade
  hazard. Promote it to a '## The quarantine trade-off' section (prose),
  and move the broken-classification animation into it.
- Trim the label <Tip> so it no longer duplicates that section; it now
  covers only the optional label-routing mechanics.

Page now has two callouts (Info for background, Tip for an optional path),
none stacked or section-wrapping.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…n gap diagram

Accuracy pass on the event-granularity diagram. The column-3 event is
correct (monitor_status_changed fires on Monitor B's own activation,
independent of overall status), but the framing invited a 'why an event
if FLAKY to FLAKY?' misread. Sharpen it:
- column 3 sublabel 'already FLAKY' -> '2nd monitor, still FLAKY'
- caption: 'catches both escalations' -> 'fires on every monitor
  activation, so it catches both' (Monitor A's first detection is not an
  escalation)

broken-classification diagram audited, accurate, unchanged.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@samgutentag samgutentag marked this pull request as ready for review June 15, 2026 18:17
@samgutentag samgutentag merged commit 6f29bf6 into main Jun 15, 2026
3 checks passed
@samgutentag samgutentag deleted the sam-gutentag/flaky-tests-escalation-alerts branch June 15, 2026 19:07

Labels

code-verified (verify-docs-against-code: all factual claims confirmed in source.)
ready to merge (Verify docs PR: customers can use this. Ready to publish.)
