From b6d878730ae47c8b298c2377db34a3c8e203d01e Mon Sep 17 00:00:00 2001 From: Sam Gutentag <1404219+samgutentag@users.noreply.github.com> Date: Fri, 12 Jun 2026 10:26:21 -0700 Subject: [PATCH 1/8] docs(flaky-tests): add "Alert When a Test Escalates" webhook recipe Documents the test-escalation user story from the connectors Slack thread: how to get Slack alerts when a test gets worse, not just on first detection. - New recipe page covering the v2.test_case.status_changed (overall health transitions) vs test_case.monitor_status_changed (per-monitor activations) distinction, with transform snippets for the classify-as-broken and apply-a-label forks. - Cross-link section in the Slack integration guide. - Card on the webhooks index + nav entry in docs.json. All examples stay on the v2 event schema; legacy v1 event not documented. Co-Authored-By: Claude Opus 4.8 (1M context) --- docs.json | 1 + .../webhooks/alert-on-test-escalation.mdx | 117 ++++++++++++++++++ flaky-tests/webhooks/index.mdx | 5 + flaky-tests/webhooks/slack-integration.mdx | 8 ++ 4 files changed, 131 insertions(+) create mode 100644 flaky-tests/webhooks/alert-on-test-escalation.mdx diff --git a/docs.json b/docs.json index e16adc7d..5b365bd1 100644 --- a/docs.json +++ b/docs.json @@ -295,6 +295,7 @@ "group": "Webhooks", "root": "flaky-tests/webhooks/index", "pages": [ + "flaky-tests/webhooks/alert-on-test-escalation", "flaky-tests/webhooks/slack-integration", "flaky-tests/webhooks/microsoft-teams-integration", "flaky-tests/webhooks/github-issues-integration", diff --git a/flaky-tests/webhooks/alert-on-test-escalation.mdx b/flaky-tests/webhooks/alert-on-test-escalation.mdx new file mode 100644 index 00000000..52a15ab5 --- /dev/null +++ b/flaky-tests/webhooks/alert-on-test-escalation.mdx @@ -0,0 +1,117 @@ +--- +title: "Alert When a Test Escalates" +description: "Send Slack alerts when a test gets worse, not just the first time it's flagged" +og:title: "Alerting on flaky test escalation with Trunk webhooks" 
+--- +A single "this test is now flaky" alert tells you a test crossed a threshold once. It doesn't tell you when that same test keeps getting worse — failing on more branches, tripping additional monitors, or degrading from flaky to a consistently broken regression. For tests that matter, you want to hear about the escalation, not just the first detection. + +This page shows how to wire that up with Trunk webhooks and a Slack transformation. It builds on the [Slack integration guide](./slack-integration) — set that connection up first, then come back here to filter it down to escalations. + +## Pick the right event + +The key decision is which event you subscribe to, because two different events fire at two different granularities. + +| Event | Fires when | Use it to | +|---|---|---| +| [`v2.test_case.status_changed`](./index) | The test's **overall health status** transitions between `HEALTHY`, `FLAKY`, and `BROKEN` | Alert on health escalations like `FLAKY` → `BROKEN` | +| [`test_case.monitor_status_changed`](./index) | **Any individual monitor** activates or resolves for the test | Alert every time a monitor flags the test, even if its overall status doesn't move | + +The distinction matters. `v2.test_case.status_changed` only fires when the test's combined status changes. If a test is already `FLAKY` and a second monitor starts flagging it, the overall status stays `FLAKY`, so no `v2.test_case.status_changed` event is sent. To catch a test getting flagged by more monitors over time — the "more than just the first detection" case — subscribe to `test_case.monitor_status_changed` instead. + + +Test status priority is **Broken > Flaky > Healthy**. A test flagged by both a broken-type and a flaky-type monitor shows as `BROKEN` until the broken monitor resolves. See [Flake Detection](../detection/) for how the combined status is calculated. 
+ + +## Alert when a test becomes broken + +Use this when you want a louder, separate signal for tests that have degraded into consistent failures, distinct from routine flakiness. + +**1. Configure a broken-type monitor.** A test only reaches `BROKEN` status when a [failure rate](../detection/failure-rate-monitor) or [failure count](../detection/failure-count-monitor) monitor with its **Detection type** set to **Broken** is active for it. Set one up if you haven't already. A common pattern is to pair a broken-type monitor (catching consistently failing tests) with a flaky-type monitor (catching intermittent ones). + +**2. Filter the transformation to escalations.** In your Slack endpoint's transformation, cancel the webhook unless the status got worse. This example ranks the three statuses and only sends a message when `new_status` is more severe than `previous_status`, so recoveries and resolutions stay quiet: + +```javascript +const SEVERITY = { HEALTHY: 0, FLAKY: 1, BROKEN: 2 }; + +function handler(webhook) { + const { previous_status = "HEALTHY", new_status = "HEALTHY" } = webhook.payload; + + // Only alert when the test got worse, not when it recovered. + if (SEVERITY[new_status] <= SEVERITY[previous_status]) { + webhook.cancel = true; + return webhook; + } + + webhook.payload = summarizeTestCase(webhook.payload); + return webhook; +} +``` + +To alert *only* when a test reaches the broken state — and stay silent on first-time flaky detections — gate on the new status directly instead: + +```javascript +function handler(webhook) { + if (webhook.payload.new_status !== "BROKEN") { + webhook.cancel = true; + return webhook; + } + + webhook.payload = summarizeTestCase(webhook.payload); + return webhook; +} +``` + +Reuse the `summarizeTestCase` helper from the [Slack integration guide](./slack-integration#id-2.-customize-your-transformation) to format the message body. 
The `previous_status → new_status` line in that template makes the escalation obvious in the channel. + +## Alert every time a monitor flags a test + +Use this when you want to know about every detection event on a test, including the ones that don't change its overall status — a second monitor piling on, or a labeling monitor surfacing a new pattern. + +**1. Subscribe to `test_case.monitor_status_changed`.** On your Slack endpoint, enable this event in addition to (or instead of) `v2.test_case.status_changed`. + +**2. Filter to monitor activations.** The event fires on both activation and resolution, so cancel the webhook unless a monitor is becoming active: + +```javascript +function handler(webhook) { + const { monitor } = webhook.payload; + + // Only alert when a monitor starts flagging the test. + if (!monitor || monitor.status !== "active") { + webhook.cancel = true; + return webhook; + } + + webhook.payload = { + blocks: [ + { + type: "header", + text: { type: "plain_text", text: `Monitor active: ${webhook.payload.test_case.name}` }, + }, + { + type: "section", + text: { + type: "mrkdwn", + text: [ + `Monitor type: \`${monitor.type}\``, + `Test Details: ${webhook.payload.test_case.html_url}`, + ].join("\n"), + }, + }, + ], + }; + return webhook; +} +``` + +Because `test_case.monitor_status_changed` fires for every monitor independently, this catches a test that keeps tripping new monitors over time, even while its headline status stays `FLAKY`. The `monitor.type` field tells you which monitor fired, so you can branch on it — for example, route [labeling monitors](../management/test-labels#automatic-labeling-from-monitors) to a triage channel and health classification monitors to your on-call channel. + + +Prefer labels over a separate broken classification when you want to triage a pattern without changing a test's health status. 
Configure a monitor's action as **Apply labels**, then filter `test_case.monitor_status_changed` on `monitor.type` to route those activations wherever they belong. See [Test Labels](../management/test-labels) for the full setup. + + +## Related + +- [Integration for Slack](./slack-integration) — set up the Slack connection these transformations build on +- [Webhooks](./index) — the full event catalog and field reference +- [Flake Detection](../detection/) — how monitors classify tests as flaky or broken +- [Test Labels](../management/test-labels) — apply and route labels with monitors diff --git a/flaky-tests/webhooks/index.mdx b/flaky-tests/webhooks/index.mdx index 0a64f74e..efe6f76b 100644 --- a/flaky-tests/webhooks/index.mdx +++ b/flaky-tests/webhooks/index.mdx @@ -88,6 +88,11 @@ Emitted when an AI-powered flaky test analysis finishes for a test case. You can also find guides for specific examples here: + + +## Alert only when a test gets worse + +By default this connection alerts on every status change. If you'd rather hear about a test only when it **escalates** — degrading to broken, or tripping more monitors over time — filter the transformation on the status transition instead of sending every event. + + + Send Slack alerts when a test gets worse, not just the first time it's flagged. + + ## Congratulations! You should now receive notifications in your Slack workspace when a test's status changes. You can further modify your transformation script to customize your messages. 
From 666cc815f80b8dbb57260a5ad774a826a880e001 Mon Sep 17 00:00:00 2001 From: Sam Gutentag <1404219+samgutentag@users.noreply.github.com> Date: Fri, 12 Jun 2026 10:41:21 -0700 Subject: [PATCH 2/8] docs(flaky-tests): use Info callout for status-priority background Co-Authored-By: Claude Opus 4.8 (1M context) --- flaky-tests/webhooks/alert-on-test-escalation.mdx | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/flaky-tests/webhooks/alert-on-test-escalation.mdx b/flaky-tests/webhooks/alert-on-test-escalation.mdx index 52a15ab5..ba62d241 100644 --- a/flaky-tests/webhooks/alert-on-test-escalation.mdx +++ b/flaky-tests/webhooks/alert-on-test-escalation.mdx @@ -18,9 +18,9 @@ The key decision is which event you subscribe to, because two different events f The distinction matters. `v2.test_case.status_changed` only fires when the test's combined status changes. If a test is already `FLAKY` and a second monitor starts flagging it, the overall status stays `FLAKY`, so no `v2.test_case.status_changed` event is sent. To catch a test getting flagged by more monitors over time — the "more than just the first detection" case — subscribe to `test_case.monitor_status_changed` instead. - + Test status priority is **Broken > Flaky > Healthy**. A test flagged by both a broken-type and a flaky-type monitor shows as `BROKEN` until the broken monitor resolves. See [Flake Detection](../detection/) for how the combined status is calculated. - + ## Alert when a test becomes broken From dab45d3c7350e7f3464efbad45106061183607de Mon Sep 17 00:00:00 2001 From: Sam Gutentag <1404219+samgutentag@users.noreply.github.com> Date: Fri, 12 Jun 2026 10:53:46 -0700 Subject: [PATCH 3/8] docs(flaky-tests): move escalation recipe into new Recipes group Seed a Flaky Tests > Recipes nav group with the escalation-alert page as its first entry. It's a process/pattern doc, not a connector reference, so it reads better as a recipe than under Webhooks. Webhooks keeps a cross-link card. 
Quarantine-recipes (#59) and monitor-tuning (#53) are the planned next entries. - git mv flaky-tests/webhooks/ -> flaky-tests/recipes/ - new Recipes group in docs.json after Webhooks - fixed relative links (./slack-integration, ./index -> ../webhooks/...) Co-Authored-By: Claude Opus 4.8 (1M context) --- docs.json | 7 ++++++- .../alert-on-test-escalation.mdx | 12 ++++++------ flaky-tests/webhooks/index.mdx | 2 +- flaky-tests/webhooks/slack-integration.mdx | 2 +- 4 files changed, 14 insertions(+), 9 deletions(-) rename flaky-tests/{webhooks => recipes}/alert-on-test-escalation.mdx (85%) diff --git a/docs.json b/docs.json index 5b365bd1..6fe27a28 100644 --- a/docs.json +++ b/docs.json @@ -295,7 +295,6 @@ "group": "Webhooks", "root": "flaky-tests/webhooks/index", "pages": [ - "flaky-tests/webhooks/alert-on-test-escalation", "flaky-tests/webhooks/slack-integration", "flaky-tests/webhooks/microsoft-teams-integration", "flaky-tests/webhooks/github-issues-integration", @@ -303,6 +302,12 @@ "flaky-tests/webhooks/jira-integration" ] }, + { + "group": "Recipes", + "pages": [ + "flaky-tests/recipes/alert-on-test-escalation" + ] + }, { "group": "Agents", "root": "flaky-tests/agents/index", diff --git a/flaky-tests/webhooks/alert-on-test-escalation.mdx b/flaky-tests/recipes/alert-on-test-escalation.mdx similarity index 85% rename from flaky-tests/webhooks/alert-on-test-escalation.mdx rename to flaky-tests/recipes/alert-on-test-escalation.mdx index ba62d241..812d8b1a 100644 --- a/flaky-tests/webhooks/alert-on-test-escalation.mdx +++ b/flaky-tests/recipes/alert-on-test-escalation.mdx @@ -5,7 +5,7 @@ og:title: "Alerting on flaky test escalation with Trunk webhooks" --- A single "this test is now flaky" alert tells you a test crossed a threshold once. It doesn't tell you when that same test keeps getting worse — failing on more branches, tripping additional monitors, or degrading from flaky to a consistently broken regression. 
For tests that matter, you want to hear about the escalation, not just the first detection. -This page shows how to wire that up with Trunk webhooks and a Slack transformation. It builds on the [Slack integration guide](./slack-integration) — set that connection up first, then come back here to filter it down to escalations. +This page shows how to wire that up with Trunk webhooks and a Slack transformation. It builds on the [Slack integration guide](../webhooks/slack-integration) — set that connection up first, then come back here to filter it down to escalations. ## Pick the right event @@ -13,8 +13,8 @@ The key decision is which event you subscribe to, because two different events f | Event | Fires when | Use it to | |---|---|---| -| [`v2.test_case.status_changed`](./index) | The test's **overall health status** transitions between `HEALTHY`, `FLAKY`, and `BROKEN` | Alert on health escalations like `FLAKY` → `BROKEN` | -| [`test_case.monitor_status_changed`](./index) | **Any individual monitor** activates or resolves for the test | Alert every time a monitor flags the test, even if its overall status doesn't move | +| [`v2.test_case.status_changed`](../webhooks/index) | The test's **overall health status** transitions between `HEALTHY`, `FLAKY`, and `BROKEN` | Alert on health escalations like `FLAKY` → `BROKEN` | +| [`test_case.monitor_status_changed`](../webhooks/index) | **Any individual monitor** activates or resolves for the test | Alert every time a monitor flags the test, even if its overall status doesn't move | The distinction matters. `v2.test_case.status_changed` only fires when the test's combined status changes. If a test is already `FLAKY` and a second monitor starts flagging it, the overall status stays `FLAKY`, so no `v2.test_case.status_changed` event is sent. To catch a test getting flagged by more monitors over time — the "more than just the first detection" case — subscribe to `test_case.monitor_status_changed` instead. 
@@ -61,7 +61,7 @@ function handler(webhook) { } ``` -Reuse the `summarizeTestCase` helper from the [Slack integration guide](./slack-integration#id-2.-customize-your-transformation) to format the message body. The `previous_status → new_status` line in that template makes the escalation obvious in the channel. +Reuse the `summarizeTestCase` helper from the [Slack integration guide](../webhooks/slack-integration#id-2.-customize-your-transformation) to format the message body. The `previous_status → new_status` line in that template makes the escalation obvious in the channel. ## Alert every time a monitor flags a test @@ -111,7 +111,7 @@ Prefer labels over a separate broken classification when you want to triage a pa ## Related -- [Integration for Slack](./slack-integration) — set up the Slack connection these transformations build on -- [Webhooks](./index) — the full event catalog and field reference +- [Integration for Slack](../webhooks/slack-integration) — set up the Slack connection these transformations build on +- [Webhooks](../webhooks/index) — the full event catalog and field reference - [Flake Detection](../detection/) — how monitors classify tests as flaky or broken - [Test Labels](../management/test-labels) — apply and route labels with monitors diff --git a/flaky-tests/webhooks/index.mdx b/flaky-tests/webhooks/index.mdx index efe6f76b..902e71fb 100644 --- a/flaky-tests/webhooks/index.mdx +++ b/flaky-tests/webhooks/index.mdx @@ -90,7 +90,7 @@ You can also find guides for specific examples here: + Send Slack alerts when a test gets worse, not just the first time it's flagged. 
From 27dcdfb89eedceb311c2feedec618ea46dfd967a Mon Sep 17 00:00:00 2001 From: Sam Gutentag <1404219+samgutentag@users.noreply.github.com> Date: Mon, 15 Jun 2026 09:17:05 -0700 Subject: [PATCH 4/8] docs(flaky-tests): warn that broken classification un-quarantines flaky tests Addresses Tyler's PR #249 review: - Clarify the transform snippets are drop-in replacements for the Slack guide's handler and depend on its summarizeTestCase helper staying in the transformation. - Add a Warning that classifying a test as broken changes its health status, dropping a flaky+auto-quarantined test out of auto-quarantine (broken tests aren't quarantine candidates) so it blocks CI again. Labeling monitors avoid this; manually quarantined tests are unaffected. - Tie the label Tip to the quarantine tradeoff. Co-Authored-By: Claude Opus 4.8 (1M context) --- flaky-tests/recipes/alert-on-test-escalation.mdx | 10 ++++++++-- 1 file changed, 8 insertions(+), 2 deletions(-) diff --git a/flaky-tests/recipes/alert-on-test-escalation.mdx b/flaky-tests/recipes/alert-on-test-escalation.mdx index 812d8b1a..055dec56 100644 --- a/flaky-tests/recipes/alert-on-test-escalation.mdx +++ b/flaky-tests/recipes/alert-on-test-escalation.mdx @@ -61,7 +61,13 @@ function handler(webhook) { } ``` -Reuse the `summarizeTestCase` helper from the [Slack integration guide](../webhooks/slack-integration#id-2.-customize-your-transformation) to format the message body. The `previous_status → new_status` line in that template makes the escalation obvious in the channel. +Both snippets replace the `handler` function from the [Slack integration guide](../webhooks/slack-integration#id-2.-customize-your-transformation); keep that guide's `summarizeTestCase` helper in the same transformation so the message body still renders. Its `previous_status → new_status` line makes the escalation obvious in the channel. + + +Classifying a test as **broken** changes its health status, and that can change quarantine behavior. 
Auto-quarantine applies only to tests with a **Flaky** status, so when a broken-type monitor flags a test that was auto-quarantined as flaky, the test becomes `BROKEN`, drops out of the auto-quarantine set, and its failures start blocking CI again. This is by design — a broken test is a real regression, not a flake to skip — but it means a broken classification is not a side-effect-free way to get an escalation alert. + +If you want the escalation signal *without* touching quarantine, use a **labeling** monitor instead (see [Alert every time a monitor flags a test](#alert-every-time-a-monitor-flags-a-test) below). Labels don't change health status, so an auto-quarantined test stays quarantined. Manually quarantined tests are unaffected either way. See [Quarantining](../quarantining/) and [Flake Detection](../detection/) for the full composite-status behavior. + ## Alert every time a monitor flags a test @@ -106,7 +112,7 @@ function handler(webhook) { Because `test_case.monitor_status_changed` fires for every monitor independently, this catches a test that keeps tripping new monitors over time, even while its headline status stays `FLAKY`. The `monitor.type` field tells you which monitor fired, so you can branch on it — for example, route [labeling monitors](../management/test-labels#automatic-labeling-from-monitors) to a triage channel and health classification monitors to your on-call channel. -Prefer labels over a separate broken classification when you want to triage a pattern without changing a test's health status. Configure a monitor's action as **Apply labels**, then filter `test_case.monitor_status_changed` on `monitor.type` to route those activations wherever they belong. See [Test Labels](../management/test-labels) for the full setup. +Prefer labels over a broken classification when you want to triage a pattern without changing a test's health status — and, as noted above, without disturbing auto-quarantine. 
Configure a monitor's action as **Apply labels**, then filter `test_case.monitor_status_changed` on `monitor.type` to route those activations wherever they belong. See [Test Labels](../management/test-labels) for the full setup. ## Related From 3c445d7d53d697aa18bdc1c85cb7e4f95c53b0ee Mon Sep 17 00:00:00 2001 From: Sam Gutentag <1404219+samgutentag@users.noreply.github.com> Date: Mon, 15 Jun 2026 09:47:10 -0700 Subject: [PATCH 5/8] docs(flaky-tests): note summarizeTestCase source + uppercase statuses Resolves the remaining part of Tyler's PR #249 review (transform validity): - Inline comment in both status snippets noting summarizeTestCase() lives in the Slack integration guide, so a single-block copy-paste doesn't silently ReferenceError. - Comment on the SEVERITY map noting status values are uppercase. Validated with a local Node harness against the real v2 + monitor payloads (16/16): handlers send/cancel correctly, and the casing experiment confirms lowercasing the comparisons silently breaks gating. Co-Authored-By: Claude Opus 4.8 (1M context) --- flaky-tests/recipes/alert-on-test-escalation.mdx | 3 +++ 1 file changed, 3 insertions(+) diff --git a/flaky-tests/recipes/alert-on-test-escalation.mdx b/flaky-tests/recipes/alert-on-test-escalation.mdx index 055dec56..354fecd0 100644 --- a/flaky-tests/recipes/alert-on-test-escalation.mdx +++ b/flaky-tests/recipes/alert-on-test-escalation.mdx @@ -31,6 +31,7 @@ Use this when you want a louder, separate signal for tests that have degraded in **2. Filter the transformation to escalations.** In your Slack endpoint's transformation, cancel the webhook unless the status got worse. This example ranks the three statuses and only sends a message when `new_status` is more severe than `previous_status`, so recoveries and resolutions stay quiet: ```javascript +// Status values are uppercase (HEALTHY, FLAKY, BROKEN), matching the payload. 
const SEVERITY = { HEALTHY: 0, FLAKY: 1, BROKEN: 2 }; function handler(webhook) { @@ -42,6 +43,7 @@ function handler(webhook) { return webhook; } + // summarizeTestCase() is defined in the Slack integration guide. webhook.payload = summarizeTestCase(webhook.payload); return webhook; } @@ -56,6 +58,7 @@ function handler(webhook) { return webhook; } + // summarizeTestCase() is defined in the Slack integration guide. webhook.payload = summarizeTestCase(webhook.payload); return webhook; } From b6b4b5532c9670a0eec782fb9d9f81f73c4cc589 Mon Sep 17 00:00:00 2001 From: Sam Gutentag <1404219+samgutentag@users.noreply.github.com> Date: Mon, 15 Jun 2026 10:21:35 -0700 Subject: [PATCH 6/8] docs(flaky-tests): sam-style pass + embed two animated diagrams - Voice/clarity pass on the escalation recipe, remove all em dashes. - Add two standalone animated SVGs (CSS keyframes, reduced-motion safe): - event-granularity-gap: HEALTHY->FLAKY->FLAKY across three columns, showing status_changed stays silent on the second monitor while monitor_status_changed fires on both. - broken-classification-quarantine: a broken classification drops a flaky auto-quarantined test out of quarantine and re-blocks CI. - Embed both via in the recipe. Transforms validated end to end: trunk2 source, a Node harness (16/16), and Svix Run Test on a play.svix.com test endpoint. 
Co-Authored-By: Claude Opus 4.8 (1M context) --- .../broken-classification-quarantine.svg | 67 ++++++++++++++ .../recipes/event-granularity-gap.svg | 91 +++++++++++++++++++ .../recipes/alert-on-test-escalation.mdx | 36 +++++--- 3 files changed, 180 insertions(+), 14 deletions(-) create mode 100644 assets/flaky-tests/recipes/broken-classification-quarantine.svg create mode 100644 assets/flaky-tests/recipes/event-granularity-gap.svg diff --git a/assets/flaky-tests/recipes/broken-classification-quarantine.svg b/assets/flaky-tests/recipes/broken-classification-quarantine.svg new file mode 100644 index 00000000..7fa0ffd9 --- /dev/null +++ b/assets/flaky-tests/recipes/broken-classification-quarantine.svg @@ -0,0 +1,67 @@ + + + + + + Classifying a test as broken can un-quarantine it + + + + + + + Flaky + auto-quarantined + + + Status: FLAKY + + Auto-quarantined + + CI passes (failure ignored) + + + broken-type monitor fires + + + + + + + Reclassified as broken + + + Status: BROKEN + + Not a quarantine candidate + + CI blocked (failure counts) + + + Broken tests are not quarantine candidates, so the test drops out of auto-quarantine + and its failures block CI again. Manually quarantined tests are unaffected. + + diff --git a/assets/flaky-tests/recipes/event-granularity-gap.svg b/assets/flaky-tests/recipes/event-granularity-gap.svg new file mode 100644 index 00000000..03d77f37 --- /dev/null +++ b/assets/flaky-tests/recipes/event-granularity-gap.svg @@ -0,0 +1,91 @@ + + + + + + Two events, two granularities + + + + + + + + Healthy + starting state + + Monitor A fires + HEALTHY → FLAKY + + + + Monitor B fires + already FLAKY + + + + + Test status + v2.test_case. + status_changed + test_case. + monitor_status_changed + + + + HEALTHY + + + FLAKY + + + FLAKY + + + · + + event sent + + no event + + + · + + event sent + + event sent + + + status_changed fires only when the overall status changes, so Monitor B sends nothing. 
+ monitor_status_changed fires on every activation, so it catches both escalations. + + diff --git a/flaky-tests/recipes/alert-on-test-escalation.mdx b/flaky-tests/recipes/alert-on-test-escalation.mdx index 354fecd0..82441200 100644 --- a/flaky-tests/recipes/alert-on-test-escalation.mdx +++ b/flaky-tests/recipes/alert-on-test-escalation.mdx @@ -3,20 +3,24 @@ title: "Alert When a Test Escalates" description: "Send Slack alerts when a test gets worse, not just the first time it's flagged" og:title: "Alerting on flaky test escalation with Trunk webhooks" --- -A single "this test is now flaky" alert tells you a test crossed a threshold once. It doesn't tell you when that same test keeps getting worse — failing on more branches, tripping additional monitors, or degrading from flaky to a consistently broken regression. For tests that matter, you want to hear about the escalation, not just the first detection. +A single "this test is now flaky" alert tells you a test crossed a threshold once. It says nothing about what happens next: the same test failing on more branches, tripping more monitors, or sliding from flaky into a consistently broken regression. For the tests that matter, you want to hear about the escalation, not just the first detection. -This page shows how to wire that up with Trunk webhooks and a Slack transformation. It builds on the [Slack integration guide](../webhooks/slack-integration) — set that connection up first, then come back here to filter it down to escalations. +This page wires that up with Trunk webhooks and a Slack transformation. It builds on the [Slack integration guide](../webhooks/slack-integration), so set that connection up first, then come back here to filter it down to escalations. ## Pick the right event -The key decision is which event you subscribe to, because two different events fire at two different granularities. +The one decision that matters is which event you subscribe to. 
Two events fire here, at two different granularities. | Event | Fires when | Use it to | |---|---|---| | [`v2.test_case.status_changed`](../webhooks/index) | The test's **overall health status** transitions between `HEALTHY`, `FLAKY`, and `BROKEN` | Alert on health escalations like `FLAKY` → `BROKEN` | | [`test_case.monitor_status_changed`](../webhooks/index) | **Any individual monitor** activates or resolves for the test | Alert every time a monitor flags the test, even if its overall status doesn't move | -The distinction matters. `v2.test_case.status_changed` only fires when the test's combined status changes. If a test is already `FLAKY` and a second monitor starts flagging it, the overall status stays `FLAKY`, so no `v2.test_case.status_changed` event is sent. To catch a test getting flagged by more monitors over time — the "more than just the first detection" case — subscribe to `test_case.monitor_status_changed` instead. +That distinction matters. `v2.test_case.status_changed` only fires when the test's combined status changes. If a test is already `FLAKY` and a second monitor starts flagging it, the overall status stays `FLAKY`, so nothing is sent. To catch a test that keeps getting flagged by more monitors over time (the "more than just the first detection" case), subscribe to `test_case.monitor_status_changed` instead. + + + A test goes HEALTHY to FLAKY when Monitor A fires, so both events send. When Monitor B fires while the test is already FLAKY, v2.test_case.status_changed sends nothing while test_case.monitor_status_changed still fires. + Test status priority is **Broken > Flaky > Healthy**. A test flagged by both a broken-type and a flaky-type monitor shows as `BROKEN` until the broken monitor resolves. See [Flake Detection](../detection/) for how the combined status is calculated. @@ -24,7 +28,7 @@ Test status priority is **Broken > Flaky > Healthy**. 
A test flagged by both a b ## Alert when a test becomes broken -Use this when you want a louder, separate signal for tests that have degraded into consistent failures, distinct from routine flakiness. +Use this when consistently failing tests deserve a louder, separate signal than routine flakiness. **1. Configure a broken-type monitor.** A test only reaches `BROKEN` status when a [failure rate](../detection/failure-rate-monitor) or [failure count](../detection/failure-count-monitor) monitor with its **Detection type** set to **Broken** is active for it. Set one up if you haven't already. A common pattern is to pair a broken-type monitor (catching consistently failing tests) with a flaky-type monitor (catching intermittent ones). @@ -49,7 +53,7 @@ function handler(webhook) { } ``` -To alert *only* when a test reaches the broken state — and stay silent on first-time flaky detections — gate on the new status directly instead: +To alert *only* when a test reaches the broken state, and stay quiet on first-time flaky detections, gate on the new status directly instead: ```javascript function handler(webhook) { @@ -67,14 +71,18 @@ function handler(webhook) { Both snippets replace the `handler` function from the [Slack integration guide](../webhooks/slack-integration#id-2.-customize-your-transformation); keep that guide's `summarizeTestCase` helper in the same transformation so the message body still renders. Its `previous_status → new_status` line makes the escalation obvious in the channel. -Classifying a test as **broken** changes its health status, and that can change quarantine behavior. Auto-quarantine applies only to tests with a **Flaky** status, so when a broken-type monitor flags a test that was auto-quarantined as flaky, the test becomes `BROKEN`, drops out of the auto-quarantine set, and its failures start blocking CI again. 
This is by design — a broken test is a real regression, not a flake to skip — but it means a broken classification is not a side-effect-free way to get an escalation alert. +Classifying a test as **broken** changes its health status, and that can change quarantine behavior. Auto-quarantine applies only to tests with a **Flaky** status. So when a broken-type monitor flags a test that was auto-quarantined as flaky, the test becomes `BROKEN`, drops out of the auto-quarantine set, and its failures start blocking CI again. That is by design (a broken test is a real regression, not a flake to skip), but it means a broken classification is not a side-effect-free way to get an escalation alert. If you want the escalation signal *without* touching quarantine, use a **labeling** monitor instead (see [Alert every time a monitor flags a test](#alert-every-time-a-monitor-flags-a-test) below). Labels don't change health status, so an auto-quarantined test stays quarantined. Manually quarantined tests are unaffected either way. See [Quarantining](../quarantining/) and [Flake Detection](../detection/) for the full composite-status behavior. + + A flaky, auto-quarantined test with CI passing. A broken-type monitor fires and reclassifies it as BROKEN. Because broken tests are not quarantine candidates, it drops out of auto-quarantine and its failures block CI again. + + ## Alert every time a monitor flags a test -Use this when you want to know about every detection event on a test, including the ones that don't change its overall status — a second monitor piling on, or a labeling monitor surfacing a new pattern. +Use this when you want to know about every detection event on a test, including the ones that don't change its overall status (a second monitor piling on, or a labeling monitor surfacing a new pattern). **1. Subscribe to `test_case.monitor_status_changed`.** On your Slack endpoint, enable this event in addition to (or instead of) `v2.test_case.status_changed`. 
@@ -112,15 +120,15 @@ function handler(webhook) { } ``` -Because `test_case.monitor_status_changed` fires for every monitor independently, this catches a test that keeps tripping new monitors over time, even while its headline status stays `FLAKY`. The `monitor.type` field tells you which monitor fired, so you can branch on it — for example, route [labeling monitors](../management/test-labels#automatic-labeling-from-monitors) to a triage channel and health classification monitors to your on-call channel. +Because `test_case.monitor_status_changed` fires for every monitor independently, this catches a test that keeps tripping new monitors over time, even while its headline status stays `FLAKY`. The `monitor.type` field tells you which monitor fired, so you can branch on it: route [labeling monitors](../management/test-labels#automatic-labeling-from-monitors) to a triage channel and health classification monitors to your on-call channel. -Prefer labels over a broken classification when you want to triage a pattern without changing a test's health status — and, as noted above, without disturbing auto-quarantine. Configure a monitor's action as **Apply labels**, then filter `test_case.monitor_status_changed` on `monitor.type` to route those activations wherever they belong. See [Test Labels](../management/test-labels) for the full setup. +Prefer labels over a broken classification when you want to triage a pattern without changing a test's health status (and, as noted above, without disturbing auto-quarantine). Configure a monitor's action as **Apply labels**, then filter `test_case.monitor_status_changed` on `monitor.type` to route those activations wherever they belong. See [Test Labels](../management/test-labels) for the full setup. 
## Related -- [Integration for Slack](../webhooks/slack-integration) — set up the Slack connection these transformations build on -- [Webhooks](../webhooks/index) — the full event catalog and field reference -- [Flake Detection](../detection/) — how monitors classify tests as flaky or broken -- [Test Labels](../management/test-labels) — apply and route labels with monitors +- [Integration for Slack](../webhooks/slack-integration). The Slack connection these transformations build on. +- [Webhooks](../webhooks/index). The full event catalog and field reference. +- [Flake Detection](../detection/). How monitors classify tests as flaky or broken. +- [Test Labels](../management/test-labels). Apply and route labels with monitors. From a824d191cd306276f9d6cb5a7acaac42f8a02422 Mon Sep 17 00:00:00 2001 From: Sam Gutentag <1404219+samgutentag@users.noreply.github.com> Date: Mon, 15 Jun 2026 10:55:20 -0700 Subject: [PATCH 7/8] docs(flaky-tests): unwrap quarantine Warning into its own section Apply CONTRIBUTING admonition rules to the escalation recipe: - The quarantine side effect was a two-paragraph Warning wrapping what is really core content. The guide forbids wrapping a section in a callout, and a reversible, by-design behavior is not a Warning-grade hazard. Promote it to a '## The quarantine trade-off' section (prose), and move the broken-classification animation into it. - Trim the label so it no longer duplicates that section; it now covers only the optional label-routing mechanics. Page now has two callouts (Info for background, Tip for an optional path), none stacked or section-wrapping.
Co-Authored-By: Claude Opus 4.8 (1M context) --- flaky-tests/recipes/alert-on-test-escalation.mdx | 10 +++++----- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/flaky-tests/recipes/alert-on-test-escalation.mdx b/flaky-tests/recipes/alert-on-test-escalation.mdx index 82441200..f7e9f9b6 100644 --- a/flaky-tests/recipes/alert-on-test-escalation.mdx +++ b/flaky-tests/recipes/alert-on-test-escalation.mdx @@ -70,11 +70,11 @@ function handler(webhook) { Both snippets replace the `handler` function from the [Slack integration guide](../webhooks/slack-integration#id-2.-customize-your-transformation); keep that guide's `summarizeTestCase` helper in the same transformation so the message body still renders. Its `previous_status → new_status` line makes the escalation obvious in the channel. -<Warning> -Classifying a test as **broken** changes its health status, and that can change quarantine behavior. Auto-quarantine applies only to tests with a **Flaky** status. So when a broken-type monitor flags a test that was auto-quarantined as flaky, the test becomes `BROKEN`, drops out of the auto-quarantine set, and its failures start blocking CI again. That is by design (a broken test is a real regression, not a flake to skip), but it means a broken classification is not a side-effect-free way to get an escalation alert. +## The quarantine trade-off -If you want the escalation signal *without* touching quarantine, use a **labeling** monitor instead (see [Alert every time a monitor flags a test](#alert-every-time-a-monitor-flags-a-test) below). Labels don't change health status, so an auto-quarantined test stays quarantined. Manually quarantined tests are unaffected either way. See [Quarantining](../quarantining/) and [Flake Detection](../detection/) for the full composite-status behavior. -</Warning> +Before you reach for a broken-type monitor, know what it does to quarantine.
Classifying a test as broken changes its health status, and auto-quarantine applies only to tests with a **Flaky** status. So when a broken-type monitor flags a test that was auto-quarantined as flaky, the test becomes `BROKEN`, drops out of the auto-quarantine set, and its failures start blocking CI again. That is by design, since a broken test is a real regression, not a flake to skip. It also means a broken classification is not a side-effect-free way to get an escalation alert. + +Labels avoid this. A labeling monitor doesn't change health status, so an auto-quarantined test stays quarantined while you still get the activation signal (see [Alert every time a monitor flags a test](#alert-every-time-a-monitor-flags-a-test) below). Manually quarantined tests are unaffected either way. See [Quarantining](../quarantining/) and [Flake Detection](../detection/) for the full composite-status behavior. A flaky, auto-quarantined test with CI passing. A broken-type monitor fires and reclassifies it as BROKEN. Because broken tests are not quarantine candidates, it drops out of auto-quarantine and its failures block CI again. @@ -123,7 +123,7 @@ function handler(webhook) { Because `test_case.monitor_status_changed` fires for every monitor independently, this catches a test that keeps tripping new monitors over time, even while its headline status stays `FLAKY`. The `monitor.type` field tells you which monitor fired, so you can branch on it: route [labeling monitors](../management/test-labels#automatic-labeling-from-monitors) to a triage channel and health classification monitors to your on-call channel. -Prefer labels over a broken classification when you want to triage a pattern without changing a test's health status (and, as noted above, without disturbing auto-quarantine). Configure a monitor's action as **Apply labels**, then filter `test_case.monitor_status_changed` on `monitor.type` to route those activations wherever they belong. 
See [Test Labels](../management/test-labels) for the full setup. +To route by pattern without changing a test's health status, set a monitor's action to **Apply labels**, then branch on `monitor.type` in your transform to send those activations wherever they belong. See [Test Labels](../management/test-labels) for the full setup. ## Related From 487f8707024524fd28ebd6894ba87d1a5672eddd Mon Sep 17 00:00:00 2001 From: Sam Gutentag <1404219+samgutentag@users.noreply.github.com> Date: Mon, 15 Jun 2026 11:02:25 -0700 Subject: [PATCH 8/8] docs(flaky-tests): clarify monitor_status_changed fires per-monitor in gap diagram Accuracy pass on the event-granularity diagram. The column-3 event is correct (monitor_status_changed fires on Monitor B's own activation, independent of overall status), but the framing invited a 'why an event if FLAKY to FLAKY?' misread. Sharpen it: - column 3 sublabel 'already FLAKY' -> '2nd monitor, still FLAKY' - caption: 'catches both escalations' -> 'fires on every monitor activation, so it catches both' (Monitor A's first detection is not an escalation) broken-classification diagram audited, accurate, unchanged. Co-Authored-By: Claude Opus 4.8 (1M context) --- assets/flaky-tests/recipes/event-granularity-gap.svg | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/assets/flaky-tests/recipes/event-granularity-gap.svg b/assets/flaky-tests/recipes/event-granularity-gap.svg index 03d77f37..06ec87c3 100644 --- a/assets/flaky-tests/recipes/event-granularity-gap.svg +++ b/assets/flaky-tests/recipes/event-granularity-gap.svg @@ -49,7 +49,7 @@ Monitor B fires - already FLAKY + 2nd monitor, still FLAKY @@ -86,6 +86,6 @@ status_changed fires only when the overall status changes, so Monitor B sends nothing. - monitor_status_changed fires on every activation, so it catches both escalations. + monitor_status_changed fires on every monitor activation, so it catches both.