From a9cacf53ba019e0f3f4e788f6269aac67c049eea Mon Sep 17 00:00:00 2001
From: Claude <noreply@anthropic.com>
Date: Sat, 9 May 2026 10:56:53 +0000
Subject: [PATCH 1/3] docs(merge-queue): add Testing Duration chart section to
 metrics page

Documents the new Testing Duration chart in the Merge Queue Health tab,
including the available stat measures and the Outcome / Cycle Ended In
filter dropdowns. Notes the separate bucketing behavior from the other
health charts.

Source: trunk-io/trunk2#3919

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
---
 merge-queue/administration/metrics.md | 42 +++++++++++++++++++++++++++
 1 file changed, 42 insertions(+)

diff --git a/merge-queue/administration/metrics.md b/merge-queue/administration/metrics.md
index bd462007..eecb053c 100644
--- a/merge-queue/administration/metrics.md
+++ b/merge-queue/administration/metrics.md
@@ -105,6 +105,48 @@ The time in queue can be displayed as different statistical measures. You can sh
 | P95     | The value below 95% of the time in queue falls.     |
 | P99     | The value below 99% of the time in queue falls.     |
 
+### Testing duration
+
+Testing duration shows how long each PR spends in the TESTING state within the Merge Queue, measured from when testing begins to when the cycle reaches its final state. Unlike the Conclusion count and Time in queue charts, testing duration uses separate data bucketing — hovering over a data point will not highlight the corresponding point on the other charts.
+
+Each data point represents one TESTING-to-final-state transition. A single PR can contribute multiple data points if its testing cycle restarted.
+
+The stat measures match those on the Time in queue chart:
+
+| Measure | Explanation |
+| ------- | ----------- |
+| Average | Average testing duration during the time bucket |
+| Minimum | The shortest testing duration in the time bucket |
+| Maximum | The longest testing duration in the time bucket |
+| Sum | The total of all testing durations added together |
+| P50 | The value below which 50% of testing durations fall |
+| P95 | The value below which 95% of testing durations fall |
+| P99 | The value below which 99% of testing durations fall |
+
+#### Filters
+
+Two dropdowns let you narrow the data shown in the chart:
+
+**Outcome** — filter by how each testing cycle ended:
+
+| Value | Meaning |
+| ----- | ------- |
+| All Outcomes | Include all testing cycles (default) |
+| Passed | Cycles where tests passed |
+| Failed | Cycles where tests failed |
+| Interrupted | Cycles interrupted before completion |
+| Cancelled | Cycles cancelled before tests ran to completion |
+
+**Cycle ended in** — filter by how the PR's overall merge cycle resolved:
+
+| Value | Meaning |
+| ----- | ------- |
+| All | Include all PR cycles (default) |
+| Merged | PR was ultimately merged |
+| Failed | PR ultimately failed out of the queue |
+| Cancelled | PR was cancelled |
+| In Flight | PR cycle is still in progress |
+
 ### Drill down into metrics
 
 From the **Conclusion count** and **Time in queue** charts, you can drill into any point or window on the graph to see the exact pull requests that made up those numbers.

From ca90308c895c5e6230568903eb677ae626a39131 Mon Sep 17 00:00:00 2001
From: Sam Gutentag <1404219+samgutentag@users.noreply.github.com>
Date: Tue, 12 May 2026 12:40:24 -0700
Subject: [PATCH 2/3] docs(merge-queue): merge in narrative bits from #649

Folds in the strongest pieces from PR #649 so it can be closed without
losing work:
- Adds the CI-vs-queue-wait-time framing line
- Wraps the restart caveat in a {% hint style="info" %} block (matches
  the page's existing pattern)
- Adds a worked filter example combining Outcome and Cycle ended in
- Cross-links to the Time in queue section
- Promotes the stat measures table under its own #### subheading
- Drops em dashes per house style

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
---
 merge-queue/administration/metrics.md | 40 ++++++++++++++++-----------
 1 file changed, 24 insertions(+), 16 deletions(-)

diff --git a/merge-queue/administration/metrics.md b/merge-queue/administration/metrics.md
index eecb053c..2ff1874e 100644
--- a/merge-queue/administration/metrics.md
+++ b/merge-queue/administration/metrics.md
@@ -107,27 +107,19 @@ The time in queue can be displayed as different statistical measures. You can sh
 
 ### Testing duration
 
-Testing duration shows how long each PR spends in the TESTING state within the Merge Queue, measured from when testing begins to when the cycle reaches its final state. Unlike the Conclusion count and Time in queue charts, testing duration uses separate data bucketing — hovering over a data point will not highlight the corresponding point on the other charts.
+Testing duration shows how long each PR spends in the TESTING state within the Merge Queue, measured from when testing begins to when the cycle reaches its final state. Unlike the Conclusion count and Time in queue charts, testing duration uses separate data bucketing. Hovering over a data point will not highlight the corresponding point on the other charts.
 
-Each data point represents one TESTING-to-final-state transition. A single PR can contribute multiple data points if its testing cycle restarted.
-
-The stat measures match those on the Time in queue chart:
+This is distinct from [Time in queue](#time-in-queue), which measures total time from queue entry to exit. A PR that waits before testing starts will have a longer time in queue but the same testing duration. Use this chart to understand CI performance specifically, separate from queue wait time.
 
-| Measure | Explanation |
-| ------- | ----------- |
-| Average | Average testing duration during the time bucket |
-| Minimum | The shortest testing duration in the time bucket |
-| Maximum | The longest testing duration in the time bucket |
-| Sum | The total of all testing durations added together |
-| P50 | The value below which 50% of testing durations fall |
-| P95 | The value below which 95% of testing durations fall |
-| P99 | The value below which 99% of testing durations fall |
+{% hint style="info" %}
+Each data point represents one TESTING-to-final-state transition. A single PR can contribute multiple data points if its testing cycle restarted.
+{% endhint %}
 
 #### Filters
 
-Two dropdowns let you narrow the data shown in the chart:
+Two dropdowns let you narrow the data shown in the chart.
 
-**Outcome** — filter by how each testing cycle ended:
+**Outcome** filters by how each testing cycle ended:
 
 | Value | Meaning |
 | ----- | ------- |
@@ -137,7 +129,7 @@ Two dropdowns let you narrow the data shown in the chart:
 | Interrupted | Cycles interrupted before completion |
 | Cancelled | Cycles cancelled before tests ran to completion |
 
-**Cycle ended in** — filter by how the PR's overall merge cycle resolved:
+**Cycle ended in** filters by how the PR's overall merge cycle resolved:
 
 | Value | Meaning |
 | ----- | ------- |
@@ -147,6 +139,22 @@ Two dropdowns let you narrow the data shown in the chart:
 | Cancelled | PR was cancelled |
 | In Flight | PR cycle is still in progress |
 
+Combine the two to isolate specific patterns. For example, set **Outcome** to Passed and **Cycle ended in** to Merged to see the durations of passing testing cycles for PRs that ultimately merged, giving you a clean baseline for CI speed without noise from cancelled or failed runs.
+
+#### Statistical measures
+
+The stat measures match those on the Time in queue chart:
+
+| Measure | Explanation |
+| ------- | ----------- |
+| Average | Average testing duration during the time bucket |
+| Minimum | The shortest testing duration in the time bucket |
+| Maximum | The longest testing duration in the time bucket |
+| Sum | The total of all testing durations added together |
+| P50 | The value below which 50% of testing durations fall |
+| P95 | The value below which 95% of testing durations fall |
+| P99 | The value below which 99% of testing durations fall |
+
 ### Drill down into metrics
 
 From the **Conclusion count** and **Time in queue** charts, you can drill into any point or window on the graph to see the exact pull requests that made up those numbers.

From d1080f4b97af59ed59dfea6056759848146823fa Mon Sep 17 00:00:00 2001
From: Sam Gutentag <1404219+samgutentag@users.noreply.github.com>
Date: Tue, 12 May 2026 12:46:57 -0700
Subject: [PATCH 3/3] docs(merge-queue): address PR review feedback

- Clarify "the cycle" as "the testing cycle" on first reference
- Rewrite Interrupted vs Cancelled filter rows with proto-defined
  distinctions (Interrupted: run cut short, cycle may continue;
  Cancelled: cycle ends mid-test)
- Harmonize the Time in queue stat-measures table with the new
  Testing duration table (cleaner column layout, no trailing periods,
  "value below which" phrasing)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
---
 merge-queue/administration/metrics.md | 24 ++++++++++++------------
 1 file changed, 12 insertions(+), 12 deletions(-)

diff --git a/merge-queue/administration/metrics.md b/merge-queue/administration/metrics.md
index 2ff1874e..02cfd8a3 100644
--- a/merge-queue/administration/metrics.md
+++ b/merge-queue/administration/metrics.md
@@ -95,19 +95,19 @@ Understanding the amount of time a pull request spends in the queue is important
 
 The time in queue can be displayed as different statistical measures. You can show or hide them by using the **+ Add** button.
 
-| Measure | Explanation                                         |
-| ------- | --------------------------------------------------- |
-| Average | Average of all time in queue during the time bucket |
-| Minimum | The shortest time in queue in the time bucket.      |
-| Maximum | The longest time in queue in the time bucket.       |
-| Sum     | The total of all time in queue added together.      |
-| P50     | The value below 50% of the time in queue falls.     |
-| P95     | The value below 95% of the time in queue falls.     |
-| P99     | The value below 99% of the time in queue falls.     |
+| Measure | Explanation |
+| ------- | ----------- |
+| Average | Average time in queue during the time bucket |
+| Minimum | The shortest time in queue in the time bucket |
+| Maximum | The longest time in queue in the time bucket |
+| Sum | The total of all time in queue added together |
+| P50 | The value below which 50% of times in queue fall |
+| P95 | The value below which 95% of times in queue fall |
+| P99 | The value below which 99% of times in queue fall |
 
 ### Testing duration
 
-Testing duration shows how long each PR spends in the TESTING state within the Merge Queue, measured from when testing begins to when the cycle reaches its final state. Unlike the Conclusion count and Time in queue charts, testing duration uses separate data bucketing. Hovering over a data point will not highlight the corresponding point on the other charts.
+Testing duration shows how long each PR spends in the TESTING state within the Merge Queue, measured from when testing begins to when the testing cycle reaches its final state. Unlike the Conclusion count and Time in queue charts, testing duration uses separate data bucketing. Hovering over a data point will not highlight the corresponding point on the other charts.
 
 This is distinct from [Time in queue](#time-in-queue), which measures total time from queue entry to exit. A PR that waits before testing starts will have a longer time in queue but the same testing duration. Use this chart to understand CI performance specifically, separate from queue wait time.
 
@@ -126,8 +126,8 @@ Two dropdowns let you narrow the data shown in the chart.
 | All Outcomes | Include all testing cycles (default) |
 | Passed | Cycles where tests passed |
 | Failed | Cycles where tests failed |
-| Interrupted | Cycles interrupted before completion |
-| Cancelled | Cycles cancelled before tests ran to completion |
+| Interrupted | Test runs cut short by a restart, preemption, or base-branch change (the cycle may continue with a new run) |
+| Cancelled | Cycles cancelled mid-test (the cycle ends without a passing or failing result) |
 
 **Cycle ended in** filters by how the PR's overall merge cycle resolved: