[release/9.0] Surface scheduled outerloop Helix work item failures (backport of #129049, #129629) #129908
Open
mmitche wants to merge 2 commits into
…otnet#129049)

> [!NOTE]
> This pull request was authored with the assistance of GitHub Copilot.

Several scheduled outerloop pipelines (the `outerloop.yml` family: `runtime-libraries-coreclr outerloop` and its `-windows`/`-linux`/`-osx` variants) use an `always: false` scheduled trigger. With `always: false`, AzDO only starts a new scheduled run if the source changed **since the last _successful_ scheduled run**. Because the repo has many flaky outerloop tests, the Helix test work items virtually always have at least one failure, which fails the "Send to Helix" step and therefore the whole build. The build never reaches a `succeeded` state, so AzDO re-queues **the same, unchanged commit** day after day, submitting more and more Helix work for no benefit. (Empirically confirmed: a single commit was re-run and failed for 19 consecutive days; once a sibling definition produced a genuinely successful run, the same-SHA re-queue stopped.)

`continueOnError: true` only downgrades the build to `partiallySucceeded`, which AzDO's `always: false` scheduler still does **not** treat as successful, so the same commit keeps getting re-queued. The Helix step must end **fully successful** (exit 0).

Make the "Send to Helix" step actually succeed on scheduled runs by disabling the two Arcade `Microsoft.DotNet.Helix.Sdk` properties that fail the build (both default to `true`):

- **`FailOnWorkItemFailure`**: `CheckHelixJobStatus` errors when a work item exits non-zero.
- **`FailOnTestFailure`**: `CheckAzurePipelinesTestResults` errors when any published test failed.

Setting both to `false` lets the msbuild step exit 0, producing a fully `succeeded` build. Failed tests are still published and visible in the test results tab; AzDO does not auto-degrade a build to `partiallySucceeded` just because a published test run contains failures. Only a failing task would.

- **`eng/pipelines/libraries/helix.yml`**: Added a `failOnTestFailures` parameter (default `true`, preserving today's behavior) wired to `/p:FailOnWorkItemFailure` and `/p:FailOnTestFailure` on the Send to Helix msbuild invocation.
- **`eng/pipelines/libraries/outerloop.yml`**: Passes `failOnTestFailures: false` **only on scheduled runs** (`Build.Reason == 'Schedule'`) for all three matrix legs (Release, Debug, NET48).

The new parameter defaults to `true`, so all other `helix.yml` callers are unaffected (none set `WaitForWorkItemCompletion` or these properties on this path, so they already resolve to `true`).

Only scheduled outerloop runs change behavior. PR / rolling / manual outerloop runs continue to fail on Helix failures exactly as before. Build/compile breaks still fail scheduled runs (this only affects the Helix step).

On scheduled runs, `FailOnWorkItemFailure=false` also masks work-item crashes/timeouts/infra failures, not just test-assertion failures. This is an accepted tradeoff for the goal of stopping the wasteful daily re-queue of unchanged commits; results remain visible in the Helix/test reporting.

---------

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
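For readers who want to picture the caller-side change, here is a minimal sketch of how a leg might pass `failOnTestFailures: false` only on scheduled runs. It is not the actual diff: the template path comes from the description above, but the surrounding `steps` layout is an assumption, and the real `outerloop.yml` may route the parameter through additional templates.

```yaml
# Hypothetical excerpt in the style of eng/pipelines/libraries/outerloop.yml.
# Only the failOnTestFailures parameter and the Build.Reason check are taken
# from the commit message; the surrounding structure is illustrative.
steps:
  - template: /eng/pipelines/libraries/helix.yml
    parameters:
      # ...parameters this leg already passes would stay here...
      ${{ if eq(variables['Build.Reason'], 'Schedule') }}:
        # Scheduled runs only: let the Send to Helix step exit 0 even when
        # work items or tests fail, so the build can reach `succeeded`.
        failOnTestFailures: false
```

Because the key is inserted only when the compile-time condition holds, PR, rolling, and manual runs never see the override and fall back to the template default of `true`.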
…net#129629)

## Problem

PR dotnet#129049 made scheduled outerloop builds succeed when only Helix tests fail, by setting `FailOnWorkItemFailure`/`FailOnTestFailure` to `false` on scheduled runs (via the `failOnTestFailures: false` parameter). This stopped AzDO's `always: false` scheduler from re-queueing the same commit day after day.

The side effect: failed Helix work items became **completely invisible** in the Azure DevOps timeline. The `Send to Helix` step is fully green, so there is no signal that work items failed (even though, for flaky outerloop, they almost always do).

## Fix

Surface failed work items as **warnings** instead of silently dropping them. Warnings keep the failures visible in the timeline but do **not** degrade the build below `succeeded` (so the `always: false` re-queue fix from dotnet#129049 is preserved).

- **`src/libraries/sendtohelixhelp.proj`**: new `WarnOnHelixWorkItemFailure` target (`AfterTargets=CheckHelixJobStatus`) that emits a `<Warning>` for each failed `@(CompletedWorkItem)` when `WarnOnHelixTestFailure=true`. This mirrors what the Arcade SDK's `CheckHelixJobStatus` would have *errored* on, but as a warning.
- **`eng/pipelines/libraries/helix.yml`**: new `warnOnTestFailures` parameter (default `false`) wired to `/p:WarnOnHelixTestFailure`.
- **`eng/pipelines/libraries/outerloop.yml`**: scheduled runs now set `warnOnTestFailures: true` alongside `failOnTestFailures: false` on all three legs.

No warn-as-error change was needed: the `Send to Helix` step already runs with warnaserror disabled (`_warnAsErrorParamHelixOverride`), so these warnings are not promoted back into build-failing errors.

## Validation

Ran the `runtime-libraries-coreclr outerloop` pipeline (dnceng-public def 125, [build 1472840](https://dev.azure.com/dnceng-public/public/_build/results?buildId=1472840)) with a temporary Manual gate. Multiple CoreCLR_Release legs completed **succeeded** with failed work items surfaced as warnings and **zero errors**, e.g.:

```
src/libraries/sendtohelixhelp.proj(364,5): warning : Work item System.Runtime.Numerics.Tests in job 2e01f1b1-... has failed. Failure log: https://helix.dot.net/api/.../console
```

Legs whose work items all passed produced no such warning, as expected.

> [!NOTE]
> This pull request was authored with the assistance of GitHub Copilot.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
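The template-side wiring described in the two commits could look roughly like the sketch below. The parameter names, defaults, and MSBuild property names come from the commit messages; the `script` line is a placeholder assumption, not the real Send to Helix command, which in the repository is built from other variables.

```yaml
# Hypothetical sketch in the style of eng/pipelines/libraries/helix.yml.
parameters:
  failOnTestFailures: true    # default keeps the fail-on-failure behavior
  warnOnTestFailures: false   # default emits no extra warnings

steps:
  # Placeholder command: the real step invokes msbuild differently.
  - script: >-
      dotnet msbuild src/libraries/sendtohelix.proj
      /p:FailOnWorkItemFailure=${{ parameters.failOnTestFailures }}
      /p:FailOnTestFailure=${{ parameters.failOnTestFailures }}
      /p:WarnOnHelixTestFailure=${{ parameters.warnOnTestFailures }}
    displayName: Send to Helix
```

With both defaults left alone the properties resolve to fail-on-failure with no added warnings, matching the existing behavior; a scheduled caller that sets `failOnTestFailures: false` and `warnOnTestFailures: true` gets a green step with the failed work items listed as warnings.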
Contributor
Tagging subscribers to this area: @dotnet/area-infrastructure-libraries
Contributor
Pull request overview
Backports updates to the release/9.0 outerloop Helix pipeline behavior so that scheduled outerloop runs (always: false) no longer fail the build due to Helix work item/test failures (preventing Azure DevOps from re-queuing the same commit), while still surfacing those failures as timeline warnings for visibility.
Changes:
- Add an MSBuild target that emits warnings for failed Helix work items when explicitly enabled (`WarnOnHelixTestFailure=true`).
- Introduce `failOnTestFailures` and `warnOnTestFailures` parameters in the Helix pipeline template and wire them to Helix SDK properties.
- Update scheduled outerloop runs to set `failOnTestFailures: false` and `warnOnTestFailures: true`.
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| src/libraries/sendtohelixhelp.proj | Adds an AfterTargets=CheckHelixJobStatus target to surface failed work items as MSBuild warnings when opted in. |
| eng/pipelines/libraries/outerloop.yml | On scheduled runs only, disables failing on Helix failures and enables warning surfacing for all outerloop matrix legs. |
| eng/pipelines/libraries/helix.yml | Adds parameters to control Helix failure behavior and passes the corresponding MSBuild properties to sendtohelix.proj. |
lewing
approved these changes
Jun 26, 2026
Backport of #129049 and #129629 to release/9.0.
Combines both changes:
- Scheduled outerloop runs (`always: false`) no longer fail the build on Helix work item/test failures, so flaky tests don't keep AzDO re-queueing the same commit.

Conflicts:

- `helix.yml` parameters list (SuperPmi params present on this branch): resolved by keeping both.

> [!NOTE]
> This pull request was authored with the assistance of GitHub Copilot.