POC: Time-based Helix test scheduling with AzDO history by MichaelSimons · Pull Request #54939 · dotnet/sdk

MichaelSimons · 2026-06-23T14:14:28Z

Summary

Proof-of-concept replacing the SDK's count-based Helix test partitioning with time-based scheduling, inspired by dotnet/roslyn's AssemblyScheduler. Uses historical test execution times from Azure DevOps to create work items targeting ~10 minutes each, at the individual test method level.

What's Changed

New Files

test/HelixTasks/AzdoClient.cs — Lightweight AzDO REST client (builds + test results APIs)
test/HelixTasks/TestHistoryManager.cs — Fetches per-test-method duration history from last successful CI build, with fallback to main
test/HelixTasks/TestMethodDiscovery.cs — Discovers individual test methods from PE metadata via reflection
test/HelixTasks/TimeBasedScheduler.cs — Greedy first-fit bin-packing scheduler (10-min target per work item)
test/HelixTasks.SchedulerTool/ — Local console app for validating scheduling plans offline
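
The history lookup described for AzdoClient.cs and TestHistoryManager.cs can be sketched roughly as follows. This is an illustrative Python sketch, not the PR's C# code: the function names are hypothetical, and the query parameters are assumed to follow the public Azure DevOps "Builds - List" REST API rather than whatever the PR actually sends.

```python
from urllib.parse import urlencode


def latest_successful_build_url(project_uri, definition_id, branch):
    """Build a query URL for the newest successful build on a branch.

    Parameter names are assumed from the Azure DevOps Builds - List API.
    """
    query = urlencode({
        "definitions": definition_id,
        "branchName": f"refs/heads/{branch}",
        "statusFilter": "completed",
        "resultFilter": "succeeded",
        "$top": 1,
        "api-version": "7.1",
    })
    return f"{project_uri.rstrip('/')}/_apis/build/builds?{query}"


def first_available_history(target_branch, fetch):
    """Try the PR target branch first, then fall back to main.

    `fetch` is a hypothetical callable mapping a branch name to a
    {test_name: seconds} dict (empty when no history exists).
    """
    branches = [target_branch]
    if target_branch != "main":
        branches.append("main")
    for branch in branches:
        history = fetch(branch)
        if history:
            return branch, history
    return None, {}
```

Injecting `fetch` keeps the fallback logic testable without network access; a real client would issue the HTTP request and parse the test results there.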

Modified Files

test/HelixTasks/SDKCustomCreateXUnitWorkItemsWithTestExclusion.cs — Added UseTimeBasedScheduling mode with direct vstest.console.dll invocation via RSP files
test/xunit-runner/XUnitRunner.targets — Passes time-based scheduling properties to MSBuild task
test/UnitTests.proj — Auto-configures from AzDO pipeline variables, enables time-based scheduling by default

Design

Scheduling: Greedy first-fit bin-packing using historical execution times from AzDO REST API
Fallback: Count-based partitioning (25 work items) when no history is available
Test invocation: dotnet exec vstest.console.dll @workitem.rsp — all arguments (assembly, loggers, blame, filter) in a response file read natively by vstest.console.dll, eliminating all command-line length constraints
Windows: Uses .cmd batch scripts for correct variable expansion
Branch resolution: Queries AzDO for history on the PR target branch, falls back to main
Parallel execution disabled: Temporarily disabled to stabilize test results and reduce noise from concurrency-related intermittent failures during validation
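
As a rough illustration of the greedy first-fit approach named above, the following Python sketch packs tests into work items against a 10-minute budget. It is a minimal sketch under stated assumptions, not the PR's TimeBasedScheduler: the `WorkItem` type and `schedule` function are hypothetical, tests are taken in input order, and a test longer than the budget simply gets its own work item.

```python
from dataclasses import dataclass, field

TARGET_SECONDS = 600.0  # ~10 minutes per work item, per the PR description


@dataclass
class WorkItem:
    tests: list = field(default_factory=list)
    total_seconds: float = 0.0


def schedule(durations, target=TARGET_SECONDS):
    """First-fit: place each test in the first work item with room left."""
    items = []
    for name, seconds in durations.items():
        for item in items:
            if item.total_seconds + seconds <= target:
                item.tests.append(name)
                item.total_seconds += seconds
                break
        else:
            # No existing work item can take this test; open a new one.
            items.append(WorkItem([name], seconds))
    return items


plan = schedule({"A": 400, "B": 300, "C": 200, "D": 700})
for index, item in enumerate(plan):
    print(index, item.tests, item.total_seconds)
```

Here `A` and `C` share a work item (600 s), `B` sits alone (300 s), and `D` exceeds the budget and is isolated. Sorting tests by descending duration before packing usually tightens the result, but whether the PR does so is not stated.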

Adds time-based work item scheduling inspired by dotnet/roslyn's AssemblyScheduler. Instead of partitioning by method count, this uses historical test execution durations from Azure DevOps to create Helix work items targeting ~10 minutes each at the individual test method level.

New files:
- AzdoClient.cs: Lightweight REST client for AzDO builds/test results API
- TestHistoryManager.cs: Fetches per-test duration history from last successful CI build, with branch fallback
- TestMethodDiscovery.cs: Discovers individual test methods from compiled assemblies using reflection metadata
- TimeBasedScheduler.cs: Greedy first-fit bin-packing scheduler with configurable target time, command-line length limits, and count-based fallback when history is unavailable
- HelixTasks.SchedulerTool/: Local console app for validating scheduling plans without running in CI

Modified:
- SDKCustomCreateXUnitWorkItemsWithTestExclusion.cs: Added UseTimeBasedScheduling mode with AzDO parameters, integrated time-based scheduling path alongside existing count-based approach
- HelixTasks.csproj: Added System.Text.Json, InternalsVisibleTo

The existing count-based scheduling is preserved as the default and serves as fallback when history is unavailable.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Update XUnitRunner.targets to pass all new time-based scheduling properties to SDKCustomCreateXUnitWorkItemsWithTestExclusion.

Add auto-configuration in UnitTests.proj using AzDO built-in variables:
- AzdoProjectUri: derived from SYSTEM_COLLECTIONURI + SYSTEM_TEAMPROJECT
- AzdoAccessToken: from SYSTEM_ACCESSTOKEN (already mapped in sdk-build.yml)
- AzdoDefinitionId: from SYSTEM_DEFINITIONID
- AzdoTargetBranch: from SYSTEM_PULLREQUEST_TARGETBRANCH (falls back to main)

To enable: set UseTimeBasedScheduling=true in the pipeline or UnitTests.proj. All other config is auto-derived from the pipeline environment.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
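
The variable mapping above amounts to something like the following Python sketch. The PR does this in MSBuild inside UnitTests.proj; the `azdo_settings` function is hypothetical, and stripping a `refs/heads/` prefix from the target branch is an assumption, not something the commit message states.

```python
def azdo_settings(env):
    """Derive the scheduling properties from AzDO pipeline variables."""
    collection = env.get("SYSTEM_COLLECTIONURI", "")
    project = env.get("SYSTEM_TEAMPROJECT", "")

    # Fall back to main when the build is not a pull request.
    branch = env.get("SYSTEM_PULLREQUEST_TARGETBRANCH") or "main"
    prefix = "refs/heads/"
    if branch.startswith(prefix):  # assumption: normalize to a short name
        branch = branch[len(prefix):]

    project_uri = ""
    if collection and project:
        project_uri = f"{collection.rstrip('/')}/{project}"

    return {
        "AzdoProjectUri": project_uri,
        "AzdoAccessToken": env.get("SYSTEM_ACCESSTOKEN", ""),
        "AzdoDefinitionId": env.get("SYSTEM_DEFINITIONID", ""),
        "AzdoTargetBranch": branch,
    }
```

Calling `azdo_settings(os.environ)` on a build agent would read the live pipeline values; passing a plain dict makes the mapping easy to check locally.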

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

The method-level filter strings (FullyQualifiedName per test) are much longer than the old class-level filters. On Windows, cmd.exe has an 8191-character command line limit, so many work items were failing with 'The input line is too long' (exit code 255).

Fix: Make MaxFilterLength OS-aware:
- Windows: 7000 chars (leaving ~1200 for the command prefix)
- POSIX: 25000 chars (bash supports ~128KB+)

Also enforce the filter length limit in the count-based fallback path.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Instead of passing the method-level --filter on the command line (which hits the 8191-char cmd.exe limit on Windows), write each work item's filter to a .rsp response file in the publish directory and reference it via @file.rsp on the command line. This is the same approach used by dotnet/roslyn's Helix test runner.

The filter string can now be arbitrarily long, so work items are sized purely by time budget (or count-based fallback), not constrained by command-line length. The TimeBasedScheduler's MaxFilterLength is now set to 100K (effectively unlimited) since the rsp file has no length constraint.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

With filters in response files, work item sizing is purely driven by the time budget (or count for fallback). Remove all filter-length tracking and the isPosixShell parameter from TimeBasedScheduler.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
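
For the count-based fallback, a minimal Python sketch might look like this. The figure of 25 work items comes from the PR description; the `partition_by_count` name and the round-robin distribution are assumptions, since the PR does not say how tests are spread across the fallback work items.

```python
def partition_by_count(tests, work_items=25):
    """Split tests into at most `work_items` groups of near-equal size."""
    count = min(work_items, len(tests))
    if count == 0:
        return []
    # Round-robin slicing: group i takes tests i, i+count, i+2*count, ...
    return [tests[i::count] for i in range(count)]


groups = partition_by_count([f"Test{n}" for n in range(52)])
print(len(groups), sorted({len(g) for g in groups}))
```

With 52 tests this yields 25 groups holding two or three tests each; with fewer tests than work items, each test gets its own group.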

Instead of 'dotnet test @filter.rsp' (which expands the RSP and hits the CreateProcess 32K limit), invoke vstest.console.dll directly:

    dotnet exec vstest.console.dll @workitem.rsp

The RSP file contains ALL arguments (assembly, loggers, blame, filter) and vstest.console.dll reads it natively without spawning a child process, completely eliminating any command-line length constraint. This matches the approach used by dotnet/roslyn's Helix test runner.

MTP projects continue to use dotnet exec with the test assembly directly since they already handle arguments without the CreateProcess issue.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
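
A response file of the kind described might be generated along these lines. This Python sketch is illustrative only: `build_rsp` and `quote_arg` are hypothetical helpers, the file layout (one argument per line, whitespace-containing arguments in double quotes) is an assumption about what vstest.console accepts, and the exact arguments the PR writes are not shown in the commit message. The switches used (`/Logger:trx`, `/Blame`, `/TestCaseFilter:`) are standard vstest.console options.

```python
def quote_arg(arg):
    """Wrap an argument in double quotes when it contains whitespace."""
    return f'"{arg}"' if any(ch.isspace() for ch in arg) else arg


def build_rsp(assembly, filter_expr, trx_name):
    """Return response-file text holding every vstest.console argument."""
    args = [
        assembly,
        f"/Logger:trx;LogFileName={trx_name}",
        "/Blame",
        f"/TestCaseFilter:{filter_expr}",
    ]
    return "\n".join(quote_arg(a) for a in args) + "\n"


text = build_rsp(
    "Example.Tests.dll",  # hypothetical assembly name
    "FullyQualifiedName~Ns.A.M1|FullyQualifiedName~Ns.A.M2",
    "workitem1.trx",
)
print(text)
```

Writing `text` to `workitem.rsp` and running `dotnet exec vstest.console.dll @workitem.rsp` keeps the process command line short no matter how long the filter grows.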

cmd.exe expands %variables% at parse time, so 'set /p var=<file&& dotnet exec %var%' expands %var% to an empty string before set runs.

Fix: write a .cmd batch script to the payload directory, where each line is parsed independently. The Helix command is just the script filename. POSIX continues to use inline commands, since the shell evaluates its expansions at run time.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
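
The batch-script workaround can be sketched as below. This is a hypothetical Python generator, not the PR's code: the `TOOL_PATH` variable, the file names, and the idea that the variable holds a tool path read from a file are all assumptions made for illustration.

```python
def windows_wrapper_script(path_file, rsp_file):
    """Return .cmd text with each statement on its own line.

    cmd.exe parses a batch file line by line, so %TOOL_PATH% on the
    third line is expanded only after the `set /p` line has run. On a
    single `a && b` line, both halves are expanded before `a` executes.
    """
    lines = [
        "@echo off",
        f"set /p TOOL_PATH=<{path_file}",
        f"dotnet exec %TOOL_PATH% @{rsp_file}",
    ]
    return "\r\n".join(lines) + "\r\n"


print(windows_wrapper_script("toolpath.txt", "workitem.rsp"))
```

The Helix command for the work item then only needs to name the generated script, which sidesteps the parse-time expansion problem entirely.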

xUnit: set parallelizeAssembly and parallelizeTestCollections to false
MSTest: set MSTestParallelizeWorkers to 1

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Setting MSTestParallelizeWorkers=1 still causes MSTest targets to inject [Parallelize], which conflicts with [DoNotParallelize] attributes in several test projects. Setting scope to None prevents the attribute from being generated entirely, and is compatible with projects that already set MSTestParallelizeScope=None locally.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

On Windows Helix machines, DOTNET_ROOT may point to a system-installed .NET SDK with an incompatible (older) vstest.console.dll. This caused MissingMethodException crashes in all non-MTP test work items.

Use HELIX_CORRELATION_PAYLOAD/d instead, which always contains the custom-built SDK matching the test assemblies.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Bare method names with exact-match filter missed [Theory]/[InlineData] test cases whose FQN includes parameters (e.g. Method(arg1, arg2)). Using 'FullyQualifiedName~Method' (contains) ensures all parameterized variants are matched, resolving ~2,800 missing tests per leg.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
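
The difference between the two filter operators can be shown with a small Python model. This sketch only simulates the matching semantics (`=` as exact match, `~` as substring match) and builds a filter string joined with `|`; `build_filter` and `matches` are hypothetical helpers, not part of the PR or of vstest.

```python
def build_filter(method_names):
    """Join per-method 'contains' clauses with the OR operator."""
    return "|".join(f"FullyQualifiedName~{name}" for name in method_names)


def matches(operator, pattern, fully_qualified_name):
    """Model the two operators: '=' is exact, '~' is contains."""
    if operator == "=":
        return fully_qualified_name == pattern
    if operator == "~":
        return pattern in fully_qualified_name
    raise ValueError(f"unsupported operator: {operator}")


theory_case = "Ns.Tests.Method(arg1, arg2)"
print(matches("=", "Ns.Tests.Method", theory_case))
print(matches("~", "Ns.Tests.Method", theory_case))
print(build_filter(["Ns.Tests.Method", "Ns.Tests.Other"]))
```

One trade-off of the contains operator is over-matching: a clause for `Ns.Tests.Method` also selects `Ns.Tests.Method2`, so a test could run in more than one work item if such prefixes are scheduled separately.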

MichaelSimons and others added 11 commits June 18, 2026 17:15

Enable time-based scheduling by default

b3d8c3c

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Disable parallel test execution for investigation

a9542f7

Merge branch 'main' into michaelsimons/helix-time-based-scheduler

16d7f0f

This was referenced Jun 23, 2026

Run-file tests failing with "The project file could not be loaded." #54819

Open

Test flakiness with Microsoft.NET.Build.Tests.GivenThatWeWantBuildsToBeIncremental tests #54823

Open

MichaelSimons and others added 2 commits June 23, 2026 16:07

build-analysis Bot mentioned this pull request Jun 24, 2026

MTPHelpSnapshotTests.VerifyMTPHelpOutput snapshot mismatch (--progress option added) #54948

Closed

MichaelSimons wants to merge 13 commits into main from michaelsimons/helix-time-based-scheduler