Skip to content

Fix #14026 NET task host MSB4216 via dotnet-host drive-casing probe (builds on #17002)#17039

Merged
JanProvaznik merged 12 commits into
mainfrom
AlesProkop/fix-sdk-root-drive-casing-workaround
Jun 24, 2026
Merged

Fix #14026 NET task host MSB4216 via dotnet-host drive-casing probe (builds on #17002)#17039
JanProvaznik merged 12 commits into
mainfrom
AlesProkop/fix-sdk-root-drive-casing-workaround

Conversation

@AlesProkop

Copy link
Copy Markdown
Member

Context

Fixes the MSB4216 handshake failure that breaks .NET Framework MSBuild hosts (e.g. VS, and the Arcade CodeQL orchestrator) when they launch the .NET SDK task host on certain hosted CI agents. See dotnet/msbuild#14026 for the full root cause.

The handshake salt is a case-sensitive hash of the SDK tools directory. The host hashes $(NetCoreSdkRoot) (drive-letter casing propagated verbatim from the environment, e.g. ADO's D:\a\_work\1\s), while the child task host resolves its own path via Environment.ProcessPath, which on some agents reports the volume's canonical lowercase drive (d:). The salts then differ and the handshake fails:

error MSB4216: Could not run the "InstallDotNetCore" task because MSBuild could not
create or connect to a task host with runtime "NET" and architecture "x86".

Credit / relationship to #17002

This builds directly on top of @ViktorHofer's workaround in #17002 — the InitialTargets target that rewrites the drive-letter casing of $(NetCoreSdkRoot), the MSBuildRuntimeType == 'Full' gating, the NetCoreSdkRoot != '' guard, and the splice-only-the-drive-letter approach are all his design. Full credit to Viktor; his commits are preserved in this branch's history. This PR only swaps out the casing-detection mechanism, which was the one piece that did not fire on the affected agents.

What changed

The original probe used GetFinalPathNameByHandle on $(NetCoreSdkRoot). On the affected ADO agents that API returns an uppercase drive letter, so it never matched the child's lowercase d: and was a silent no-op (proven from the failing build's comm traces + binlog). Launching the SDK's own MSBuild.exe apphost instead fails, because its matching runtime is not resolvable in the host process environment.

This PR detects the casing by launching the SDK's dotnet host (dotnet.exe — the runtime host, which always runs and lives on the same volume as the SDK) on a tiny throwaway project whose inline task reads GetModuleFileNameW(NULL) — the exact API behind the child's Environment.ProcessPath. The returned drive letter is adopted only when it is the same letter as $(NetCoreSdkRoot) differing solely in case, so it can never change the actual drive, only its casing. Cached per drive; Windows-only; safe no-op when the probe cannot run.

Knobs

  • DisableNetCoreSdkRootDriveCasingWorkaround=true — skip entirely.
  • NetCoreSdkRootDriveCasingOverride=lower|upper — deterministic override without probing (emergency/testing).

Validation

Reproduced and validated on the actual failing agent type via the microsoft-testfx CodeQL pipeline:

  • Override path (=lower): rewrote Tools.proj's NetCoreSdkRoot D:d:, MSB4216 eliminated, build green — confirming the salt reads the live, target-mutated property.
  • Auto probe (no override, two independent runs that both landed on a "bad" agent): probe resolved d:, target logged Normalized NetCoreSdkRoot drive casing 'D' -> 'd', MSB4216 count 0, builds green.

Microsoft.DotNet.Arcade.Sdk packs clean (0 warnings, 0 errors) with this change.

Temporary

This is a stopgap until the permanent child-side fix (dotnet/msbuild#14027, handshake salt widening) flows into both the VS MSBuild host and the .NET SDK. Remove this workaround once both sides carry it.

cc @ViktorHofer

ViktorHofer and others added 12 commits June 11, 2026 11:49
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
The previous LoadLibraryEx(LOAD_LIBRARY_AS_DATAFILE)+GetModuleFileName(hModule)
technique was a silent no-op: GetModuleFileName returns ERROR_MOD_NOT_FOUND for
an image loaded only as a data file (the SDK's MSBuild.exe/MSBuild.dll are not
otherwise loaded in the VS MSBuild process), so the casing was never fixed and
MSB4216 persisted (verified in build 2998220).

Open the SDK root directory with CreateFileW(FILE_FLAG_BACKUP_SEMANTICS) and ask
the OS for its canonical path via GetFinalPathNameByHandleW. This is load- and
bitness-independent. Splice only the canonical drive letter, and only when the
rest of the path is otherwise case-insensitively identical, so symlink/junction
resolution (which GetFinalPathNameByHandle performs but the child's
Environment.ProcessPath does not) cannot desync the two salts.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Make the temporary #14026 workaround leave visible evidence in CI console
logs (it only fires when the drive casing actually differs, so this is not
noisy on healthy machines).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
NetCoreSdkRoot carries a trailing directory separator, but
GetFinalPathNameByHandle returns the canonical path without one. The
strict full-path equality guard therefore always failed, making the
drive-casing normalization a silent no-op and leaving MSB4216 (#14026)
unfixed. Compare with trailing separators trimmed from both sides.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…orkaround

The GetFinalPathNameByHandle probe is kept as the default, but its drive-letter
casing cannot be proven correct on every agent (it may no-op where the loader and
GetFinalPathNameByHandle disagree, or where the SDK path crosses a junction).

Add two opt-in knobs so the #14026 workaround is robust and manually verifiable:
* DisableNetCoreSdkRootDriveCasingWorkaround=true skips the target entirely.
* NetCoreSdkRootDriveCasingOverride=lower|upper deterministically forces the
  drive-letter casing without touching the filesystem.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Replace the GetFinalPathNameByHandle probe (which returns an uppercase drive on
the affected ADO agents and was a proven no-op) with one that launches the SDK's
dotnet host and reads GetModuleFileNameW(NULL) - the exact API behind the .NET
task host child's Environment.ProcessPath - so the host learns and adopts the
child's drive-letter casing. Casing-only guard; cached; override/disable knobs.

Validated green on a failing ADO CodeQL agent (microsoft-testfx): probe resolved
'd:', NetCoreSdkRoot rewritten D:->d:, MSB4216 eliminated.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Copilot AI review requested due to automatic review settings June 23, 2026 10:16

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR adds a temporary Arcade SDK workaround for dotnet/msbuild#14026 by normalizing the drive-letter casing of $(NetCoreSdkRoot) in .NET Framework MSBuild (“Full” runtime) so the .NET task host handshake salt matches between host and child on affected Windows agents.

Changes:

  • Adds InitialTargets="NormalizeNetCoreSdkRootCasing" so the normalization runs early in evaluation/execution.
  • Introduces an inline NormalizeSdkRootDriveCasing task that probes the SDK volume’s canonical drive casing by launching the SDK’s dotnet.exe and reading GetModuleFileNameW(NULL) output from a throwaway MSBuild project.
  • Adds opt-out and deterministic knobs: DisableNetCoreSdkRootDriveCasingWorkaround and NetCoreSdkRootDriveCasingOverride=lower|upper.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +470 to +480
using (var p = Process.Start(psi))
{
string stdout = p.StandardOutput.ReadToEnd();
string stderr = p.StandardError.ReadToEnd();
if (!p.WaitForExit(120000))
{
try { p.Kill(); } catch { }
Log.LogMessage(MessageImportance.Low, "NormalizeSdkRootDriveCasing: probe timed out launching '{0}'.", dotnetExe);
return '\0';
}

@JanProvaznik JanProvaznik left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I hate using inline tasks but this is a validated workaround that can be improved once it unblocks infra

@rainersigwald rainersigwald left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"inline task inside a project inside an inline task" is groundbreaking levels of Build Crime 🫡

}
catch (Exception ex)
{
Log.LogMessage(MessageImportance.Low, "NormalizeSdkRootDriveCasing skipped: {0}", ex.Message);

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: if it fails log the whole dang stack :)

Suggested change
Log.LogMessage(MessageImportance.Low, "NormalizeSdkRootDriveCasing skipped: {0}", ex.Message);
Log.LogMessage(MessageImportance.Low, "NormalizeSdkRootDriveCasing skipped: {0}", ex);

<Result ParameterType="System.String" Output="true" />
</ParameterGroup>
<Task>
<Code Type="Class" Language="cs"><![CDATA[

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

style nit, no need to fix here (since it'd probably require package authoring changes): Any time there's anything other than a short snippet, I prefer using the Source attribute and putting the code in a .cs file (so it can get at least minimal syntax highlighting).

Comment on lines +289 to +291
// Cache the registered drive letter per drive for the lifetime of the MSBuild process,
// so the (one-time) dotnet-host probe launch is not repeated for every project's InitialTargets.
private static readonly object s_lock = new object();

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a good idea. I would generally have single-instanced it by pushing the invocation of the task into a single project invocation, by having the body of the InitialTargets target be just a call to the utility project. That would truly single-instance it per build, where this will do it once/worker node process.

Of course there's subtlety to that, see dotnet/msbuild#11390. Since this is tested I'd move forward with the current implementation rather than rewriting to my preference.

// Cache the registered drive letter per drive for the lifetime of the MSBuild process,
// so the (one-time) dotnet-host probe launch is not repeated for every project's InitialTargets.
private static readonly object s_lock = new object();
private static readonly Dictionary<char, char> s_driveCache = new Dictionary<char, char>();

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be case-insensitive?

Comment on lines +374 to +375
// directory (<dotnetRoot>\sdk\<ver>\ -> <dotnetRoot>\dotnet.exe), then DOTNET_HOST_PATH but only
// if it is on the same drive letter as the SDK. Returns null if none exists.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the case for DOTNET_HOST_PATH being needed?

Comment on lines +430 to +431
// recurse into this workaround. The inline task P/Invokes GetModuleFileNameW(NULL) - the exact
// API behind the child task host's Environment.ProcessPath - so we read the same drive-letter

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

but we can't just call Environment.ProcessPath because it wouldn't compile?

var psi = new ProcessStartInfo
{
FileName = dotnetExe,
Arguments = "msbuild \"" + proj + "\" -nologo -nodeReuse:false -v:m -t:P",

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Style nit: use short form arguments only on the interactive command line

Suggested change
Arguments = "msbuild \"" + proj + "\" -nologo -nodeReuse:false -v:m -t:P",
Arguments = "msbuild \"" + proj + "\" -nologo -nodeReuse:false -verbosity:minimal -target:P",

else
{
Log.LogMessage(MessageImportance.Low,
"NormalizeSdkRootDriveCasing: probe produced no result (exit {0}). stderr=[{1}]",

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we don't ever write to stderr, do we? I think stdout would likely be more informative

@JanProvaznik

Copy link
Copy Markdown
Member

merging to unblock partners asap
@AlesProkop please make an issue

  1. to make it less unhinged
  2. to remove it when no longer needed

@JanProvaznik JanProvaznik merged commit e979774 into main Jun 24, 2026
11 checks passed
@JanProvaznik JanProvaznik deleted the AlesProkop/fix-sdk-root-drive-casing-workaround branch June 24, 2026 15:42
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants