Fix #14026 NET task host MSB4216 via dotnet-host drive-casing probe (builds on #17002)#17039
Conversation
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
The previous LoadLibraryEx(LOAD_LIBRARY_AS_DATAFILE)+GetModuleFileName(hModule) technique was a silent no-op: GetModuleFileName returns ERROR_MOD_NOT_FOUND for an image loaded only as a data file (the SDK's MSBuild.exe/MSBuild.dll are not otherwise loaded in the VS MSBuild process), so the casing was never fixed and MSB4216 persisted (verified in build 2998220). Open the SDK root directory with CreateFileW(FILE_FLAG_BACKUP_SEMANTICS) and ask the OS for its canonical path via GetFinalPathNameByHandleW. This is load- and bitness-independent. Splice only the canonical drive letter, and only when the rest of the path is otherwise case-insensitively identical, so symlink/junction resolution (which GetFinalPathNameByHandle performs but the child's Environment.ProcessPath does not) cannot desync the two salts. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Make the temporary #14026 workaround leave visible evidence in CI console logs (it only fires when the drive casing actually differs, so this is not noisy on healthy machines). Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
NetCoreSdkRoot carries a trailing directory separator, but GetFinalPathNameByHandle returns the canonical path without one. The strict full-path equality guard therefore always failed, making the drive-casing normalization a silent no-op and leaving MSB4216 (#14026) unfixed. Compare with trailing separators trimmed from both sides. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…orkaround The GetFinalPathNameByHandle probe is kept as the default, but its drive-letter casing cannot be proven correct on every agent (it may no-op where the loader and GetFinalPathNameByHandle disagree, or where the SDK path crosses a junction). Add two opt-in knobs so the #14026 workaround is robust and manually verifiable: * DisableNetCoreSdkRootDriveCasingWorkaround=true skips the target entirely. * NetCoreSdkRootDriveCasingOverride=lower|upper deterministically forces the drive-letter casing without touching the filesystem. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Replace the GetFinalPathNameByHandle probe (which returns an uppercase drive on the affected ADO agents and was a proven no-op) with one that launches the SDK's dotnet host and reads GetModuleFileNameW(NULL) - the exact API behind the .NET task host child's Environment.ProcessPath - so the host learns and adopts the child's drive-letter casing. Casing-only guard; cached; override/disable knobs. Validated green on a failing ADO CodeQL agent (microsoft-testfx): probe resolved 'd:', NetCoreSdkRoot rewritten D:->d:, MSB4216 eliminated. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
There was a problem hiding this comment.
Pull request overview
This PR adds a temporary Arcade SDK workaround for dotnet/msbuild#14026 by normalizing the drive-letter casing of $(NetCoreSdkRoot) in .NET Framework MSBuild (“Full” runtime) so the .NET task host handshake salt matches between host and child on affected Windows agents.
Changes:
- Adds
InitialTargets="NormalizeNetCoreSdkRootCasing"so the normalization runs early in evaluation/execution. - Introduces an inline
NormalizeSdkRootDriveCasingtask that probes the SDK volume’s canonical drive casing by launching the SDK’sdotnet.exeand readingGetModuleFileNameW(NULL)output from a throwaway MSBuild project. - Adds opt-out and deterministic knobs:
DisableNetCoreSdkRootDriveCasingWorkaroundandNetCoreSdkRootDriveCasingOverride=lower|upper.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| using (var p = Process.Start(psi)) | ||
| { | ||
| string stdout = p.StandardOutput.ReadToEnd(); | ||
| string stderr = p.StandardError.ReadToEnd(); | ||
| if (!p.WaitForExit(120000)) | ||
| { | ||
| try { p.Kill(); } catch { } | ||
| Log.LogMessage(MessageImportance.Low, "NormalizeSdkRootDriveCasing: probe timed out launching '{0}'.", dotnetExe); | ||
| return '\0'; | ||
| } | ||
|
|
JanProvaznik
left a comment
There was a problem hiding this comment.
I hate using inline tasks but this is a validated workaround that can be improved once it unblocks infra
rainersigwald
left a comment
There was a problem hiding this comment.
"inline task inside a project inside an inline task" is groundbreaking levels of Build Crime 🫡
| } | ||
| catch (Exception ex) | ||
| { | ||
| Log.LogMessage(MessageImportance.Low, "NormalizeSdkRootDriveCasing skipped: {0}", ex.Message); |
There was a problem hiding this comment.
nit: if it fails log the whole dang stack :)
| Log.LogMessage(MessageImportance.Low, "NormalizeSdkRootDriveCasing skipped: {0}", ex.Message); | |
| Log.LogMessage(MessageImportance.Low, "NormalizeSdkRootDriveCasing skipped: {0}", ex); |
| <Result ParameterType="System.String" Output="true" /> | ||
| </ParameterGroup> | ||
| <Task> | ||
| <Code Type="Class" Language="cs"><![CDATA[ |
There was a problem hiding this comment.
style nit, no need to fix here (since it'd probably require package authoring changes): Any time there's anything other than a short snippet, I prefer using the Source attribute and putting the code in a .cs file (so it can get at least minimal syntax highlighting).
| // Cache the registered drive letter per drive for the lifetime of the MSBuild process, | ||
| // so the (one-time) dotnet-host probe launch is not repeated for every project's InitialTargets. | ||
| private static readonly object s_lock = new object(); |
There was a problem hiding this comment.
This is a good idea. I would generally have single-instanced it by pushing the invocation of the task into a single project invocation, by having the body of the InitialTargets target be just a call to the utility project. That would truly single-instance it per build, where this will do it once/worker node process.
Of course there's subtlety to that, see dotnet/msbuild#11390. Since this is tested I'd move forward with the current implementation rather than rewriting to my preference.
| // Cache the registered drive letter per drive for the lifetime of the MSBuild process, | ||
| // so the (one-time) dotnet-host probe launch is not repeated for every project's InitialTargets. | ||
| private static readonly object s_lock = new object(); | ||
| private static readonly Dictionary<char, char> s_driveCache = new Dictionary<char, char>(); |
There was a problem hiding this comment.
Should this be case-insensitive?
| // directory (<dotnetRoot>\sdk\<ver>\ -> <dotnetRoot>\dotnet.exe), then DOTNET_HOST_PATH but only | ||
| // if it is on the same drive letter as the SDK. Returns null if none exists. |
There was a problem hiding this comment.
What's the case for DOTNET_HOST_PATH being needed?
| // recurse into this workaround. The inline task P/Invokes GetModuleFileNameW(NULL) - the exact | ||
| // API behind the child task host's Environment.ProcessPath - so we read the same drive-letter |
There was a problem hiding this comment.
but we can't just call Environment.ProcessPath because it wouldn't compile?
| var psi = new ProcessStartInfo | ||
| { | ||
| FileName = dotnetExe, | ||
| Arguments = "msbuild \"" + proj + "\" -nologo -nodeReuse:false -v:m -t:P", |
There was a problem hiding this comment.
Style nit: use short form arguments only on the interactive command line
| Arguments = "msbuild \"" + proj + "\" -nologo -nodeReuse:false -v:m -t:P", | |
| Arguments = "msbuild \"" + proj + "\" -nologo -nodeReuse:false -verbosity:minimal -target:P", |
| else | ||
| { | ||
| Log.LogMessage(MessageImportance.Low, | ||
| "NormalizeSdkRootDriveCasing: probe produced no result (exit {0}). stderr=[{1}]", |
There was a problem hiding this comment.
we don't ever write to stderr, do we? I think stdout would likely be more informative
|
merging to unblock partners asap
|
Context
Fixes the
MSB4216handshake failure that breaks .NET Framework MSBuild hosts (e.g. VS, and the Arcade CodeQL orchestrator) when they launch the .NET SDK task host on certain hosted CI agents. See dotnet/msbuild#14026 for the full root cause.The handshake salt is a case-sensitive hash of the SDK tools directory. The host hashes
$(NetCoreSdkRoot)(drive-letter casing propagated verbatim from the environment, e.g. ADO'sD:\a\_work\1\s), while the child task host resolves its own path viaEnvironment.ProcessPath, which on some agents reports the volume's canonical lowercase drive (d:). The salts then differ and the handshake fails:Credit / relationship to #17002
This builds directly on top of @ViktorHofer's workaround in #17002 — the
InitialTargetstarget that rewrites the drive-letter casing of$(NetCoreSdkRoot), theMSBuildRuntimeType == 'Full'gating, theNetCoreSdkRoot != ''guard, and the splice-only-the-drive-letter approach are all his design. Full credit to Viktor; his commits are preserved in this branch's history. This PR only swaps out the casing-detection mechanism, which was the one piece that did not fire on the affected agents.What changed
The original probe used
GetFinalPathNameByHandleon$(NetCoreSdkRoot). On the affected ADO agents that API returns an uppercase drive letter, so it never matched the child's lowercased:and was a silent no-op (proven from the failing build's comm traces + binlog). Launching the SDK's ownMSBuild.exeapphost instead fails, because its matching runtime is not resolvable in the host process environment.This PR detects the casing by launching the SDK's
dotnethost (dotnet.exe— the runtime host, which always runs and lives on the same volume as the SDK) on a tiny throwaway project whose inline task readsGetModuleFileNameW(NULL)— the exact API behind the child'sEnvironment.ProcessPath. The returned drive letter is adopted only when it is the same letter as$(NetCoreSdkRoot)differing solely in case, so it can never change the actual drive, only its casing. Cached per drive; Windows-only; safe no-op when the probe cannot run.Knobs
DisableNetCoreSdkRootDriveCasingWorkaround=true— skip entirely.NetCoreSdkRootDriveCasingOverride=lower|upper— deterministic override without probing (emergency/testing).Validation
Reproduced and validated on the actual failing agent type via the
microsoft-testfxCodeQL pipeline:=lower): rewroteTools.proj'sNetCoreSdkRootD:→d:,MSB4216eliminated, build green — confirming the salt reads the live, target-mutated property.d:, target loggedNormalized NetCoreSdkRoot drive casing 'D' -> 'd',MSB4216count 0, builds green.Microsoft.DotNet.Arcade.Sdkpacks clean (0 warnings, 0 errors) with this change.Temporary
This is a stopgap until the permanent child-side fix (dotnet/msbuild#14027, handshake salt widening) flows into both the VS MSBuild host and the .NET SDK. Remove this workaround once both sides carry it.
cc @ViktorHofer