Fix interpreter stackwalk crash when seeding from a debugger-supplied context by steveisok · Pull Request #129907 · dotnet/runtime

steveisok · 2026-06-26T19:53:48Z

Summary

Debugging an interpreter-based app (e.g. .NET MAUI on the iOS simulator) through
Visual Studio crashes the target process when stepping. The crash is in the DAC's
StackFrameIterator::ResetRegDisp (null-deref in release, _ASSERTE in checked),
driven by CordbThread::GetActiveFunctions → ShimStackWalk →
CordbStackWalk::SetContext → DacDbiInterfaceImpl::SetStackWalkCurrentContext.

Root cause

When seeding a stackwalk in interpreted code, Init and ResetRegDisp derive the
owning InterpreterFrame from the first-argument register of pCurrentContext:

PTR_InterpreterFrame pOwning =
    dac_cast<PTR_InterpreterFrame>((TADDR)GetFirstArgReg(m_crawl.pRD->pCurrentContext));

That "first-arg register holds the owning InterpreterFrame " convention is only
synthesized by the interpreter code manager for a CONTEXT produced by a normal
runtime stackwalk. It does not hold for a CONTEXT supplied by the debugger via
ICorDebugStackWalk::SetContext : SetStackWalkCurrentContext copies a raw thread
CONTEXT into pCurrentContext , so the first-arg register holds the interpreted
method's working value — not the frame pointer — when the thread is stopped at a
breakpoint deep inside an interpreted method.

Reading the register then yields NULL/garbage, which null-dereferences (release) or
trips _ASSERTE (checked) in ResetRegDisp . Visual Studio reproduces it easily,
because its GetActiveFunctions / SetContext path seeds the walk with an arbitrary
mid-method context; VS Code does not issue that walk and thus is harder to get the bug to pop.

Fix

Add GetOwningInterpreterFrame , which keeps the first-arg-register fast path but
falls back to the topmost InterpreterFrame on the thread's explicit-frame chain
when the register does not point at a valid InterpreterFrame . The frame chain is
reliable regardless of how the CONTEXT was produced. Both seed sites also normalize a
NULL PtrNextFrame() (the m_Next of a not-yet-fully-linked frame) to FRAME_TOP
so the chain stays terminated.

Testing

• Repro: step in VS while debugging a .NET MAUI app on the iOS simulator (interpreter).
• Before: target crashes in ResetRegDisp (checked _ASSERTE / release null-deref).
• After: stepping works; VS Call Stack / Parallel Stacks populate correctly.

A Frame's m_Next is NULL until the frame is fully pushed (initialized to nullptr in the Frame constructor, linked into the chain only by Push()). A stackwalk can observe a not-yet-fully-linked frame - for example the DAC walking a thread suspended for debugger stepping over interpreter frames. The walker relies on the explicit-frame chain being terminated by FRAME_TOP and only ever checks against FRAME_TOP, so advancing onto a frame whose m_Next is NULL left m_crawl.pFrame == NULL and caused a null dereference in StackFrameIterator::CheckForSkippedFrames (and other deref sites). Normalize a NULL next-frame to FRAME_TOP in GotoNextFrame so the chain is always terminated and the rest of the walk never dereferences NULL. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

StackFrameIterator::Init and ResetRegDisp seed an interpreter stackwalk by reading the owning InterpreterFrame from the first-argument register of pCurrentContext. That is only a valid convention for a CONTEXT produced by a normal runtime stackwalk, where the interpreter code manager (eetwain.cpp) synthesizes it. It does NOT hold for a CONTEXT supplied by the debugger via ICorDebugStackWalk::SetContext: DacDbiInterfaceImpl::SetStackWalkCurrentContext copies a raw thread CONTEXT into pCurrentContext, so the first-arg register holds the interpreted method's working value rather than the InterpreterFrame pointer (e.g. when VS GetActiveFunctions walks a thread stopped at a breakpoint deep inside an interpreted method). Reading the register then yields NULL/garbage, tripping the _ASSERTE in a checked DAC (or null-dereferencing in release) inside ResetRegDisp. Only VS reproduces this because its GetActiveFunctions/SetContext path seeds the walk with an arbitrary mid-method context; VSCode does not. Add GetOwningInterpreterFrame, which keeps the first-arg-register fast path but falls back to the topmost InterpreterFrame on the thread's explicit-frame chain when the register does not point at a valid InterpreterFrame. Also normalize a NULL PtrNextFrame() (m_Next of a not-yet-fully-linked frame) to FRAME_TOP at both seed sites so the chain stays terminated. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

dotnet-policy-service · 2026-06-26T19:55:07Z

Tagging subscribers to this area: @JulieLeeMSFT, @BrzVlad, @janvorli, @kg
See info in area-owners.md if you want to be subscribed.

Copilot

Pull request overview

This PR updates CoreCLR stack walking to be resilient when a stackwalk in interpreted code is seeded from a debugger-supplied CONTEXT (rather than one produced by the runtime’s own stackwalk), avoiding crashes/asserts when the “first-arg register contains owning InterpreterFrame*” convention doesn’t hold.

Changes:

Normalize a NULL explicit-frame m_Next to FRAME_TOP during CrawlFrame::GotoNextFrame() to keep frame-chain iteration terminating safely.
Add GetOwningInterpreterFrame(...) and use it from StackFrameIterator::Init and ResetRegDisp when starting from interpreted-code contexts.
Treat InterpreterFrame::PtrNextFrame() returning NULL as end-of-chain (FRAME_TOP) when skipping past the owning interpreter frame.

+    PTR_InterpreterFrame pOwning =
+        dac_cast<PTR_InterpreterFrame>((TADDR)GetFirstArgReg(pContext));
+
+    if ((pOwning == NULL) || (pOwning->GetFrameIdentifier() != FrameIdentifier::InterpreterFrame))
+    {
+        pOwning = NULL;
+        for (PTR_Frame pFrame = pThread->GetFrame();
+             pFrame != FRAME_TOP;
+             pFrame = pFrame->PtrNextFrame())
+        {
+            if (pFrame->GetFrameIdentifier() == FrameIdentifier::InterpreterFrame)
+            {
+                pOwning = dac_cast<PTR_InterpreterFrame>(pFrame);
+                break;
+            }
+        }
+    }


noahfalk · 2026-06-26T21:24:45Z

+// establishes a convention where the first-argument register of pCurrentContext holds the
+// owning InterpreterFrame (see the InterpreterCodeManager in eetwain.cpp). That convention
+// does NOT hold for a CONTEXT supplied by the debugger via ICorDebugStackWalk::SetContext:
+// DacDbiInterfaceImpl::SetStackWalkCurrentContext copies a raw thread CONTEXT into


This explanation doesn't make sense to me.

My mental model is that there are two kinds of CONTEXT, those where IP points at interpretted bytecode and those where it doesn't. If the IP points at interpreted bytecode then we'd also expect GetFirstArgReg(pContext) to point at the InterpreterFrame. Creating this method implies that invariant isn't holding, but if so I think we should either:

Find where the invariant is getting broken and fix it. If the invariant doesn't hold here then the bug occured earlier, wherever the CONTEXT was generated.

Define a new looser invariant with a clear understanding of why the previous simpler invariant can't be used.

The comment mentions the debugger provides the CONTEXT and treats it like a black-box. If the debugger provides a bad seed CONTEXT then the bug is likely in the debugger or in something the debugger read from. We'd need to peel those layers and find the root cause.

Deterministic, IDE-independent reproduction of the interpreter stackwalk seed crash fixed in StackFrameIterator::Init / ResetRegDisp. A managed target runs a method under the interpreter and calls Debugger.Break() inside it. A minimal native ICorDebug client (dbgshim launch + managed callback) responds to the Break event by calling ICorDebugThread2::GetActiveFunctions on the stopped thread, which drives the V2 shim stackwalk (ShimStackWalk::Populate -> CordbStackWalk::SetContext -> DacDbiInterfaceImpl::SetStackWalkCurrentContext -> StackFrameIterator::ResetRegDisp) with a debugger-supplied CONTEXT in interpreted code. For local debugging the DAC is loaded into the harness, so on a buggy Checked build the harness asserts in ResetRegDisp (Release: null-deref), and on a fixed build GetActiveFunctions returns cleanly and the harness exits 0. This exercises the same code path that VS hits on the iOS simulator via the remote ICorDebug target. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

steveisok and others added 2 commits June 26, 2026 11:52

Copilot AI review requested due to automatic review settings June 26, 2026 19:53

github-actions Bot added the area-CodeGen-Interpreter-coreclr label Jun 26, 2026

dotnet-policy-service Bot assigned steveisok Jun 26, 2026

Copilot started reviewing on behalf of steveisok June 26, 2026 19:54 View session

Copilot AI reviewed Jun 26, 2026

View reviewed changes

noahfalk reviewed Jun 26, 2026

View reviewed changes

This was referenced Jun 27, 2026

The Operation will be canceled. The next steps may not contain expected logs. dotnet/dnceng#3008

Open

Flaky timeout in CreateFileChangeToken_RootDeletedAndRecreated_TokenFiresWhenFileCreated #129691

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Fix interpreter stackwalk crash when seeding from a debugger-supplied context#129907

Fix interpreter stackwalk crash when seeding from a debugger-supplied context#129907
steveisok wants to merge 3 commits into
dotnet:mainfrom
steveisok:fix-remote-stepping

steveisok commented Jun 26, 2026 •

edited

Loading

Uh oh!

dotnet-policy-service Bot commented Jun 26, 2026

Uh oh!

Copilot AI left a comment

Uh oh!

noahfalk Jun 26, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Uh oh!

Conversation

steveisok commented Jun 26, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Root cause

Fix

Testing

Uh oh!

dotnet-policy-service Bot commented Jun 26, 2026

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

noahfalk Jun 26, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

steveisok commented Jun 26, 2026 •

edited

Loading