Skip to content

Fix interpreter stackwalk crash when seeding from a debugger-supplied context#129907

Draft
steveisok wants to merge 3 commits into
dotnet:mainfrom
steveisok:fix-remote-stepping
Draft

Fix interpreter stackwalk crash when seeding from a debugger-supplied context#129907
steveisok wants to merge 3 commits into
dotnet:mainfrom
steveisok:fix-remote-stepping

Conversation

@steveisok

@steveisok steveisok commented Jun 26, 2026

Copy link
Copy Markdown
Member

Summary

Debugging an interpreter-based app (e.g. .NET MAUI on the iOS simulator) through
Visual Studio crashes the target process when stepping. The crash is in the DAC's
StackFrameIterator::ResetRegDisp (null-deref in release, _ASSERTE in checked),
driven by CordbThread::GetActiveFunctionsShimStackWalk
CordbStackWalk::SetContextDacDbiInterfaceImpl::SetStackWalkCurrentContext.

Root cause

When seeding a stackwalk in interpreted code, Init and ResetRegDisp derive the
owning InterpreterFrame from the first-argument register of pCurrentContext:

PTR_InterpreterFrame pOwning =
    dac_cast<PTR_InterpreterFrame>((TADDR)GetFirstArgReg(m_crawl.pRD->pCurrentContext));

That "first-arg register holds the owning  InterpreterFrame " convention is only
synthesized by the interpreter code manager for a CONTEXT produced by a normal
runtime stackwalk. It does not hold for a CONTEXT supplied by the debugger via
 ICorDebugStackWalk::SetContext :  SetStackWalkCurrentContext  copies a raw thread
CONTEXT into  pCurrentContext , so the first-arg register holds the interpreted
method's working value — not the frame pointer — when the thread is stopped at a
breakpoint deep inside an interpreted method.

Reading the register then yields NULL/garbage, which null-dereferences (release) or
trips  _ASSERTE  (checked) in  ResetRegDisp . Visual Studio reproduces it easily,
because its  GetActiveFunctions / SetContext  path seeds the walk with an arbitrary
mid-method context; VS Code does not issue that walk and thus is harder to get the bug to pop.

Fix

Add  GetOwningInterpreterFrame , which keeps the first-arg-register fast path but
falls back to the topmost  InterpreterFrame  on the thread's explicit-frame chain
when the register does not point at a valid  InterpreterFrame . The frame chain is
reliable regardless of how the CONTEXT was produced. Both seed sites also normalize a
NULL  PtrNextFrame()  (the  m_Next  of a not-yet-fully-linked frame) to  FRAME_TOP 
so the chain stays terminated.

Testing

• Repro: step in VS while debugging a .NET MAUI app on the iOS simulator (interpreter).
• Before: target crashes in  ResetRegDisp  (checked  _ASSERTE  / release null-deref).
• After: stepping works; VS Call Stack / Parallel Stacks populate correctly.

steveisok and others added 2 commits June 26, 2026 11:52
A Frame's m_Next is NULL until the frame is fully pushed (initialized to
nullptr in the Frame constructor, linked into the chain only by Push()). A
stackwalk can observe a not-yet-fully-linked frame - for example the DAC
walking a thread suspended for debugger stepping over interpreter frames.

The walker relies on the explicit-frame chain being terminated by FRAME_TOP
and only ever checks against FRAME_TOP, so advancing onto a frame whose
m_Next is NULL left m_crawl.pFrame == NULL and caused a null dereference in
StackFrameIterator::CheckForSkippedFrames (and other deref sites).

Normalize a NULL next-frame to FRAME_TOP in GotoNextFrame so the chain is
always terminated and the rest of the walk never dereferences NULL.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
StackFrameIterator::Init and ResetRegDisp seed an interpreter stackwalk by reading
the owning InterpreterFrame from the first-argument register of pCurrentContext.
That is only a valid convention for a CONTEXT produced by a normal runtime stackwalk,
where the interpreter code manager (eetwain.cpp) synthesizes it. It does NOT hold for
a CONTEXT supplied by the debugger via ICorDebugStackWalk::SetContext:
DacDbiInterfaceImpl::SetStackWalkCurrentContext copies a raw thread CONTEXT into
pCurrentContext, so the first-arg register holds the interpreted method's working value
rather than the InterpreterFrame pointer (e.g. when VS GetActiveFunctions walks a thread
stopped at a breakpoint deep inside an interpreted method).

Reading the register then yields NULL/garbage, tripping the _ASSERTE in a checked DAC
(or null-dereferencing in release) inside ResetRegDisp. Only VS reproduces this because
its GetActiveFunctions/SetContext path seeds the walk with an arbitrary mid-method
context; VSCode does not.

Add GetOwningInterpreterFrame, which keeps the first-arg-register fast path but falls back
to the topmost InterpreterFrame on the thread's explicit-frame chain when the register does
not point at a valid InterpreterFrame. Also normalize a NULL PtrNextFrame() (m_Next of a
not-yet-fully-linked frame) to FRAME_TOP at both seed sites so the chain stays terminated.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@dotnet-policy-service

Copy link
Copy Markdown
Contributor

Tagging subscribers to this area: @JulieLeeMSFT, @BrzVlad, @janvorli, @kg
See info in area-owners.md if you want to be subscribed.

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR updates CoreCLR stack walking to be resilient when a stackwalk in interpreted code is seeded from a debugger-supplied CONTEXT (rather than one produced by the runtime’s own stackwalk), avoiding crashes/asserts when the “first-arg register contains owning InterpreterFrame*” convention doesn’t hold.

Changes:

  • Normalize a NULL explicit-frame m_Next to FRAME_TOP during CrawlFrame::GotoNextFrame() to keep frame-chain iteration terminating safely.
  • Add GetOwningInterpreterFrame(...) and use it from StackFrameIterator::Init and ResetRegDisp when starting from interpreted-code contexts.
  • Treat InterpreterFrame::PtrNextFrame() returning NULL as end-of-chain (FRAME_TOP) when skipping past the owning interpreter frame.

Comment on lines +1067 to +1083
PTR_InterpreterFrame pOwning =
dac_cast<PTR_InterpreterFrame>((TADDR)GetFirstArgReg(pContext));

if ((pOwning == NULL) || (pOwning->GetFrameIdentifier() != FrameIdentifier::InterpreterFrame))
{
pOwning = NULL;
for (PTR_Frame pFrame = pThread->GetFrame();
pFrame != FRAME_TOP;
pFrame = pFrame->PtrNextFrame())
{
if (pFrame->GetFrameIdentifier() == FrameIdentifier::InterpreterFrame)
{
pOwning = dac_cast<PTR_InterpreterFrame>(pFrame);
break;
}
}
}
// establishes a convention where the first-argument register of pCurrentContext holds the
// owning InterpreterFrame (see the InterpreterCodeManager in eetwain.cpp). That convention
// does NOT hold for a CONTEXT supplied by the debugger via ICorDebugStackWalk::SetContext:
// DacDbiInterfaceImpl::SetStackWalkCurrentContext copies a raw thread CONTEXT into

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This explanation doesn't make sense to me.

My mental model is that there are two kinds of CONTEXT, those where IP points at interpretted bytecode and those where it doesn't. If the IP points at interpreted bytecode then we'd also expect GetFirstArgReg(pContext) to point at the InterpreterFrame. Creating this method implies that invariant isn't holding, but if so I think we should either:

  1. Find where the invariant is getting broken and fix it. If the invariant doesn't hold here then the bug occured earlier, wherever the CONTEXT was generated.
  2. Define a new looser invariant with a clear understanding of why the previous simpler invariant can't be used.

The comment mentions the debugger provides the CONTEXT and treats it like a black-box. If the debugger provides a bad seed CONTEXT then the bug is likely in the debugger or in something the debugger read from. We'd need to peel those layers and find the root cause.

Deterministic, IDE-independent reproduction of the interpreter stackwalk seed crash
fixed in StackFrameIterator::Init / ResetRegDisp.

A managed target runs a method under the interpreter and calls Debugger.Break() inside
it. A minimal native ICorDebug client (dbgshim launch + managed callback) responds to the
Break event by calling ICorDebugThread2::GetActiveFunctions on the stopped thread, which
drives the V2 shim stackwalk (ShimStackWalk::Populate -> CordbStackWalk::SetContext ->
DacDbiInterfaceImpl::SetStackWalkCurrentContext -> StackFrameIterator::ResetRegDisp) with a
debugger-supplied CONTEXT in interpreted code.

For local debugging the DAC is loaded into the harness, so on a buggy Checked build the
harness asserts in ResetRegDisp (Release: null-deref), and on a fixed build
GetActiveFunctions returns cleanly and the harness exits 0. This exercises the same code
path that VS hits on the iOS simulator via the remote ICorDebug target.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants