JIT: form ldp/stp for adjacent Arm64 indexed accesses#129910

Draft
AndyAyersMS wants to merge 1 commit into dotnet:main from AndyAyersMS:arm64-ldp-addrmode-93263

Conversation

@AndyAyersMS

Member

Arm64 has no [base, index, #off] addressing mode, so a base + index address used at more than one offset has to be computed into a register anyway. Re-enable CSE for such addresses before load/store pairing so that adjacent loads/stores fold into ldp/stp.

Fixes #93263.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Copilot AI review requested due to automatic review settings June 26, 2026 20:18
@github-actions github-actions Bot added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Jun 26, 2026
@dotnet-policy-service

Contributor

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
See info in area-owners.md if you want to be subscribed.

Copilot AI left a comment

Contributor

Pull request overview

This PR updates the CoreCLR Arm64 JIT CSE phase to selectively re-enable CSE for certain base + index address subexpressions so they can be materialized once and reused, enabling adjacent Arm64 indexed accesses to fold into ldp/stp pairs. It also adds a new JIT disasm-based regression test covering the expected pairing and address reuse patterns.

Changes:

  • Add an Arm64-only pre-pass in optOptimizeValnumCSEs to clear GTF_ADDRMODE_NO_CSE for profitable base + index address expressions.
  • Add a new JIT opt test that checks for ldp/stp formation and (heuristically) shared base registers for multiple-offset accesses.
  • Wire the new helper into the compiler interface (compiler.h).

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.

File | Description
src/coreclr/jit/optcse.cpp | Adds Arm64-only logic to re-enable CSE for certain base+index address expressions ahead of valnum CSE.
src/coreclr/jit/compiler.h | Declares the new Arm64-only helper on Compiler.
src/tests/JIT/opt/AddrMode/LdpStpPairing.cs | Adds disasm-based test cases for Arm64 ldp/stp pairing and shared-address reuse scenarios.
src/tests/JIT/opt/AddrMode/LdpStpPairing.csproj | Adds the new test project with disasm checking enabled and required environment variables set.

Comment on lines +5791 to +5807
int nonZeroCount = 0;
bool adjacent = false;
for (int i = 0; i < offsets->Height(); i++)
{
    if (offsets->Bottom(i) != 0)
    {
        nonZeroCount++;
    }
    for (int j = i + 1; pairable && !adjacent && (j < offsets->Height()); j++)
    {
        ssize_t delta = offsets->Bottom(i) - offsets->Bottom(j);
        if ((delta == accessSize) || (delta == -accessSize))
        {
            adjacent = true;
        }
    }
}
Comment on lines +47 to +50
// Two non-adjacent, non-zero offsets cannot form a pair, but "src + i" / "dst + i"
// should still be materialized once and shared (rather than recomputed per access).
// The two loads sharing a base register proves the common "add" was CSE'd.
//ARM64-FULL-LINE: ldr {{q[0-9]+}}, [[[SRCBASE:x[0-9]+]], #0x10]

Labels

area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI

Development

Successfully merging this pull request may close these issues.

ARM64: Suboptimal codegen for addressing modes

2 participants