JIT: form ldp/stp for adjacent Arm64 indexed accesses#129910
Draft
AndyAyersMS wants to merge 1 commit into
Draft
JIT: form ldp/stp for adjacent Arm64 indexed accesses#129910AndyAyersMS wants to merge 1 commit into
AndyAyersMS wants to merge 1 commit into
Conversation
Arm64 can't encode [base, index, #off], so a base+index used at multiple offsets must be materialized anyway; re-enable CSE for it before pairing so adjacent loads/stores fold into ldp/stp. Fixes dotnet#93263. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Contributor
|
Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch |
Contributor
There was a problem hiding this comment.
Pull request overview
This PR updates the CoreCLR Arm64 JIT CSE phase to selectively re-enable CSE for certain base + index address subexpressions so they can be materialized once and reused, enabling adjacent Arm64 indexed accesses to fold into ldp/stp pairs. It also adds a new JIT disasm-based regression test covering the expected pairing and address reuse patterns.
Changes:
- Add an Arm64-only pre-pass in
optOptimizeValnumCSEsto clearGTF_ADDRMODE_NO_CSEfor profitablebase + indexaddress expressions. - Add a new JIT opt test that checks for
ldp/stpformation and (heuristically) shared base registers for multiple-offset accesses. - Wire the new helper into the compiler interface (
compiler.h).
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| src/coreclr/jit/optcse.cpp | Adds Arm64-only logic to re-enable CSE for certain base+index address expressions ahead of valnum CSE. |
| src/coreclr/jit/compiler.h | Declares the new Arm64-only helper on Compiler. |
| src/tests/JIT/opt/AddrMode/LdpStpPairing.cs | Adds disasm-based test cases for Arm64 ldp/stp pairing and shared-address reuse scenarios. |
| src/tests/JIT/opt/AddrMode/LdpStpPairing.csproj | Adds the new test project with disasm checking enabled and required environment variables set. |
Comment on lines
+5791
to
+5807
| int nonZeroCount = 0; | ||
| bool adjacent = false; | ||
| for (int i = 0; i < offsets->Height(); i++) | ||
| { | ||
| if (offsets->Bottom(i) != 0) | ||
| { | ||
| nonZeroCount++; | ||
| } | ||
| for (int j = i + 1; pairable && !adjacent && (j < offsets->Height()); j++) | ||
| { | ||
| ssize_t delta = offsets->Bottom(i) - offsets->Bottom(j); | ||
| if ((delta == accessSize) || (delta == -accessSize)) | ||
| { | ||
| adjacent = true; | ||
| } | ||
| } | ||
| } |
Comment on lines
+47
to
+50
| // Two non-adjacent, non-zero offsets cannot form a pair, but "src + i" / "dst + i" | ||
| // should still be materialized once and shared (rather than recomputed per access). | ||
| // The two loads sharing a base register proves the common "add" was CSE'd. | ||
| //ARM64-FULL-LINE: ldr {{q[0-9]+}}, [[[SRCBASE:x[0-9]+]], #0x10] |
Open
3 tasks
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Arm64 can't encode [base, index, #off], so a base+index used at multiple offsets must be materialized anyway; re-enable CSE for it before pairing so adjacent loads/stores fold into ldp/stp.
Fixes #93263.