KI Sub-groups #668
Your PR no longer requires formatting changes. Thank you for your contribution!
Codecov Report❌ Patch coverage is
Additional details and impacted files

@@            Coverage Diff             @@
##             main     #668      +/-   ##
==========================================
+ Coverage   63.98%   64.19%   +0.20%
==========================================
  Files          23       23
  Lines        1966     2008      +42
==========================================
+ Hits         1258     1289      +31
- Misses        708      719      +11
FWIW, I fixed that, and pocl_jll should include the patch merged upstream to make |
I never got around to removing the default subgroup size setting, so no changes are necessary. Also, I bumped the required pocl_standalone_jll to 7.1.3. Could/should we yank 7.1.3+0 to guarantee no segfaults from libpocl mismatches?
…overrides

Adds KI.sub_group_ballot(pred::Bool) → UInt32: returns a bitmask with bit (lane-1) set for every lane where pred is true. CPU fallback returns UInt32(pred). SPIR-V/OpenCL has no ballot intrinsic so no POCL override is added.

Also adds ext/CUDAExt.jl, the KA extension for CUDA, providing @device_override implementations of all sub-group intrinsics from JuliaGPU#668:
- sub_group_size(::CUDABackend), shfl_down_types(::CUDABackend) [host-side]
- get_sub_group_{size,max_size,local_id,id,num}() [device-side]
- sub_group_barrier(), shfl_down(), sub_group_ballot() [device-side]

Builds on top of JuliaGPU#668 (KI Sub-groups).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
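The ballot semantics described in the commit message can be modeled on the host in plain Julia. This is only a sketch for illustration, not the KI or CUDA implementation: `ballot_reference` is a hypothetical helper that takes one predicate per lane and packs them into a `UInt32`, assuming a sub-group of at most 32 lanes.

```julia
# Hypothetical host-side model of a sub-group ballot; not part of KernelAbstractions.
# preds[lane] is the predicate evaluated by that (1-based) lane.
function ballot_reference(preds::AbstractVector{Bool})
    length(preds) <= 32 || throw(ArgumentError("a UInt32 mask holds at most 32 lanes"))
    mask = UInt32(0)
    for (lane, pred) in enumerate(preds)
        if pred
            mask |= UInt32(1) << (lane - 1)  # set bit (lane-1)
        end
    end
    return mask
end

# Lanes 1 and 3 vote true, so bits 0 and 2 are set.
ballot_reference([true, false, true]) == 0x00000005
```

With a single lane the model reduces to `UInt32(pred)`, which matches the CPU fallback described above.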
vchuravy left a comment
Looks good from my side. Can all current backends implement this?
I have a branch that passes tests for every backend. I had to implement shuffle_down manually with shuffle for a couple of them. I guess the newly supported CUDA OpenCL is failing; I'll have to take a look.
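As a rough illustration of building a shuffle-down out of a generic shuffle, here is a host-side model in plain Julia. Both function names are hypothetical and nothing here is taken from the branch mentioned above. It also assumes that a lane whose source would fall past the end of the sub-group keeps its own value; real backends may define that case differently.

```julia
# Hypothetical host-side model; vals[lane] is the value held by each (1-based) lane.
# Generic shuffle: every lane reads the value held by the lane it names in src.
function shuffle_reference(vals::AbstractVector, src::AbstractVector{<:Integer})
    return [vals[s] for s in src]
end

# Shuffle-down expressed through the generic shuffle.
# Assumption: a lane whose source lane would be out of range keeps its own value.
function shfl_down_reference(vals::AbstractVector, delta::Integer)
    n = length(vals)
    src = [lane + delta <= n ? lane + delta : lane for lane in 1:n]
    return shuffle_reference(vals, src)
end

# Each lane reads from the lane one position above it; the last lane keeps 40.
shfl_down_reference([10, 20, 30, 40], 1) == [20, 30, 40, 40]
```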
      return config
  end
- @noinline function _compiler_config(dev; kernel = true, name = nothing, always_inline = false, kwargs...)
+ @noinline function _compiler_config(dev; kernel = true, name = nothing, always_inline = false, sub_group_size::Union{Nothing, Int} = 32, kwargs...)
Did you try different defaults here?
I didn't. I just went with what is most common on the GPU side.
Co-Authored-By: Anton Smirnov <tonysmn97@gmail.com>
@vchuravy I added a |
      return Int(device().max_work_group_size)
  end
  function KI.sub_group_size(::POCLBackend)::Int
      sg_sizes = cl.device().sub_group_sizes
Can you add a comment explaining the logic here?
-     sg_sizes = cl.device().sub_group_sizes
+     # POCL can technically support any sub_group size.
+     # Check for common values used on GPUs then
+     # return 1 otherwise
+     sg_sizes = cl.device().sub_group_sizes
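The logic that the suggested comment describes could look roughly like the sketch below. `pick_sub_group_size` is a hypothetical stand-alone helper, not the code in this PR, and the preference order `(32, 64, 16, 8)` is an assumption chosen only for illustration.

```julia
# Hypothetical helper illustrating the selection logic; not the PR's code.
# sg_sizes stands in for the list of sub-group sizes reported by the device.
# The preference order below is an assumption made for this example.
function pick_sub_group_size(sg_sizes; preferred = (32, 64, 16, 8))
    for sz in preferred
        sz in sg_sizes && return Int(sz)  # first common GPU size the device supports
    end
    return 1  # none of the common sizes is supported
end

pick_sub_group_size([1, 2, 4, 8, 16, 32]) == 32
```

Returning 1 as the last resort degenerates to one work-item per sub-group, which is consistent with the "return 1 otherwise" wording in the suggestion.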