KI Sub-groups #668

Open
christiangnrd wants to merge 12 commits into JuliaGPU:main from christiangnrd:subgroups

@christiangnrd christiangnrd commented Dec 23, 2025

Member

No description provided.

@github-actions

github-actions Bot commented Dec 23, 2025

Contributor

Your PR no longer requires formatting changes. Thank you for your contribution!

@christiangnrd christiangnrd mentioned this pull request Jan 2, 2026
@christiangnrd christiangnrd marked this pull request as draft January 3, 2026 19:47
Comment thread test/interface.jl
@christiangnrd christiangnrd force-pushed the subgroups branch 7 times, most recently from daea025 to 6343fd2 Compare January 7, 2026 16:55
@codecov

codecov Bot commented Jan 7, 2026

Codecov Report

❌ Patch coverage is 75.55556% with 11 lines in your changes missing coverage. Please review.
✅ Project coverage is 64.19%. Comparing base (9e79dd0) to head (be3a2e3).

Files with missing lines            Patch %   Lines
src/pocl/backend.jl                 73.07%    7 Missing ⚠️
src/interface.jl                    0.00%     3 Missing ⚠️
src/pocl/compiler/compilation.jl    88.88%    1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #668      +/-   ##
==========================================
+ Coverage   63.98%   64.19%   +0.20%     
==========================================
  Files          23       23              
  Lines        1966     2008      +42     
==========================================
+ Hits         1258     1289      +31     
- Misses        708      719      +11     

@christiangnrd christiangnrd force-pushed the subgroups branch 5 times, most recently from f0a545c to 8858825 Compare February 19, 2026 23:54
@christiangnrd christiangnrd marked this pull request as ready for review February 20, 2026 00:13
@christiangnrd christiangnrd force-pushed the subgroups branch 3 times, most recently from 7ddba7e to 8de2c42 Compare March 24, 2026 10:29
@JuliaGPU JuliaGPU deleted a comment from github-actions Bot Mar 24, 2026
@christiangnrd christiangnrd force-pushed the subgroups branch 2 times, most recently from 409a1e7 to 08a8130 Compare May 28, 2026 22:53
@christiangnrd

Member Author

@maleadt @vchuravy This is ready for final review. If/once accepted, I'll merge the backend subgroup branches into their respective KA 0.10 branches and undo the new branch redirection before merging this one.

@maleadt

maleadt commented Jun 17, 2026

Member

> I think the ideal solution would be to have get_num_sub_groups always return at least 1 so we can force a sub-group size.

FWIW, I fixed that, and pocl_jll should include the patch merged upstream to make get_num_sub_groups behave sensibly.

@christiangnrd

Member Author

> FWIW, I fixed that, and pocl_jll should include the patch merged upstream to make get_num_sub_groups behave sensibly.

I never got around to removing the default sub-group size setting, so no changes are necessary.

Also, I bumped the required pocl_standalone_jll to 7.1.3. Could/should we yank 7.1.3+0 to guarantee no segfaults from libpocl mismatches?

@christiangnrd christiangnrd added this to the 0.10.0 milestone Jun 18, 2026
shreyas-omkar added a commit to shreyas-omkar/KernelAbstractions.jl that referenced this pull request Jun 30, 2026
…overrides

Adds KI.sub_group_ballot(pred::Bool) → UInt32: returns a bitmask with bit
(lane-1) set for every lane where pred is true. CPU fallback returns UInt32(pred).
SPIR-V/OpenCL has no ballot intrinsic so no POCL override is added.

Also adds ext/CUDAExt.jl, the KA extension for CUDA, providing @device_override
implementations of all sub-group intrinsics from JuliaGPU#668:
  - sub_group_size(::CUDABackend), shfl_down_types(::CUDABackend)  [host-side]
  - get_sub_group_{size,max_size,local_id,id,num}()                [device-side]
  - sub_group_barrier(), shfl_down(), sub_group_ballot()            [device-side]

Builds on top of JuliaGPU#668 (KI Sub-groups).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
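
The bitmask described in the commit message above can be modelled on the host. This is a minimal sketch, not the device intrinsic: `reference_ballot` is a hypothetical helper name, and a `Vector{Bool}` stands in for the per-lane predicates of one sub-group of at most 32 lanes.

```julia
# Host-side reference model of the ballot semantics; NOT the KI device intrinsic.
# `reference_ballot` is a hypothetical helper name used only for illustration.
function reference_ballot(preds::AbstractVector{Bool})
    length(preds) <= 32 || throw(ArgumentError("a UInt32 mask holds at most 32 lanes"))
    mask = UInt32(0)
    for (lane, pred) in enumerate(preds)      # lanes are numbered from 1
        if pred
            mask |= UInt32(1) << (lane - 1)   # lane `lane` owns bit (lane-1)
        end
    end
    return mask
end
```

With three lanes where only lanes 1 and 3 are true, `reference_ballot([true, false, true])` returns `0x00000005`. A single-lane sub-group reduces to `UInt32(pred)`, which matches the CPU fallback described in the commit message.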

@vchuravy vchuravy left a comment

Member

Looks good from my side. Can all current backends implement this?

@christiangnrd

Member Author

> Looks good from my side. Can all current backends implement this?

I have a branch that passes tests for every backend. I had to implement shuffle_down manually with shuffle for a couple of them.

I guess the newly supported CUDA OpenCL is failing; I'll have to take a look.
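
For readers unfamiliar with building a shuffle-down from a plain indexed shuffle, a host-side sketch follows. It is not the code from any backend branch: `reference_shuffle` and `reference_shfl_down` are hypothetical names, vectors stand in for the lanes of one sub-group, and the choice that out-of-range lanes keep their own value is an assumption (backends may leave that case undefined).

```julia
# Hypothetical host-side model; vectors stand in for the lanes of one sub-group.
# Indexed shuffle: lane i reads the value held by lane src[i].
reference_shuffle(vals::AbstractVector, src::AbstractVector{<:Integer}) =
    [vals[s] for s in src]

function reference_shfl_down(vals::AbstractVector, delta::Integer)
    n = length(vals)
    # Assumption: a lane whose source would fall outside the sub-group
    # keeps its own value.
    src = [i + delta <= n ? i + delta : i for i in 1:n]
    return reference_shuffle(vals, src)
end
```

For example, `reference_shfl_down([10, 20, 30, 40], 1)` yields `[20, 30, 40, 40]`: each lane takes the value one lane above it, and the last lane is unchanged.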

@christiangnrd christiangnrd force-pushed the subgroups branch 3 times, most recently from b08bc66 to 2a9d804 Compare July 2, 2026 13:15
     return config
 end
-@noinline function _compiler_config(dev; kernel = true, name = nothing, always_inline = false, kwargs...)
+@noinline function _compiler_config(dev; kernel = true, name = nothing, always_inline = false, sub_group_size::Union{Nothing, Int} = 32, kwargs...)

Member

Did you try different defaults here?

Member Author

I didn't. I just went with what is most common on the GPU side.

@christiangnrd

christiangnrd commented Jul 2, 2026

Member Author

@vchuravy I added a supports_subgroup function to KA that defaults to true since it only really needs to be defined in OpenCL.jl. If you agree with that approach I'll revert CI pointing to my intrinsicsnew branches and merge this!

Comment thread src/pocl/backend.jl
     return Int(device().max_work_group_size)
 end
 function KI.sub_group_size(::POCLBackend)::Int
     sg_sizes = cl.device().sub_group_sizes

Member

Can you add a comment explaining the logic here?

@christiangnrd christiangnrd Jul 3, 2026

Member Author

Suggested change
-    sg_sizes = cl.device().sub_group_sizes
+    # POCL can technically support any sub_group size.
+    # Check for common values used on GPUs then
+    # return 1 otherwise
+    sg_sizes = cl.device().sub_group_sizes
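
The selection logic that the suggested comment describes could look roughly like the sketch below. It is an illustration only, not the PR's implementation: `pick_sub_group_size` is a hypothetical helper, and the candidate list and its order are assumptions (32 comes first because it is the default chosen in `_compiler_config`).

```julia
# Hypothetical helper; the candidate list and its order are assumptions,
# not the code in src/pocl/backend.jl.
function pick_sub_group_size(supported; preferred = (32, 64, 16, 8))
    for s in preferred
        # Take the first common GPU sub-group size the device reports.
        s in supported && return Int(s)
    end
    return 1   # otherwise fall back to a sub-group of a single work-item
end
```

Given a device that reports `[8, 16, 32]`, the sketch returns `32`; given one that reports none of the candidates, it returns `1`.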
