[FIX]: Spsc queue false sharing#960
Conversation
|
Important Review skippedAuto incremental reviews are disabled on this repository. Please check the settings in the CodeRabbit UI or the ⚙️ Run configurationConfiguration used: Organization UI Review profile: CHILL Plan: Pro Run ID: You can disable this status message by setting the Use the checkbox below for a quick retry:
📝 WalkthroughWalkthroughPTO2SpscQueue across both a2a3 and a5 runtime implementations is refactored to maintain separate cached buffer pointers and masks for producer and consumer execution paths, replacing the previous shared buffer_/mask_ fields to improve cache line isolation. ChangesSPSC Queue Per-Side Cache Isolation
Estimated Code Review Effort🎯 3 (Moderate) | ⏱️ ~20 minutes Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Code Review
This pull request splits the shared buffer and mask members of PTO2SpscQueue into separate producer (buffer_p_, mask_p_) and consumer (buffer_c_, mask_c_) local copies in both scheduler headers, and updates the pop_batch logic. However, these changes reduce the struct size from 256 to 192 bytes, which will cause static assertions regarding struct size and alignment to fail. To resolve this, 96 bytes of padding should be added to the end of the struct to maintain the 256-byte size.
| alignas(64) uint64_t head_cached_{0}; | ||
|
|
||
| // --- Shared (immutable after init) --- | ||
| PTO2TaskSlotState **buffer_{nullptr}; |
There was a problem hiding this comment.
why not use alignas(64) for buffer_? and static_assert(5*64 == sizeof(...));
There was a problem hiding this comment.
That was my initial thought. But I assumed the static_assert(4*64 == sizeof(...)); was there for a reason. Meaning there is some assumption somewhere that depends on that this is exactly 4 cache lines. In particular, because some of the memory management in the code base is manual, I could not guarantee that it does not affect something else.
If you tell me that 5 cache lines is fine, I am happy to change it.
The mask and data ptr of PTO2SpscQueue are used (read) by both producer and consumer, but lies on a consumer cache line where the consumer also writes to - > False Sharing!
FIX: Duplicated mask and data ptr such that producer and consumer both have a local copy.
(Duplicated rather than own cache line as there is an imposed size constraint of 4 cache lines.)
Additionally, a minor optimisation to pop_batch:
Cached head value gets updated when less elements than requested rather than if no elements.