perf: reduce collector boundary test to a cached worst-bound compare#280
perf: reduce collector boundary test to a cached worst-bound compare#280Nelson Spence (Fieldnote-Echo) wants to merge 5 commits into
Conversation
Doc ids visit each per-query heap strictly ascending, so a candidate tying the worst kept hamming always loses the (hamming, doc_id) tie-break — once the collector is full, the accept test is exactly 'hamming < worst kept hamming'. Cache that bound in a register-friendly u32 (u32::MAX while filling) and skip the heap peek + tuple compare on the ~99.8% reject path. Bit-identical by construction; pinned by the tie-heavy and duplicate-run oracle suites.
|
You have reached your Codex usage limits for code reviews. You can see your limits in the Codex usage dashboard. |
Codecov Report✅ All modified and coverable lines are covered by tests. 📢 Thoughts on this report? Let us know! |
Security-review note (fleet CIPHER-002): parallel finite validation introduced in the encode train transitively touches the global rayon pool from inside the 'serial' CSR primitive. The serial guarantee is about candidate scan/selection ownership, not input validation; say so explicitly.
|
/agentic_review |
Code Review by Qodo
1. Rayon doc mismatch
|
| /// "Serial" scopes the scan and selection: no rayon is entered for the | ||
| /// candidate work, so callers own that parallelism. Input finite- | ||
| /// validation MAY briefly use the global rayon pool for large query | ||
| /// buffers (order-independent boolean reduction; deterministic). |
There was a problem hiding this comment.
1. Rayon doc mismatch 🐞 Bug ⚙ Maintainability
top_m_candidates_batched_serial_csr docs claim input finite-validation may use the global rayon pool, but assert_all_finite is implemented as a plain sequential iterator check and the streamed candidate path shown does not enter rayon. This misstates the API’s concurrency contract and can mislead callers relying on the “serial/no rayon” guarantee.
Agent Prompt
## Issue description
The docstring for `top_m_candidates_batched_serial_csr` states that input finite-validation “MAY briefly use the global rayon pool,” but the current implementation does not use rayon in validation.
## Issue Context
- `top_m_candidates_batched_serial_csr` calls `crate::util::assert_all_finite(queries)` and then runs the streamed scan/selection without any rayon usage.
- `assert_all_finite` is currently implemented with `v.iter().all(...)` (sequential).
## Fix Focus Areas
- src/sign_bitmap.rs[414-417]
- src/util.rs[117-131]
ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools
|
Qodo triage (rayon doc mismatch): accurate observation, intentional wording. The parallel finite-validation lands in #279 ( |
Understood — thanks for the context.
No code change needed on this PR. |
Summary
Doc ids visit each per-query heap strictly ascending (rows ascend within blocks, blocks ascend), so a candidate tying the worst kept hamming always loses the
(hamming, doc_id)tie-break — once full, the accept test is exactlyhamming < worst. Caches that bound per query (u32::MAX while filling) and skips the heap peek + tuple compare on the ~99.8% reject path.Bit-identical by construction; the tie-heavy and duplicate-run oracle suites pin it.
Measured (stacked with the ordinaldb stream-sharing floor, 1.26M × 1024)
Batch 7.7k → 9.7–10.2k q/s; single-query p50 3.7 → 3.2ms (ordinaldb's nq=1 routes through this path). Recall identical.
Stacked on #278 (ultracode sweep rank-1 finding, verified).