Skip to content

perf: reduce collector boundary test to a cached worst-bound compare#280

Open
Nelson Spence (Fieldnote-Echo) wants to merge 5 commits into
perf/tiled-batch-candgenfrom
perf/collector-worst-bound
Open

perf: reduce collector boundary test to a cached worst-bound compare#280
Nelson Spence (Fieldnote-Echo) wants to merge 5 commits into
perf/tiled-batch-candgenfrom
perf/collector-worst-bound

Conversation

@Fieldnote-Echo

Copy link
Copy Markdown
Member

Summary

Doc ids visit each per-query heap strictly ascending (rows ascend within blocks, blocks ascend), so a candidate tying the worst kept hamming always loses the (hamming, doc_id) tie-break — once full, the accept test is exactly hamming < worst. Caches that bound per query (u32::MAX while filling) and skips the heap peek + tuple compare on the ~99.8% reject path.

Bit-identical by construction; the tie-heavy and duplicate-run oracle suites pin it.

Measured (stacked with the ordinaldb stream-sharing floor, 1.26M × 1024)

Batch 7.7k → 9.7–10.2k q/s; single-query p50 3.7 → 3.2ms (ordinaldb's nq=1 routes through this path). Recall identical.

Stacked on #278 (ultracode sweep rank-1 finding, verified).

Doc ids visit each per-query heap strictly ascending, so a candidate
tying the worst kept hamming always loses the (hamming, doc_id)
tie-break — once the collector is full, the accept test is exactly
'hamming < worst kept hamming'. Cache that bound in a register-friendly
u32 (u32::MAX while filling) and skip the heap peek + tuple compare on
the ~99.8% reject path. Bit-identical by construction; pinned by the
tie-heavy and duplicate-run oracle suites.
@chatgpt-codex-connector

Copy link
Copy Markdown

You have reached your Codex usage limits for code reviews. You can see your limits in the Codex usage dashboard.

@codecov

codecov Bot commented Jul 3, 2026

Copy link
Copy Markdown

Codecov Report

✅ All modified and coverable lines are covered by tests.

📢 Thoughts on this report? Let us know!

Security-review note (fleet CIPHER-002): parallel finite validation
introduced in the encode train transitively touches the global rayon
pool from inside the 'serial' CSR primitive. The serial guarantee is
about candidate scan/selection ownership, not input validation; say
so explicitly.
@Fieldnote-Echo

Copy link
Copy Markdown
Member Author

/agentic_review

@qodo-code-review

Copy link
Copy Markdown

Code Review by Qodo

🐞 Bugs (1) 📘 Rule violations (0) 📜 Skill insights (0)

Grey Divider


Informational

1. Rayon doc mismatch 🐞 Bug ⚙ Maintainability
Description
top_m_candidates_batched_serial_csr docs claim input finite-validation may use the global rayon
pool, but assert_all_finite is implemented as a plain sequential iterator check and the streamed
candidate path shown does not enter rayon. This misstates the API’s concurrency contract and can
mislead callers relying on the “serial/no rayon” guarantee.
Code

src/sign_bitmap.rs[R414-417]

+    /// "Serial" scopes the scan and selection: no rayon is entered for the
+    /// candidate work, so callers own that parallelism. Input finite-
+    /// validation MAY briefly use the global rayon pool for large query
+    /// buffers (order-independent boolean reduction; deterministic).
Relevance

⭐⭐⭐ High

Team frequently accepts fixing misleading doc/API contracts (e.g., doc/behavior mismatches accepted
in PRs #12, #227, #40).

PR-#12
PR-#227
PR-#40

ⓘ Recommendations generated based on similar findings in past PRs

Evidence
The new doc comment explicitly claims possible rayon usage during finite validation, but the only
validation call is assert_all_finite, which is implemented as a sequential iter().all(...)
check; the rest of the serial CSR path delegates to top_m_candidates_streamed, which also contains
no rayon calls.

src/sign_bitmap.rs[414-417]
src/sign_bitmap.rs[433-457]
src/util.rs[117-131]
src/sign_bitmap.rs[231-308]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
The docstring for `top_m_candidates_batched_serial_csr` states that input finite-validation “MAY briefly use the global rayon pool,” but the current implementation does not use rayon in validation.

## Issue Context
- `top_m_candidates_batched_serial_csr` calls `crate::util::assert_all_finite(queries)` and then runs the streamed scan/selection without any rayon usage.
- `assert_all_finite` is currently implemented with `v.iter().all(...)` (sequential).

## Fix Focus Areas
- src/sign_bitmap.rs[414-417]
- src/util.rs[117-131]

ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools


Grey Divider

Qodo Logo

Comment thread src/sign_bitmap.rs
Comment on lines +414 to +417
/// "Serial" scopes the scan and selection: no rayon is entered for the
/// candidate work, so callers own that parallelism. Input finite-
/// validation MAY briefly use the global rayon pool for large query
/// buffers (order-independent boolean reduction; deterministic).

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Informational

1. Rayon doc mismatch 🐞 Bug ⚙ Maintainability

top_m_candidates_batched_serial_csr docs claim input finite-validation may use the global rayon
pool, but assert_all_finite is implemented as a plain sequential iterator check and the streamed
candidate path shown does not enter rayon. This misstates the API’s concurrency contract and can
mislead callers relying on the “serial/no rayon” guarantee.
Agent Prompt
## Issue description
The docstring for `top_m_candidates_batched_serial_csr` states that input finite-validation “MAY briefly use the global rayon pool,” but the current implementation does not use rayon in validation.

## Issue Context
- `top_m_candidates_batched_serial_csr` calls `crate::util::assert_all_finite(queries)` and then runs the streamed scan/selection without any rayon usage.
- `assert_all_finite` is currently implemented with `v.iter().all(...)` (sequential).

## Fix Focus Areas
- src/sign_bitmap.rs[414-417]
- src/util.rs[117-131]

ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools

@Fieldnote-Echo

Copy link
Copy Markdown
Member Author

Qodo triage (rayon doc mismatch): accurate observation, intentional wording. The parallel finite-validation lands in #279 (perf/encode-path), a sibling PR in the same 0.6.0 train — on this branch's ancestry validation is indeed still sequential. The doc says validation may briefly use the global pool: a permissive contract that is correct both before and after #279 merges, chosen so the serial scan/selection guarantee (the part callers rely on) is stated exactly once and stays true across the train. No code change.

@qodo-code-review

Copy link
Copy Markdown

triage (rayon doc mismatch): accurate observation, intentional wording. The parallel finite-validation lands in #279 (perf/encode-path), a sibling PR in the same 0.6.0 train — on this branch's ancestry validation is indeed still sequential. The doc says validation may briefly use the global pool: a permissive contract that is correct both before and after #279 merges, chosen so the serial scan/selection guarantee (the part callers rely on) is stated exactly once and stays true across the train. No code change.

Understood — thanks for the context.

finding 1 looks intentionally worded:

  • the serial scan/selection guarantee stays exact on this branch
  • the validation wording is permissive enough to remain correct after #279 lands

No code change needed on this PR.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants