
fix: derive artifact read bounds from declared sizes in ordvec-manifest #277

Closed
Nelson Spence (Fieldnote-Echo) wants to merge 12 commits into main from fix/manifest-derived-limits

@Fieldnote-Echo

@Fieldnote-Echo Fieldnote-Echo commented Jul 3, 2026

Member

Summary

  • Verify path bounds every artifact read by the manifest-declared file_size_bytes; the manifest itself stays hard-capped at 1 MiB and SHA-256 pins content. Create path bounds reads by the artifact's observed size.
  • Flat ResourceLimits byte caps (max_auxiliary_artifact_bytes, max_calibration_profile_bytes, max_encoder_distortion_profile_bytes) become opt-in ceilings, default unbounded. Explicitly configured caps behave exactly as before.
  • The primary artifact read was previously unbounded; it now gets a declared-size bound and a new artifact_file_too_large reason code (fail-fast on grown artifacts instead of digest-mismatch after hashing the excess).
  • sha256_file_bounded streams with a 64 KiB buffer — constant memory at any artifact size (previously materialised the whole file).
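A minimal sketch of the streaming, fail-fast read described in the last bullet, using only the standard library. `stream_bounded` and its `update` callback are hypothetical stand-ins for the real hasher; the 64 KiB buffer and the `artifact_file_too_large` code are taken from the summary above, and everything else is an assumption.

```rust
use std::io::{self, Read};

/// Hypothetical stand-in for a bounded streaming hash: feeds `reader` to
/// `update` in fixed-size chunks and fails as soon as more than `max_bytes`
/// have been seen. Returns the number of bytes consumed.
fn stream_bounded<R: Read>(
    mut reader: R,
    max_bytes: u64,
    mut update: impl FnMut(&[u8]),
) -> io::Result<u64> {
    // One fixed buffer: memory use does not depend on the input size.
    let mut buf = vec![0u8; 64 * 1024];
    let mut total = 0u64;
    loop {
        let n = reader.read(&mut buf)?;
        if n == 0 {
            return Ok(total);
        }
        total += n as u64;
        if total > max_bytes {
            // Fail fast instead of hashing the excess bytes.
            return Err(io::Error::new(
                io::ErrorKind::InvalidData,
                "artifact_file_too_large",
            ));
        }
        update(&buf[..n]);
    }
}

fn main() {
    let data = vec![7u8; 200_000];
    let mut hashed = 0usize;
    let n = stream_bounded(&data[..], 200_000, |chunk| hashed += chunk.len()).unwrap();
    println!("{n} bytes streamed, {hashed} bytes passed to the hasher");
    assert!(stream_bounded(&data[..], 199_999, |_| {}).is_err());
}
```

This sketch does not retry interrupted reads; the review further down covers that case.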

Why

The 64 MiB auxiliary default made legitimate sign sidecars impossible to persist past 524,288 rows at dim=1024 (sign.ovsb = rows × dim/8). Measured on a 1,258,135-row × 1024-dim corpus: write_verified_bundle failed with default options. A security bound meant for hostile foreign input was applied to self-written artifacts.
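The arithmetic behind those numbers can be checked with a short sketch. The formula (rows × dim/8) and the figures come from the paragraph above; the function names are illustrative only.

```rust
/// Size in bytes of a sign sidecar: one bit per dimension per row.
fn sign_sidecar_bytes(rows: u64, dim: u64) -> u64 {
    rows * (dim / 8)
}

/// Largest row count whose sidecar still fits under `cap_bytes`.
fn max_rows_under_cap(cap_bytes: u64, dim: u64) -> u64 {
    cap_bytes / (dim / 8)
}

fn main() {
    const MIB: u64 = 1024 * 1024;
    // Row limit implied by the old 64 MiB default at dim=1024.
    println!("{}", max_rows_under_cap(64 * MIB, 1024));
    // Sidecar size for the measured corpus.
    println!("{}", sign_sidecar_bytes(1_258_135, 1024));
}
```

At dim=1024 each row costs 128 bytes, so the measured corpus needs a sidecar of about 161 MB, well past the old cap.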

Hostile-input posture (what precisely changed)

Memory safety and primary-artifact bounding improved. By default, verification time and I/O on hostile bundles are now bounded only by what the deployment configures through the opt-in ResourceLimits ceilings (the old flat 64 MiB default also bounded attacker-supplied I/O; that bound is now a knob, documented in THREAT-QUERY-003). Note: file_size_bytes is a required manifest field, so v1 manifests without it fail deserialization; there is no fallback path.

  • Manifest parse bound unchanged (1 MiB).
  • A hostile manifest cannot cause unbounded memory (streaming hash); I/O+CPU remain proportional to bytes actually supplied — deployments that must bound worst-case verification time set the explicit ceilings.
  • Inflated declared size with unchanged bytes → auxiliary_artifact_file_size_mismatch; grown artifact → fail-fast *_file_too_large; truncation → size mismatch. All covered by new tests.
  • THREAT_MODEL.md gains THREAT-QUERY-003 documenting the derived-bound model and reiterating that VerifiedLoadPlan is a snapshot, not a byte pin.
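The failure modes listed above reduce to a comparison of declared and observed sizes followed by the digest check. A minimal sketch, assuming an `Outcome` enum and `classify` function that are not part of the crate:

```rust
/// Hypothetical outcome type; the real crate reports reason-code strings.
#[derive(Debug, PartialEq)]
enum Outcome {
    TooLarge,       // *_file_too_large: more bytes on disk than declared
    SizeMismatch,   // *_file_size_mismatch: fewer bytes than declared
    DigestMismatch, // same length, different content
    Verified,
}

fn classify(declared: u64, observed: u64, digest_matches: bool) -> Outcome {
    if observed > declared {
        Outcome::TooLarge // grown artifact: fail fast, nothing past the bound is hashed
    } else if observed < declared {
        Outcome::SizeMismatch // truncation, or an inflated declaration over unchanged bytes
    } else if !digest_matches {
        Outcome::DigestMismatch // in-place corruption
    } else {
        Outcome::Verified
    }
}

fn main() {
    println!("{:?}", classify(100, 164, true));
    println!("{:?}", classify(164, 100, true));
}
```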

Tests

  • New tests/derived_limits.rs: >64 MiB roundtrip at pure defaults, grown/truncated/inflated-declaration cases, explicit-cap back-compat, primary-artifact bound.
  • Two existing tests updated to the fail-fast contract (append-growth → artifact_file_too_large); corruption coverage preserved by switching that test to in-place corruption.
  • Full gate: workspace tests, no-default-features, --locked, -D warnings, MSRV 1.89, fuzz build — all green. 63/63 ordvec-manifest tests.

Part of the 1M-row release train (Track A1 of the locked master plan).

Verification now bounds every artifact read by its manifest-declared
file_size_bytes (manifest hard-capped at 1 MiB; SHA-256 pins content);
creation bounds reads by the observed file size. Flat ResourceLimits
byte caps become opt-in ceilings (default unbounded). The primary
artifact read, previously unbounded, gains a declared-size bound and
the artifact_file_too_large reason code. sha256_file_bounded now
streams with constant memory instead of materialising files.

Fixes the undocumented 64 MiB auxiliary cap that made sign-sidecar
bundles impossible to write past 524,288 rows at dim=1024 (measured
on a 1,258,135-row corpus).
@chatgpt-codex-connector


You have reached your Codex usage limits for code reviews. You can see your limits in the Codex usage dashboard.

@Fieldnote-Echo

Member Author

Codex (@codex) review

@qodo-code-review


PR Summary by Qodo

Derive artifact read bounds from manifest-declared sizes in ordvec-manifest

🐞 Bug fix ✨ Enhancement 🧪 Tests 📝 Documentation 🕐 40+ Minutes


AI Description

• Bound verification reads by manifest-declared file sizes; add fail-fast oversized error codes.
• Make ResourceLimits byte caps opt-in ceilings (defaults unbounded) to allow large sidecars.
• Stream bounded SHA-256 hashing with constant memory; add regression tests and docs updates.
Diagram

graph TD
  C["Create manifest"] --> S["Observe file size"] --> H["sha256_file_bounded"] --> M[("Manifest (size+sha256)")]
  V["Verify bundle"] --> M --> B{"Compute read bound"} --> H --> R["VerificationReport"]
  L["ResourceLimits caps"] -->|"optional ceiling"| B
  subgraph Legend
    direction LR
    _op["Operation"] ~~~ _dec{"Decision"} ~~~ _data[("Data")]
  end
High-Level Assessment

The following are alternative approaches to this PR:

1. Raise the default flat caps (keep flat-bound model)
  • ➕ Simpler mental model: one max-bytes knob per artifact type
  • ➕ Still bounds worst-case verification I/O/CPU by default
  • ➖ Still risks rejecting legitimate artifacts as datasets grow
  • ➖ Choosing a safe-yet-non-breaking default is hard and likely to regress again
2. Two-tier policy: strict defaults for verify, unbounded defaults for create
  • ➕ Keeps hostile-input posture bounded by default while unblocking self-written bundles
  • ➕ More explicit separation of trusted/untrusted paths
  • ➖ Surprising inconsistency between create and verify defaults
  • ➖ More configuration/documentation complexity; can still break common workflows

Recommendation: The PR’s derived-bound strategy is the best tradeoff: it makes verification proportional to the declared artifact sizes (anchored by a small, pinned manifest) and preserves a knob (ResourceLimits) for deployments that need explicit worst-case caps. The streaming hash implementation also removes an important memory-risk footgun without weakening integrity checks.

Files changed (6) +303 / -24

Bug fix (1) +57 / -11
lib.rs: Derive per-artifact hash bounds; stream sha256_file_bounded; add new reason code (+57/-11)

• Changes default ResourceLimits byte caps to u64::MAX (opt-in ceilings) and updates verification to bound reads by manifest-declared sizes, applying min(declared, configured_cap) for auxiliary/profile artifacts. Adds a declared-size bound for the primary artifact with a new artifact_file_too_large code, and rewrites sha256_file_bounded to stream with a 64 KiB buffer instead of materializing the whole file.

ordvec-manifest/src/lib.rs
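The min(declared, configured_cap) derivation can be sketched as follows. `Ceilings` is a hypothetical mirror of one ResourceLimits field; only the field name and the u64::MAX default come from the description above.

```rust
/// Hypothetical mirror of one ResourceLimits field; `u64::MAX` means
/// "no ceiling configured".
#[derive(Debug, Clone, Copy)]
struct Ceilings {
    max_auxiliary_artifact_bytes: u64,
}

impl Default for Ceilings {
    fn default() -> Self {
        Ceilings { max_auxiliary_artifact_bytes: u64::MAX }
    }
}

/// Effective read bound for an auxiliary artifact during verification.
fn auxiliary_read_bound(declared_size: u64, ceilings: &Ceilings) -> u64 {
    declared_size.min(ceilings.max_auxiliary_artifact_bytes)
}

fn main() {
    let declared = 161_041_280; // sidecar size of the measured corpus
    let defaults = Ceilings::default();
    let capped = Ceilings { max_auxiliary_artifact_bytes: 64 * 1024 * 1024 };
    println!("{}", auxiliary_read_bound(declared, &defaults));
    println!("{}", auxiliary_read_bound(declared, &capped));
}
```

With the default, the declared size is the bound; an explicit cap smaller than the declaration wins, which matches the stated back-compat behavior.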

Tests (2) +205 / -9
derived_limits.rs: Add tests for derived bounds, oversized behavior, and cap back-compat (+197/-0)

• Adds end-to-end coverage proving default options accept auxiliary artifacts larger than the legacy 64 MiB cap, and verifies correct failure modes for grown, truncated, and inflated-declaration artifacts. Also asserts explicit caps remain enforced and the primary artifact now fails fast when grown past its declared size.

ordvec-manifest/tests/derived_limits.rs

manifest.rs: Update existing tests for new fail-fast oversized behavior (+8/-9)

• Reworks the corruption test to flip a byte in-place (same size) so it still exercises SHA-256 mismatch after passing the declared-size bound. Updates the VerifiedLoadPlan re-verification test to expect artifact_file_too_large when the artifact grows past its declared size.

ordvec-manifest/tests/manifest.rs

Documentation (3) +41 / -4
CHANGELOG.md: Document derived artifact bounds and streaming hashing change (+19/-1)

• Replaces the empty Unreleased section with entries describing derived size bounds, newly bounded primary artifact reads, and constant-memory hashing. Calls out the behavioral change from default 64 MiB auxiliary caps to unbounded-by-default opt-in ceilings.

CHANGELOG.md

THREAT_MODEL.md: Add threat-model entry for derived artifact read bounds (+13/-0)

• Introduces THREAT-QUERY-003 documenting the new derived-bound model, noting constant-memory hashing and that I/O/CPU can still scale with attacker-supplied bytes unless explicit caps are configured. Reiterates that VerifiedLoadPlan is a snapshot rather than a byte pin.

THREAT_MODEL.md

README.md: Update limit-code docs to reflect derived bounds and new primary bound (+9/-3)

• Updates the stable limit-code section to describe per-artifact bounds derived from manifest-declared sizes (verify) or observed sizes (create). Adds the new primary index artifact bound and clarifies that flat ResourceLimits caps are opt-in ceilings.

ordvec-manifest/README.md

@codecov

codecov Bot commented Jul 3, 2026


Codecov Report

✅ All modified and coverable lines are covered by tests.


@qodo-code-review

qodo-code-review Bot commented Jul 3, 2026


Code Review by Qodo

🐞 Bugs (0) 📘 Rule violations (0) 📎 Requirement gaps (0) 🎨 UX issues (0) 🔗 Cross-repo conflicts (0) 📜 Skill insights (0)

Remediation recommended

1. Primary create hash unbounded ✓ Resolved 🐞 Bug ≡ Correctness
Description
Verification now hashes the primary index artifact with `sha256_file_bounded(...,
manifest.artifact.file_size_bytes, ...)`, but manifest creation still hashes the primary artifact
with sha256_file() and separately records file_size_bytes from probed metadata. If the index
file grows while the manifest is being created, the manifest can end up with a digest computed over
more bytes than verification will ever read under the declared-size bound.
Code

ordvec-manifest/src/lib.rs[R260-268]

+        // Bound the read by the manifest-declared size: a primary artifact
+        // larger than its declaration fails fast instead of being hashed in
+        // full (the read was previously unbounded).
+        match sha256_file_bounded(
+            &resolved.canonical_path,
+            document.manifest.artifact.file_size_bytes,
+            "artifact_file_too_large",
+            "index artifact",
+        ) {
Relevance

⭐⭐⭐ High

Team frequently accepts hashing/TOCTOU hardening; prior work replaced unbounded sha256_file with
bounded patterns in verifier.

PR-#157
PR-#163
PR-#152

ⓘ Recommendations generated based on similar findings in past PRs

Evidence
The verification path now uses a declared-size bounded hash for the primary artifact, but the
creation path still computes the primary artifact hash with sha256_file() while separately
recording file_size_bytes from metadata; this leaves a race window where digest and declared size
can diverge if the file changes during create.

/ordvec-manifest/src/lib.rs[241-296]
/ordvec-manifest/src/lib.rs[3546-3572]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
Primary artifact verification is now declared-size bounded, but primary artifact *creation* still hashes via `sha256_file()` without bounding to a stable observed length and without asserting `hash.size_bytes == metadata.file_size_bytes`. This can produce a manifest whose `sha256` and `file_size_bytes` describe different snapshots if the index is concurrently modified during manifest creation.

## Issue Context
Auxiliary artifact creation was updated to bound reads by observed size; the primary artifact creation path was not updated similarly.

## Fix Focus Areas
- ordvec-manifest/src/lib.rs[241-296]
- ordvec-manifest/src/lib.rs[3546-3572]

## Suggested fix
- In `create_manifest_for_index_with_options`, compute an `observed_len` for the primary artifact (use `metadata.file_size_bytes` or `fs::metadata(index_path)?.len()`), then hash with `sha256_file_bounded(index_path, observed_len, "artifact_file_too_large", "index artifact")`.
- After hashing, if `hash.size_bytes != observed_len`, return a creation-time error indicating the artifact changed during hashing.
- Populate `artifact.sha256` and `artifact.file_size_bytes` from the bounded hash result to keep them consistent with the bytes that were actually hashed.



2. Interrupted read not retried ✓ Resolved 🐞 Bug ☼ Reliability
Description
sha256_file_bounded() now uses a manual read() loop but does not retry on
ErrorKind::Interrupted, so hashing can fail spuriously when syscalls are interrupted by signals.
This is a regression relative to read_to_end()-style loops that transparently retry interrupts.
Code

ordvec-manifest/src/lib.rs[R3477-3497]

+    let mut file = File::open(path)?;
    let mut hasher = Sha256::new();
-    hasher.update(&bytes);
+    let mut size_bytes = 0u64;
+    let mut buf = [0u8; 64 * 1024];
+    loop {
+        let n = file.read(&mut buf)?;
+        if n == 0 {
+            break;
+        }
+        size_bytes += n as u64;
+        if size_bytes > max_bytes {
+            return Err(ManifestError::limit_exceeded(
+                code,
+                format!(
+                    "{context} exceeds {max_bytes} bytes while reading {}",
+                    path.display()
+                ),
+            ));
+        }
+        hasher.update(&buf[..n]);
+    }
Relevance

⭐⭐ Medium

No direct prior review precedent on EINTR retries in Rust read loops; team generally accepts
reliability hardening though.

PR-#203
PR-#157

ⓘ Recommendations generated based on similar findings in past PRs

Evidence
The new streaming loop calls file.read(&mut buf)? directly; there is no branch to handle/retry
Interrupted, so an EINTR will bubble up as a verification failure.

/ordvec-manifest/src/lib.rs[3470-3502]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
`sha256_file_bounded()` reads in a loop with `file.read(&mut buf)?` and will return an error on `io::ErrorKind::Interrupted` instead of retrying. This can cause rare but real spurious failures in verification/creation.

## Issue Context
The PR replaced the previous bounded hashing implementation with streaming reads.

## Fix Focus Areas
- ordvec-manifest/src/lib.rs[3451-3502]

## Suggested fix
- Change the read loop to explicitly `continue` on `Err(e)` where `e.kind() == io::ErrorKind::Interrupted`.
- (Optional but recommended) apply the same EINTR retry behavior to `sha256_file()` for consistency.
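A minimal sketch of the suggested retry, with a test double that reports one interrupted call before yielding data. `InterruptOnce` and `count_with_retry` are hypothetical names; the real function also hashes and enforces the byte bound, both omitted here.

```rust
use std::io::{self, ErrorKind, Read};

/// Test double: fails the first read with `Interrupted`, then serves `data`.
struct InterruptOnce<'a> {
    data: &'a [u8],
    interrupted: bool,
}

impl Read for InterruptOnce<'_> {
    fn read(&mut self, buf: &mut [u8]) -> io::Result<usize> {
        if !self.interrupted {
            self.interrupted = true;
            return Err(io::Error::new(ErrorKind::Interrupted, "EINTR"));
        }
        self.data.read(buf)
    }
}

/// Counts bytes from `reader`, retrying reads that were interrupted.
fn count_with_retry<R: Read>(mut reader: R) -> io::Result<u64> {
    let mut buf = [0u8; 4096];
    let mut total = 0u64;
    loop {
        match reader.read(&mut buf) {
            Ok(0) => return Ok(total),
            Ok(n) => total += n as u64,
            // An interrupted call transferred no data: retry it.
            Err(e) if e.kind() == ErrorKind::Interrupted => continue,
            Err(e) => return Err(e),
        }
    }
}

fn main() {
    let reader = InterruptOnce { data: b"hello", interrupted: false };
    println!("{:?}", count_with_retry(reader));
}
```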



Security-audit remediation (CIPHER-02): the primary artifact read was
bounded only by the attacker-declared size with no configurable
ceiling, unlike the auxiliary/profile classes — the documented
THREAT-QUERY-003 mitigation was silently incomplete for the largest
artifact in a bundle. Adds ResourceLimits::max_index_artifact_bytes
(default unbounded) applied as a min() with the declared size, and
bounds the create-path primary read by its observed size so the code
matches the CHANGELOG claim. CLI flag parity follows separately.
Code-review remediation: the sqlite-feature cache-key path duplicated
the hashing logic and missed the derived-bound change — the primary
artifact hash was fully unbounded, and the calibration/encoder
profile hashes were bounded only by the flat caps, which the default
flip turned into effectively unbounded reads. All three now use the
same declared-size .min(opt-in ceiling) derivation as the verify
path; a bound violation is a cache miss. Adds default-limits
grown-profile coverage for the calibration and encoder-distortion
call sites, closing the per-site test gap that let this slip.
@Fieldnote-Echo

Member Author

Adversarial audit triage (cipher security agent + code-reviewer agent)

Fixed in 41c3c03:

  • CIPHER-02 (Low, most actionable): primary index artifact had no opt-in ceiling — added ResourceLimits::max_index_artifact_bytes (default unbounded) applied as .min() with the declared size; create-path primary read now bounded by observed size (also closes the reviewer's CHANGELOG-overclaim MEDIUM).

Fixed in 763dffc:

  • Reviewer HIGH: sqlite-feature cache-key path duplicated hashing and missed the derived-bound change — primary hash was fully unbounded; calibration/encoder profile hashes were flat-cap-only, which the default flip made effectively unbounded. All three now use declared-size .min(ceiling); bound violation = cache miss.
  • Reviewer HIGH (coverage): added default-limits grown-profile tests for the calibration and encoder-distortion call sites (the missing per-site coverage that let the sqlite gap slip). Gate now runs --features sqlite: 78 tests green.

Explained / accepted:

  • CIPHER-01 (Low): default posture allows verification I/O proportional to bytes the attacker actually ships — documented tradeoff in THREAT-QUERY-003; opt-in ceilings are the mitigation and now cover every artifact class including the primary.
  • CIPHER-03 (Info): growth now surfaces as *_file_too_large (fail-fast), *_file_size_mismatch = truncation only — noted in CHANGELOG.

Deferred to follow-up issues:

  • CIPHER-04 (Info): zero-declared-size reason-code symmetry for primary/aux (*_file_size_zero).
  • CLI flag parity for --max-index-artifact-bytes.

Security-review remediation (fleet CIPHER-001): with derived read
bounds, a FIFO inside the bundle directory would block verification
forever — File::open on a reader-less FIFO blocks, and a device node
would stream without EOF under a large declared size. Stat the path
before opening and refuse anything that is not a regular file, at
every hashing call site (create, verify, sqlite cache). Regular files
terminate at EOF and remain post-checked against the declaration.
Unix regression test uses a real FIFO.
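The stat-before-open check can be sketched with the standard library alone. `open_regular_file` is a hypothetical helper name; the PR's actual call sites are not shown in this thread. The demo uses a directory as the non-regular case because creating a FIFO needs platform-specific calls.

```rust
use std::fs::{self, File};
use std::io;
use std::path::Path;

/// Stat first, then open: refuse anything that is not a regular file, so a
/// FIFO or device node is rejected before `File::open` can block on it.
fn open_regular_file(path: &Path) -> io::Result<File> {
    let meta = fs::metadata(path)?; // follows symlinks, as File::open does
    if !meta.file_type().is_file() {
        return Err(io::Error::new(
            io::ErrorKind::InvalidInput,
            format!("{} is not a regular file", path.display()),
        ));
    }
    File::open(path)
}

fn main() -> io::Result<()> {
    let dir = std::env::temp_dir().join(format!("regular-file-demo-{}", std::process::id()));
    fs::create_dir_all(&dir)?;
    let artifact = dir.join("artifact.bin");
    fs::write(&artifact, b"payload")?;

    assert!(open_regular_file(&artifact).is_ok());
    assert!(open_regular_file(&dir).is_err()); // a directory is refused

    fs::remove_dir_all(&dir)?;
    Ok(())
}
```

In this sketch a window remains between the stat and the open, so it is a guard against misplaced special files rather than a defense against a concurrent attacker swapping the path.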
Bot-review remediation (Qodo, PR #277): the streaming loop surfaced
ErrorKind::Interrupted as a hash failure, a reliability regression vs
the previous read_to_end which retried EINTR internally.
@Fieldnote-Echo

Member Author

Qodo triage: (1) EINTR retry in the streaming hash — fixed in b18ebaf. (2) Create-path primary hashed unbounded — already fixed in 41c3c03 (bounded by observed size), which post-dates the review snapshot; no further change.

@qodo-code-review


triage: (1) EINTR retry in the streaming hash — fixed in b18ebaf. (2) Create-path primary hashed unbounded — already fixed in 41c3c03 (bounded by observed size), which post-dates the review snapshot; no further change.

That triage looks correct.

  • finding 1 is already marked implemented, and the snapshot reflects the later fix in 41c3c03.
  • finding 2 is also already implemented, with the EINTR retry covered by b18ebaf.

So there’s nothing further to change in this PR snapshot.

CI caught both halves: the retry from the Qodo remediation had landed
on sha256_file (where the io::Error conversion is a useless-conversion
lint under current clippy) instead of sha256_file_bounded, the
function actually flagged — the retry now lives on both, with the
conversion only where the error type changes. And the release-publish
invariant correctly refuses [Unreleased] changelog content at an
already-released version: this stack is the 0.6.0 work, so ordvec and
ordvec-manifest now say 0.6.0 (minor: limit-semantics change, additive
APIs, behavioral perf changes).
The release-publish SBOM invariant requires member package versions in
lockstep with the root; ordvec-ffi and both python bindings follow the
0.6.0 bump.
Closes the remaining release-publish invariant layers, verified by
running tests/release_publish_invariants.py locally to a clean exit:
pyproject + __init__ versions in lockstep, the changelog cut as a
dated 0.6.0 section (invariant convention: the current version always
has a dated section; [Unreleased] stays empty), THREAT_MODEL status
line at v0.6.0, and the README quickstart installing 0.6.
…imary shape check

Bot-review remediation (Qodo, #283 inline):
- create_manifest_for_index_with_options observed the index size twice
  (probe, then a separate stat for the hash bound) — a concurrent
  writer could produce a manifest whose size and digest describe
  different bytes. The hash is now bounded by the probe's size, the
  manifest records the byte count actually hashed, and any
  disagreement fails loudly.
- sha256_file_bounded could read (not hash) up to one 64KiB chunk past
  the bound; reads now clamp to max_bytes + 1, mirroring
  read_bounded_file's take() pattern.
- validate_manifest_shape gains artifact_file_size_zero for the
  primary artifact, matching the profile artifacts' explicit zero
  rejection instead of surfacing a confusing artifact_file_too_large.
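The max_bytes + 1 clamp from the second item can be sketched with `Read::take`. `read_len_clamped` is a hypothetical name, and it only counts bytes instead of hashing them.

```rust
use std::io::{self, Read};

/// Consumes at most `max_bytes + 1` bytes from `reader`, so an oversized
/// input is detected after one extra byte rather than one extra chunk.
fn read_len_clamped<R: Read>(reader: R, max_bytes: u64) -> io::Result<u64> {
    let mut limited = reader.take(max_bytes.saturating_add(1));
    // io::copy retries interrupted reads internally; the sink discards bytes.
    let n = io::copy(&mut limited, &mut io::sink())?;
    if n > max_bytes {
        Err(io::Error::new(io::ErrorKind::InvalidData, "artifact_file_too_large"))
    } else {
        Ok(n)
    }
}

fn main() {
    let data = vec![0u8; 1000];
    let mut src = &data[..];
    let result = read_len_clamped(&mut src, 10);
    // `src` now holds whatever the bounded read left untouched.
    println!("error: {}, bytes left unread: {}", result.is_err(), src.len());
}
```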
Bot-review remediation (Qodo, #282): --max-index-artifact-bytes was wired
into ResourceLimits, but the create path bounded the primary hash by
the probed size alone, so the opt-in ceiling was ineffective for
create, unlike auxiliary artifacts. Create now mirrors verify: the
bound is min(declared or observed size, explicit ceiling).
Navi Bot (project-navi-bot) pushed a commit that referenced this pull request Jul 4, 2026
* fix: derive artifact read bounds from declared sizes in ordvec-manifest

Verification now bounds every artifact read by its manifest-declared
file_size_bytes (manifest hard-capped at 1 MiB; SHA-256 pins content);
creation bounds reads by the observed file size. Flat ResourceLimits
byte caps become opt-in ceilings (default unbounded). The primary
artifact read, previously unbounded, gains a declared-size bound and
the artifact_file_too_large reason code. sha256_file_bounded now
streams with constant memory instead of materialising files.

Fixes the undocumented 64 MiB auxiliary cap that made sign-sidecar
bundles impossible to write past 524,288 rows at dim=1024 (measured
on a 1,258,135-row corpus).

* test: pin sign candidate-generation contract ahead of tiled internals

Independent oracle (score_all + full lexicographic sort by hamming asc,
doc_id asc) pins top_m_candidates and top_m_candidates_batched_serial_csr
exactly: random corpora across block boundaries, massive-tie and
duplicate-run corpora exercising boundary tie-breaks, edge geometries
(m >= n, single doc, empty batch), and the dim=1024 shape. Must pass
bit-identically before and after the tiling swap.

* fix: give the primary index artifact an opt-in read ceiling

Security-audit remediation (CIPHER-02): the primary artifact read was
bounded only by the attacker-declared size with no configurable
ceiling, unlike the auxiliary/profile classes — the documented
THREAT-QUERY-003 mitigation was silently incomplete for the largest
artifact in a bundle. Adds ResourceLimits::max_index_artifact_bytes
(default unbounded) applied as a min() with the declared size, and
bounds the create-path primary read by its observed size so the code
matches the CHANGELOG claim. CLI flag parity follows separately.

* fix: bound sqlite cache-key hashes by declared sizes

Code-review remediation: the sqlite-feature cache-key path duplicated
the hashing logic and missed the derived-bound change — the primary
artifact hash was fully unbounded, and the calibration/encoder
profile hashes were bounded only by the flat caps, which the default
flip turned into effectively unbounded reads. All three now use the
same declared-size .min(opt-in ceiling) derivation as the verify
path; a bound violation is a cache miss. Adds default-limits
grown-profile coverage for the calibration and encoder-distortion
call sites, closing the per-site test gap that let this slip.

* perf: stream the corpus once per call in sign candidate generation

top_m_candidates_batched_serial_csr previously looped the single-query
path, re-streaming the full sign bitmap per query (documented-naive
Track-1). The internals now scan the corpus once per call in L2-sized
doc blocks, score every query of the call against each hot block in
query tiles via the existing batched kernel, and select per-query
top-m with bounded (hamming, doc_id) min-collectors — bit-identical
to a full sort by construction, independent of processing order (the
key IS the contract's sort key). top_m_candidates routes through the
same core, dropping its per-call n-row Hamming materialisation.

Per-query corpus traffic drops by a factor of the call's query count: at 1.26M
rows x 1024 dims, a 2048-query call reads the 161MB sidecar once
instead of 2048 times. Serial contract preserved (no rayon); the
oracle suite (tests/tiled_candgen.rs) pins bit-identical outputs
across random, tie-heavy, duplicate-run, and edge geometries.

* perf: keep the dense partition path for single-query candidates

Audit remediation: routing top_m_candidates through the streamed core
measured +50-90% at small/medium n with m in the hundreds (bounded
heap O(n log m) vs select_nth_unstable_by O(n)); with one query there
is no scan to share, so nq=1 stays on the dense path (parity-or-better
at every measured size). Also per audit: the block-boundary oracle
test now genuinely spans three blocks (the dim=128 shape fit one
block), and adds the dim=768 AVX-512 tail-residue x multi-block case
to the permanent suite.

* perf: parallel finite validation and scratch-based rank encode

assert_all_finite paid a full serial pass per add/search batch —
measured ~0.1s per GiB, twice per ingest batch counting the caller
layer. Scans of 1M+ floats now split across the rayon pool (4.4x
measured). RankQuant::add's per-row closure allocated a fresh ranks
Vec per vector inside the parallel loop; for_each_init now reuses a
per-worker scratch via rank_transform_into. Measured on the 1.26M x
1024 corpus slice: encode-path attribution 0.097s serial scan ->
0.022s parallel; alloc churn removed from the hot loop.

* perf: reduce collector boundary test to a cached worst-bound compare

Doc ids visit each per-query heap strictly ascending, so a candidate
tying the worst kept hamming always loses the (hamming, doc_id)
tie-break — once the collector is full, the accept test is exactly
'hamming < worst kept hamming'. Cache that bound in a register-friendly
u32 (u32::MAX while filling) and skip the heap peek + tuple compare on
the ~99.8% reject path. Bit-identical by construction; pinned by the
tie-heavy and duplicate-run oracle suites.
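
A minimal sketch of the cached worst-bound idea, assuming a `TopM` collector built on `BinaryHeap`. The type and method names are hypothetical, and the real collector may be structured differently. It relies on the stated precondition that doc ids arrive strictly ascending.

```rust
use std::collections::BinaryHeap;

/// Keeps the `m` smallest (hamming, doc_id) pairs.
/// Precondition: `offer` is called with strictly ascending doc ids.
struct TopM {
    m: usize,
    heap: BinaryHeap<(u32, u32)>, // max-heap: worst kept pair on top
    worst: u32,                   // cached bound; u32::MAX while filling
}

impl TopM {
    fn new(m: usize) -> Self {
        // With m == 0 the bound is 0, so nothing is ever accepted.
        let worst = if m == 0 { 0 } else { u32::MAX };
        TopM { m, heap: BinaryHeap::with_capacity(m), worst }
    }

    fn offer(&mut self, hamming: u32, doc_id: u32) {
        if self.heap.len() < self.m {
            self.heap.push((hamming, doc_id));
            if self.heap.len() == self.m {
                self.worst = self.heap.peek().map_or(u32::MAX, |&(h, _)| h);
            }
        } else if hamming < self.worst {
            // A tie on hamming always loses: the new doc id is larger.
            self.heap.pop();
            self.heap.push((hamming, doc_id));
            self.worst = self.heap.peek().map_or(u32::MAX, |&(h, _)| h);
        }
    }

    fn into_sorted(self) -> Vec<(u32, u32)> {
        self.heap.into_sorted_vec()
    }
}

fn main() {
    let hammings = [5u32, 3, 5, 1, 3, 3, 9, 1];
    let mut top = TopM::new(3);
    for (doc_id, &h) in hammings.iter().enumerate() {
        top.offer(h, doc_id as u32);
    }
    println!("{:?}", top.into_sorted());
}
```

The reject path is a single integer compare against `worst`; the heap is touched only when a candidate is accepted.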

* perf: LUT + parallel constant-composition check on RankQuant load

load_rankquant's forged-buffer defense histogrammed every packed code
serially — 1.29 billion shift/mask ops at 1.26M x 1024, ~1s of the
1.27s verified open. A 4KB per-byte bucket-count LUT replaces the
per-code inner loop and rows validate in parallel; find_first keeps
the lowest-offending-row error contract, with a scalar recheck
producing the identical message. The security property is unchanged:
every row still proves uniform composition before the index is
usable.
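
A rough sketch of a per-byte bucket-count LUT, under assumptions the commit message does not confirm: 2-bit codes packed four per byte (low bits first), and "uniform composition" meaning every bucket appears equally often in each row. All names are hypothetical, and the scan here is serial where the commit describes a parallel one.

```rust
/// ASSUMPTION: 2-bit codes, four per byte, so four buckets.
const BUCKETS: usize = 4;

/// For every possible byte value, how many of its four codes fall in each bucket.
fn build_lut() -> [[u32; BUCKETS]; 256] {
    let mut lut = [[0u32; BUCKETS]; 256];
    for (byte, counts) in lut.iter_mut().enumerate() {
        for slot in 0..4 {
            counts[(byte >> (2 * slot)) & 0b11] += 1;
        }
    }
    lut
}

/// Histogram of one packed row: one table lookup per byte instead of
/// four shift/mask steps.
fn row_histogram(lut: &[[u32; BUCKETS]; 256], row: &[u8]) -> [u32; BUCKETS] {
    let mut hist = [0u32; BUCKETS];
    for &byte in row {
        for (h, c) in hist.iter_mut().zip(lut[byte as usize].iter()) {
            *h += c;
        }
    }
    hist
}

/// Index of the first row whose buckets are not equally filled, if any.
/// `bytes_per_row` must be non-zero.
fn first_bad_row(packed: &[u8], bytes_per_row: usize) -> Option<usize> {
    let lut = build_lut();
    // 4 * bytes_per_row codes spread evenly over 4 buckets.
    let expected = [bytes_per_row as u32; BUCKETS];
    packed
        .chunks_exact(bytes_per_row)
        .position(|row| row_histogram(&lut, row) != expected)
}

fn main() {
    // 0xE4 = 0b11_10_01_00 holds one code from each bucket.
    let mut packed = vec![0xE4u8; 8];
    println!("{:?}", first_bad_row(&packed, 4));
    packed[5] = 0x00; // forge row 1
    println!("{:?}", first_bad_row(&packed, 4));
}
```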

* docs: changelog perf entries and 0.6.0 downstream un-patch checklist

CHANGELOG Unreleased gains the measured perf work merged to
integration/full-stack: tiled streaming sign candidate generation +
cached collector worst-bound (bit-identical internals swap; downstream
batched search 220 -> 10.2k q/s at 1.26M x 1024), parallel finite
validation + scratch rank encode (0.097s -> 0.022s attribution), and
the LUT + parallel constant-composition load check (verified open
1.27s -> 0.38s). RELEASING gains a one-time pre-publish item: remove
OrdinalDB's [patch.crates-io] block pointing at integration/full-stack
when 0.6.0 publishes.

* feat: index-ceiling CLI parity and zero-size shape checks (CIPHER-04)

Expose --max-index-artifact-bytes on the ordvec-manifest CLI LimitArgs,
wiring it to ResourceLimits::max_index_artifact_bytes so the opt-in
primary-artifact read ceiling reaches feature parity with the existing
--max-auxiliary-artifact-bytes flag.

Close the deferred CIPHER-04 reason-code symmetry: validate_manifest_shape
now rejects a zero manifest-declared artifact.file_size_bytes
(artifact_file_size_zero) and validate_auxiliary_artifact_shape rejects
zero-size declarations on required auxiliary artifacts
(auxiliary_artifact_file_size_zero), mirroring the calibration and
encoder-distortion *_file_size_zero checks. Optional artifacts keep the
established zero-size absent-placeholder convention.

* fix: refuse non-regular artifact files before hashing

Security-review remediation (fleet CIPHER-001): with derived read
bounds, a FIFO inside the bundle directory would block verification
forever — File::open on a reader-less FIFO blocks, and a device node
would stream without EOF under a large declared size. Stat the path
before opening and refuse anything that is not a regular file, at
every hashing call site (create, verify, sqlite cache). Regular files
terminate at EOF and remain post-checked against the declaration.
Unix regression test uses a real FIFO.

* docs: scope the serial CSR contract to scan and selection

Security-review note (fleet CIPHER-002): parallel finite validation
introduced in the encode train transitively touches the global rayon
pool from inside the 'serial' CSR primitive. The serial guarantee is
about candidate scan/selection ownership, not input validation; say
so explicitly.

* fix: retry interrupted reads in bounded streaming hash

Bot-review remediation (Qodo, PR #277): the streaming loop surfaced
ErrorKind::Interrupted as a hash failure, a reliability regression vs
the previous read_to_end which retried EINTR internally.

* fix: assert whole-row query buffers in the streamed core

Bot-review remediation (Qodo, PR #278): the shared core derived nq by
integer division; a ragged buffer from a future caller would silently
truncate. All current callers validate upstream — this is the cheap
in-core invariant.

* perf: transpose-tree horizontal reduction in the batched sign kernel

The AVX-512 batched scan paid eight serial _mm512_reduce_add_epi64
expansions per doc-chunk — roughly a third of per-doc cycles at
dim=1024 (2 lanes) going to reduction rather than XOR+POPCNT work. An
unpack/permute/shuffle tree folds all eight accumulators into one
vector of sums (~25 ops replacing ~50), stored via one stack spill.
Tail path (batch % 8) keeps the per-accumulator reduce. Bit-identical:
pinned by the AVX-512-vs-scalar parity tests and the oracle suites.

* docs: scope the serial-contract claim in the tiled candgen entry

External-audit remediation: the entry claimed 'no rayon' unqualified;
finite validation on large buffers may use the global pool (documented
on the method), and top_m_candidates_batched is explicitly out of
scope of the internals swap.

* bench: regenerate committed synthetic results at the 0.6.0 heads

two_stage_caller_owned_dim1024: stage-1 candidate generation 159.60 ->
94.60 us/query (1.69x), full two-stage 172.42 -> 103.75 us/query
(1.66x) — same command, host, core pinning, and toolchain family;
verified code-only by an A/B against main on the same day/machine
(main reproduced the old numbers within 3%).

rank_modes: single-query latency rows are intentionally unchanged by
the batch rework (verified identical-within-noise main vs heads) and
carry a refresh note saying so; encode columns reflect the parallel
validation + scratch encode work. Quality columns bit-identical
throughout.

* docs: refresh README benchmarks at the 0.6.0 heads

All figures and numbers regenerated by the committed make benchmark-beir
pipeline on the same host class (9950X). Quality: nDCG within bootstrap
noise of exact on both datasets, sign-rq2 trec-covid 0.7638 unchanged
(deterministic selection held bit-identical through the perf train).
Single-query hero effectively unchanged (52.4 ms flat vs 0.52 ms
sign-rq2, ~101x) — that lane was intentionally untouched. Batched
1-thread view improves to ~10-12x over batched flat (once-per-call
corpus streaming); threaded view: HNSW still leads, margin narrowed
from ~2.3x to 1.6x over sign-rq2 (1.2x over bitmap-rq2). Build 47.1s
vs 0.21s. No larger-corpus claims added.

* docs: transcribe the refreshed hnsw nDCG in the tradeoff table

* fix: land the EINTR retry on the bounded hash and bump to 0.6.0

CI caught both halves: the retry from the Qodo remediation had landed
on sha256_file (where the io::Error conversion is a useless-conversion
lint under current clippy) instead of sha256_file_bounded, the
function actually flagged — the retry now lives on both, with the
conversion only where the error type changes. And the release-publish
invariant correctly refuses [Unreleased] changelog content at an
already-released version: this stack is the 0.6.0 work, so ordvec and
ordvec-manifest now say 0.6.0 (minor: limit-semantics change, additive
APIs, behavioral perf changes).

* chore: track 0.6.0 in the fuzz workspace lock

* chore: lockstep all workspace member versions at 0.6.0

The release-publish SBOM invariant requires member package versions in
lockstep with the root; ordvec-ffi and both python bindings follow the
0.6.0 bump.

* chore: complete the 0.6.0 release shape

Closes the remaining release-publish invariant layers, verified by
running tests/release_publish_invariants.py locally to a clean exit:
pyproject + __init__ versions in lockstep, the changelog cut as a
dated 0.6.0 section (invariant convention: the current version always
has a dated section; [Unreleased] stays empty), THREAT_MODEL status
line at v0.6.0, and the README quickstart installing 0.6.

* fix: checked selection-state bounds in the streamed candidate core

Bot-review remediation (Qodo, PR #278): nq * m_eff can overflow usize
on 32-bit/wasm32 targets, and the CSR wrapper's saturating_mul would
attempt a usize::MAX allocation. Both sites now use checked
multiplication with a clear tile-the-batch message, matching the
crate's checked-allocation discipline. The exact m_eff + 1 heap
reservation is kept deliberately: gradual growth double-allocates to
the next power of two (~2x peak per query) — the reservation is the
memory-optimal choice, now documented.
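
A minimal sketch of the checked-multiplication pattern, assuming the selection state needs `nq * m_eff` slots. `selection_slots` and its error text are illustrative, not the crate's actual function or message.

```rust
/// Computes the number of selection-state slots for `nq` queries with
/// `m_eff` candidates each, failing instead of wrapping or saturating.
fn selection_slots(nq: usize, m_eff: usize) -> Result<usize, String> {
    nq.checked_mul(m_eff).ok_or_else(|| {
        format!("nq ({nq}) * m_eff ({m_eff}) overflows usize; tile the batch into smaller calls")
    })
}
```

Unlike `saturating_mul`, which would hand `usize::MAX` to the allocator, `checked_mul` returns `None` on overflow, so the caller gets an actionable error before any allocation is attempted.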

* fix: single-snapshot create hashing, strict read bounds, zero-size primary shape check

Bot-review remediation (Qodo, #283 inline):
- create_manifest_for_index_with_options observed the index size twice
  (probe, then a separate stat for the hash bound) — a concurrent
  writer could produce a manifest whose size and digest describe
  different bytes. The hash is now bounded by the probe's size, the
  manifest records the byte count actually hashed, and any
  disagreement fails loudly.
- sha256_file_bounded could read (not hash) up to one 64KiB chunk past
  the bound; reads now clamp to max_bytes + 1, mirroring
  read_bounded_file's take() pattern.
- validate_manifest_shape gains artifact_file_size_zero for the
  primary artifact, matching the profile artifacts' explicit zero
  rejection instead of surfacing a confusing artifact_file_too_large.
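
The strict read bound in the second item can be sketched with `Read::take`. This assumes a generic reader and a caller-supplied sink in place of the SHA-256 hasher; `stream_bounded` is a hypothetical name, and the error kind is a placeholder for the crate's own reason codes.

```rust
use std::io::{self, ErrorKind, Read};

/// Streams at most `max_bytes` from `reader` into `sink`. The reader is
/// clamped to `max_bytes + 1`: one extra byte is enough to detect a grown
/// file, and nothing beyond it is ever read.
fn stream_bounded<R: Read>(
    reader: R,
    max_bytes: u64,
    mut sink: impl FnMut(&[u8]),
) -> io::Result<u64> {
    let mut limited = reader.take(max_bytes.saturating_add(1));
    let mut buf = vec![0u8; 64 * 1024];
    let mut total = 0u64;
    loop {
        let n = match limited.read(&mut buf) {
            Ok(0) => return Ok(total),
            Ok(n) => n,
            Err(e) if e.kind() == ErrorKind::Interrupted => continue,
            Err(e) => return Err(e),
        };
        total += n as u64;
        if total > max_bytes {
            // Fail before passing the chunk that crossed the bound to the sink.
            return Err(io::Error::new(ErrorKind::InvalidData, "input exceeds bound"));
        }
        sink(&buf[..n]);
    }
}
```

On success the returned count is the number of bytes actually passed to the sink, which is the value a manifest would record under the single-snapshot rule in the first item.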

* perf: build query bitmaps in place in the streamed core

Bot-review remediation (Qodo, #283 inline): build_query_bitmap
allocated a fresh Vec and re-validated finiteness per query; the
entry points already validate the whole buffer and the destination
is preallocated. Oracle suites pin bit-identical output.

* fix: apply the index ceiling on the create path

Bot-review remediation (Qodo, #282): --max-index-artifact-bytes was wired
into ResourceLimits, but the create path bounded the primary hash by
the probed size alone, so the opt-in ceiling had no effect on create,
unlike for auxiliary artifacts. Create now mirrors verify: the bound is
min(declared or observed size, explicit ceiling).
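
A minimal sketch of the bound computation, assuming the ceiling is modelled as an `Option<u64>` where `None` means unbounded. `effective_bound` is a hypothetical helper, not the crate's API.

```rust
/// Effective read bound: the declared (verify) or observed (create) size,
/// lowered by an explicit ceiling when one is configured.
fn effective_bound(size_bytes: u64, ceiling: Option<u64>) -> u64 {
    match ceiling {
        Some(c) => size_bytes.min(c),
        None => size_bytes,
    }
}
```

With this shape, an artifact larger than the configured ceiling exceeds the read bound on both the create and verify paths.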
…#282)

Expose --max-index-artifact-bytes on the ordvec-manifest CLI LimitArgs,
wiring it to ResourceLimits::max_index_artifact_bytes so the opt-in
primary-artifact read ceiling reaches feature parity with the existing
--max-auxiliary-artifact-bytes flag.

Close the deferred CIPHER-04 reason-code symmetry: validate_manifest_shape
now rejects a zero manifest-declared artifact.file_size_bytes
(artifact_file_size_zero) and validate_auxiliary_artifact_shape rejects
zero-size declarations on required auxiliary artifacts
(auxiliary_artifact_file_size_zero), mirroring the calibration and
encoder-distortion *_file_size_zero checks. Optional artifacts keep the
established zero-size absent-placeholder convention.
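
The zero-size rules above can be sketched as a single decision function. The reason-code strings are the ones named in the commit message; the `ArtifactKind` enum and `zero_size_reason` are hypothetical simplifications of `validate_manifest_shape` and `validate_auxiliary_artifact_shape`.

```rust
/// Hypothetical classification of the artifacts discussed above.
#[derive(Clone, Copy)]
enum ArtifactKind {
    Primary,
    RequiredAuxiliary,
    OptionalAuxiliary,
}

/// Returns a reason code when a declared size of zero is not allowed,
/// or `None` when the declaration is acceptable.
fn zero_size_reason(declared_bytes: u64, kind: ArtifactKind) -> Option<&'static str> {
    if declared_bytes != 0 {
        return None;
    }
    match kind {
        ArtifactKind::Primary => Some("artifact_file_size_zero"),
        ArtifactKind::RequiredAuxiliary => Some("auxiliary_artifact_file_size_zero"),
        // Optional artifacts keep the zero-size absent-placeholder convention.
        ArtifactKind::OptionalAuxiliary => None,
    }
}
```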
@Fieldnote-Echo

Member Author

Merged to main via #283's squash (bee2fad) — main's tree is byte-identical to the final collapse tip (verified: git rev-parse origin/main^{tree} == docs/release-hygiene^{tree}). Squash-merging broke the head-reachability that would have auto-closed this PR; closing manually. All review findings on this PR were remediated in-branch before the collapse; the commits and review trail remain linked here.

@Fieldnote-Echo Nelson Spence (Fieldnote-Echo) deleted the fix/manifest-derived-limits branch July 4, 2026 16:15