fix: derive artifact read bounds from declared sizes in ordvec-manifest by Fieldnote-Echo · Pull Request #277 · Project-Navi/ordvec

Fieldnote-Echo · 2026-07-03T17:52:50Z

Summary

Verify path bounds every artifact read by the manifest-declared file_size_bytes; the manifest itself stays hard-capped at 1 MiB and SHA-256 pins content. Create path bounds reads by the artifact's observed size.
Flat ResourceLimits byte caps (max_auxiliary_artifact_bytes, max_calibration_profile_bytes, max_encoder_distortion_profile_bytes) become opt-in ceilings, default unbounded. Explicitly configured caps behave exactly as before.
The primary artifact read was previously unbounded; it now gets a declared-size bound and a new artifact_file_too_large reason code (fail-fast on grown artifacts instead of digest-mismatch after hashing the excess).
sha256_file_bounded streams with a 64 KiB buffer — constant memory at any artifact size (previously materialised the whole file).

Why

The 64 MiB auxiliary default made legitimate sign sidecars impossible to persist past 524,288 rows at dim=1024 (sign.ovsb = rows × dim/8). Measured on a 1,258,135-row × 1024-dim corpus: write_verified_bundle failed with default options. A security bound meant for hostile foreign input was applied to self-written artifacts.
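The arithmetic behind that failure is short enough to check directly. The sketch below assumes one sign bit per dimension packed eight to a byte, as the `rows × dim/8` formula implies; the constant and function names are illustrative and not part of ordvec.

```rust
/// Legacy flat default for auxiliary artifacts: 64 MiB.
const LEGACY_AUX_CAP_BYTES: u64 = 64 * 1024 * 1024;

/// Size of a packed sign sidecar: one bit per dimension, eight bits per byte.
/// Assumes `dim` is a multiple of 8 (true for dim=1024).
fn sign_sidecar_bytes(rows: u64, dim: u64) -> u64 {
    rows * (dim / 8)
}

/// Largest row count whose sidecar still fits under a flat byte cap.
fn max_rows_under_cap(cap_bytes: u64, dim: u64) -> u64 {
    cap_bytes / (dim / 8)
}
```

At dim=1024 each row costs 128 bytes, so the 64 MiB cap admits exactly 524,288 rows, while the measured 1,258,135-row corpus needs 161,041,280 bytes, roughly 2.4 times the cap.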

Hostile-input posture (what precisely changed)

Memory safety and primary-artifact bounding improved. Default verification time and I/O on hostile bundles are no longer bounded by a flat cap: the old 64 MiB default also bounded attacker-supplied I/O, and that bound is now an opt-in ResourceLimits ceiling that each deployment sets (documented in THREAT-QUERY-003). Note: file_size_bytes is a required manifest field, so v1 manifests without it fail deserialization; there is no fallback path.

Manifest parse bound unchanged (1 MiB).
A hostile manifest cannot cause unbounded memory (streaming hash); I/O+CPU remain proportional to bytes actually supplied — deployments that must bound worst-case verification time set the explicit ceilings.
Inflated declared size with unchanged bytes → auxiliary_artifact_file_size_mismatch; grown artifact → fail-fast *_file_too_large; truncation → size mismatch. All covered by new tests.
THREAT_MODEL.md gains THREAT-QUERY-003 documenting the derived-bound model and reiterating that VerifiedLoadPlan is a snapshot, not a byte pin.
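The three size outcomes listed above reduce to one comparison between the declared size and the bytes actually present. This is a minimal sketch under that reading, not the crate's code: `SizeCheck` and `classify` are hypothetical names, and the opt-in ceilings are left out for brevity.

```rust
use std::cmp::Ordering;

#[derive(Debug, PartialEq, Eq)]
enum SizeCheck {
    /// More bytes on disk than declared: the bounded read stops early
    /// (reported as a `*_file_too_large` reason code).
    TooLarge,
    /// Fewer bytes on disk than declared: truncation, or an inflated
    /// declaration over unchanged bytes (a `*_file_size_mismatch` code).
    SizeMismatch,
    /// Byte count matches; the SHA-256 comparison decides the outcome.
    ProceedToDigest,
}

fn classify(declared_bytes: u64, bytes_on_disk: u64) -> SizeCheck {
    match bytes_on_disk.cmp(&declared_bytes) {
        Ordering::Greater => SizeCheck::TooLarge,
        Ordering::Less => SizeCheck::SizeMismatch,
        Ordering::Equal => SizeCheck::ProceedToDigest,
    }
}
```

Note that an inflated declaration and a truncated file are indistinguishable at this layer: both leave fewer bytes on disk than the manifest claims.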

Tests

New tests/derived_limits.rs: >64 MiB roundtrip at pure defaults, grown/truncated/inflated-declaration cases, explicit-cap back-compat, primary-artifact bound.
Two existing tests updated to the fail-fast contract (append-growth → artifact_file_too_large); corruption coverage preserved by switching that test to in-place corruption.
Full gate: workspace tests, no-default-features, --locked, -D warnings, MSRV 1.89, fuzz build — all green. 63/63 ordvec-manifest tests.

Part of the 1M-row release train (Track A1 of the locked master plan).

Verification now bounds every artifact read by its manifest-declared file_size_bytes (manifest hard-capped at 1 MiB; SHA-256 pins content); creation bounds reads by the observed file size. Flat ResourceLimits byte caps become opt-in ceilings (default unbounded). The primary artifact read, previously unbounded, gains a declared-size bound and the artifact_file_too_large reason code. sha256_file_bounded now streams with constant memory instead of materialising files. Fixes the undocumented 64 MiB auxiliary cap that made sign-sidecar bundles impossible to write past 524,288 rows at dim=1024 (measured on a 1,258,135-row corpus).

Fieldnote-Echo · 2026-07-03T17:53:00Z

Codex (@codex) review

qodo-code-review · 2026-07-03T17:54:05Z

PR Summary by Qodo

Derive artifact read bounds from manifest-declared sizes in ordvec-manifest

🐞 Bug fix ✨ Enhancement 🧪 Tests 📝 Documentation 🕐 40+ Minutes

AI Description

• Bound verification reads by manifest-declared file sizes; add fail-fast oversized error codes.
• Make ResourceLimits byte caps opt-in ceilings (defaults unbounded) to allow large sidecars.
• Stream bounded SHA-256 hashing with constant memory; add regression tests and docs updates.

Diagram

graph TD
  C["Create manifest"] --> S["Observe file size"] --> H["sha256_file_bounded"] --> M[("Manifest (size+sha256)")]
  V["Verify bundle"] --> M --> B{"Compute read bound"} --> H --> R["VerificationReport"]
  L["ResourceLimits caps"] -->|"optional ceiling"| B
  subgraph Legend
    direction LR
    _op["Operation"] ~~~ _dec{"Decision"} ~~~ _data[("Data")]
  end

High-Level Assessment

The following are alternative approaches to this PR:

1. Raise the default flat caps (keep flat-bound model)

➕ Simpler mental model: one max-bytes knob per artifact type
➕ Still bounds worst-case verification I/O/CPU by default
➖ Still risks rejecting legitimate artifacts as datasets grow
➖ Choosing a safe-yet-non-breaking default is hard and likely to regress again

2. Two-tier policy: strict defaults for verify, unbounded defaults for create

➕ Keeps hostile-input posture bounded by default while unblocking self-written bundles
➕ More explicit separation of trusted/untrusted paths
➖ Surprising inconsistency between create and verify defaults
➖ More configuration/documentation complexity; can still break common workflows

Recommendation: The PR’s derived-bound strategy is the best tradeoff: it makes verification proportional to the declared artifact sizes (anchored by a small, pinned manifest) and preserves a knob (ResourceLimits) for deployments that need explicit worst-case caps. The streaming hash implementation also removes an important memory-risk footgun without weakening integrity checks.

Files changed (6) +303 / -24

Bug fix (1) +57 / -11

lib.rs: Derive per-artifact hash bounds; stream sha256_file_bounded; add new reason code (+57/-11)

• Changes default ResourceLimits byte caps to u64::MAX (opt-in ceilings) and updates verification to bound reads by manifest-declared sizes, applying min(declared, configured_cap) for auxiliary/profile artifacts. Adds a declared-size bound for the primary artifact with a new artifact_file_too_large code, and rewrites sha256_file_bounded to stream with a 64 KiB buffer instead of materializing the whole file.

ordvec-manifest/src/lib.rs

Tests (2) +205 / -9

derived_limits.rs: Add tests for derived bounds, oversized behavior, and cap back-compat (+197/-0)

• Adds end-to-end coverage proving default options accept auxiliary artifacts larger than the legacy 64 MiB cap, and verifies correct failure modes for grown, truncated, and inflated-declaration artifacts. Also asserts explicit caps remain enforced and the primary artifact now fails fast when grown past its declared size.

ordvec-manifest/tests/derived_limits.rs

manifest.rs: Update existing tests for new fail-fast oversized behavior (+8/-9)

• Reworks the corruption test to flip a byte in-place (same size) so it still exercises SHA-256 mismatch after passing the declared-size bound. Updates the VerifiedLoadPlan re-verification test to expect artifact_file_too_large when the artifact grows past its declared size.

ordvec-manifest/tests/manifest.rs

Documentation (3) +41 / -4

CHANGELOG.md: Document derived artifact bounds and streaming hashing change (+19/-1)

• Replaces the empty Unreleased section with entries describing derived size bounds, newly bounded primary artifact reads, and constant-memory hashing. Calls out the behavioral change from default 64 MiB auxiliary caps to unbounded-by-default opt-in ceilings.

CHANGELOG.md

THREAT_MODEL.md: Add threat-model entry for derived artifact read bounds (+13/-0)

• Introduces THREAT-QUERY-003 documenting the new derived-bound model, noting constant-memory hashing and that I/O/CPU can still scale with attacker-supplied bytes unless explicit caps are configured. Reiterates that VerifiedLoadPlan is a snapshot rather than a byte pin.

THREAT_MODEL.md

README.md: Update limit-code docs to reflect derived bounds and new primary bound (+9/-3)

• Updates the stable limit-code section to describe per-artifact bounds derived from manifest-declared sizes (verify) or observed sizes (create). Adds the new primary index artifact bound and clarifies that flat ResourceLimits caps are opt-in ceilings.

ordvec-manifest/README.md

codecov · 2026-07-03T17:55:45Z

Codecov Report

✅ All modified and coverable lines are covered by tests.

qodo-code-review · 2026-07-03T17:58:44Z

Code Review by Qodo

🐞 Bugs (0) 📘 Rule violations (0) 📎 Requirement gaps (0) 🎨 UX issues (0) 🔗 Cross-repo conflicts (0) 📜 Skill insights (0)

1. ~~Primary create hash unbounded~~ ✓ Resolved 🐞 Bug ≡ Correctness

Description

Verification now hashes the primary index artifact with `sha256_file_bounded(...,
manifest.artifact.file_size_bytes, ...)`, but manifest creation still hashes the primary artifact
with sha256_file() and separately records file_size_bytes from probed metadata. If the index
file grows while the manifest is being created, the manifest can end up with a digest computed over
more bytes than verification will ever read under the declared-size bound.

Code

ordvec-manifest/src/lib.rs[R260-268]

+        // Bound the read by the manifest-declared size: a primary artifact
+        // larger than its declaration fails fast instead of being hashed in
+        // full (the read was previously unbounded).
+        match sha256_file_bounded(
+            &resolved.canonical_path,
+            document.manifest.artifact.file_size_bytes,
+            "artifact_file_too_large",
+            "index artifact",
+        ) {

Relevance

⭐⭐⭐ High
Team frequently accepts hashing/TOCTOU hardening; prior work replaced unbounded sha256_file with
bounded patterns in verifier.
PR-#157
PR-#163
PR-#152

Evidence

The verification path now uses a declared-size bounded hash for the primary artifact, but the
creation path still computes the primary artifact hash with sha256_file() while separately
recording file_size_bytes from metadata; this leaves a race window where digest and declared size
can diverge if the file changes during create.

/ordvec-manifest/src/lib.rs[241-296]
/ordvec-manifest/src/lib.rs[3546-3572]

Agent prompt

The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
Primary artifact verification is now declared-size bounded, but primary artifact *creation* still hashes via `sha256_file()` without bounding to a stable observed length and without asserting `hash.size_bytes == metadata.file_size_bytes`. This can produce a manifest whose `sha256` and `file_size_bytes` describe different snapshots if the index is concurrently modified during manifest creation.

## Issue Context
Auxiliary artifact creation was updated to bound reads by observed size; the primary artifact creation path was not updated similarly.

## Fix Focus Areas
- ordvec-manifest/src/lib.rs[241-296]
- ordvec-manifest/src/lib.rs[3546-3572]

## Suggested fix
- In `create_manifest_for_index_with_options`, compute an `observed_len` for the primary artifact (use `metadata.file_size_bytes` or `fs::metadata(index_path)?.len()`), then hash with `sha256_file_bounded(index_path, observed_len, "artifact_file_too_large", "index artifact")`.
- After hashing, if `hash.size_bytes != observed_len`, return a creation-time error indicating the artifact changed during hashing.
- Populate `artifact.sha256` and `artifact.file_size_bytes` from the bounded hash result to keep them consistent with the bytes that were actually hashed.

2. ~~Interrupted read not retried~~ ✓ Resolved 🐞 Bug ☼ Reliability

Description

sha256_file_bounded() now uses a manual read() loop but does not retry on
ErrorKind::Interrupted, so hashing can fail spuriously when syscalls are interrupted by signals.
This is a regression relative to read_to_end()-style loops that transparently retry interrupts.

Code

ordvec-manifest/src/lib.rs[R3477-3497]

+    let mut file = File::open(path)?;
    let mut hasher = Sha256::new();
-    hasher.update(&bytes);
+    let mut size_bytes = 0u64;
+    let mut buf = [0u8; 64 * 1024];
+    loop {
+        let n = file.read(&mut buf)?;
+        if n == 0 {
+            break;
+        }
+        size_bytes += n as u64;
+        if size_bytes > max_bytes {
+            return Err(ManifestError::limit_exceeded(
+                code,
+                format!(
+                    "{context} exceeds {max_bytes} bytes while reading {}",
+                    path.display()
+                ),
+            ));
+        }
+        hasher.update(&buf[..n]);
+    }

Relevance

⭐⭐ Medium
No direct prior review precedent on EINTR retries in Rust read loops; team generally accepts
reliability hardening though.
PR-#203
PR-#157

Evidence

The new streaming loop calls file.read(&mut buf)? directly; there is no branch to handle/retry
Interrupted, so an EINTR will bubble up as a verification failure.

/ordvec-manifest/src/lib.rs[3470-3502]

Agent prompt

The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
`sha256_file_bounded()` reads in a loop with `file.read(&mut buf)?` and will return an error on `io::ErrorKind::Interrupted` instead of retrying. This can cause rare but real spurious failures in verification/creation.

## Issue Context
The PR replaced the previous bounded hashing implementation with streaming reads.

## Fix Focus Areas
- ordvec-manifest/src/lib.rs[3451-3502]

## Suggested fix
- Change the read loop to explicitly `continue` on `Err(e)` where `e.kind() == io::ErrorKind::Interrupted`.
- (Optional but recommended) apply the same EINTR retry behavior to `sha256_file()` for consistency.

Security-audit remediation (CIPHER-02): the primary artifact read was bounded only by the attacker-declared size with no configurable ceiling, unlike the auxiliary/profile classes — the documented THREAT-QUERY-003 mitigation was silently incomplete for the largest artifact in a bundle. Adds ResourceLimits::max_index_artifact_bytes (default unbounded) applied as a min() with the declared size, and bounds the create-path primary read by its observed size so the code matches the CHANGELOG claim. CLI flag parity follows separately.
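The "declared size min ceiling" derivation can be sketched in a few lines. The field names below are the ones quoted in this PR, but the struct is a hypothetical cut-down mirror (the real `ResourceLimits` has more fields), and `derived_read_bound` is an illustrative helper, not a function from the crate.

```rust
/// Hypothetical mirror of two byte-cap fields named in this PR.
#[derive(Debug, Clone, Copy)]
struct ResourceLimits {
    max_index_artifact_bytes: u64,
    max_auxiliary_artifact_bytes: u64,
}

impl Default for ResourceLimits {
    fn default() -> Self {
        // Opt-in ceilings: unbounded unless the deployment sets them.
        Self {
            max_index_artifact_bytes: u64::MAX,
            max_auxiliary_artifact_bytes: u64::MAX,
        }
    }
}

/// Read bound for one artifact: the declared (verify) or observed (create)
/// size, clamped by the configured ceiling.
fn derived_read_bound(size_bytes: u64, ceiling_bytes: u64) -> u64 {
    size_bytes.min(ceiling_bytes)
}
```

With default limits the bound is simply the declared size; a deployment that sets a ceiling gets the smaller of the two, which is how an explicitly configured cap keeps its old behavior.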

Code-review remediation: the sqlite-feature cache-key path duplicated the hashing logic and missed the derived-bound change — the primary artifact hash was fully unbounded, and the calibration/encoder profile hashes were bounded only by the flat caps, which the default flip turned into effectively unbounded reads. All three now use the same declared-size .min(opt-in ceiling) derivation as the verify path; a bound violation is a cache miss. Adds default-limits grown-profile coverage for the calibration and encoder-distortion call sites, closing the per-site test gap that let this slip.

Fieldnote-Echo · 2026-07-03T18:08:34Z

Adversarial audit triage (cipher security agent + code-reviewer agent)

Fixed in 41c3c03:

CIPHER-02 (Low, most actionable): primary index artifact had no opt-in ceiling — added ResourceLimits::max_index_artifact_bytes (default unbounded) applied as .min() with the declared size; create-path primary read now bounded by observed size (also closes the reviewer's CHANGELOG-overclaim MEDIUM).

Fixed in 763dffc:

Reviewer HIGH: sqlite-feature cache-key path duplicated hashing and missed the derived-bound change — primary hash was fully unbounded; calibration/encoder profile hashes were flat-cap-only, which the default flip made effectively unbounded. All three now use declared-size .min(ceiling); bound violation = cache miss.
Reviewer HIGH (coverage): added default-limits grown-profile tests for the calibration and encoder-distortion call sites (the missing per-site coverage that let the sqlite gap slip). Gate now runs --features sqlite: 78 tests green.

Explained / accepted:

CIPHER-01 (Low): default posture allows verification I/O proportional to bytes the attacker actually ships — documented tradeoff in THREAT-QUERY-003; opt-in ceilings are the mitigation and now cover every artifact class including the primary.
CIPHER-03 (Info): growth now surfaces as *_file_too_large (fail-fast), *_file_size_mismatch = truncation only — noted in CHANGELOG.

Deferred to follow-up issues:

CIPHER-04 (Info): zero-declared-size reason-code symmetry for primary/aux (*_file_size_zero).
CLI flag parity for --max-index-artifact-bytes.

Security-review remediation (fleet CIPHER-001): with derived read bounds, a FIFO inside the bundle directory would block verification forever — File::open on a reader-less FIFO blocks, and a device node would stream without EOF under a large declared size. Stat the path before opening and refuse anything that is not a regular file, at every hashing call site (create, verify, sqlite cache). Regular files terminate at EOF and remain post-checked against the declaration. Unix regression test uses a real FIFO.
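A stat-before-open check of this kind needs only the standard library. The sketch below is an assumption about the shape of the check, with a hypothetical function name and error kind; it uses `fs::metadata`, which issues a stat and never opens the path, so it cannot block on a reader-less FIFO.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Refuse anything that is not a regular file before opening it.
/// `fs::metadata` follows symlinks, so a symlink to a FIFO is refused too.
/// Returns the observed length so the caller can reuse it as a read bound.
fn ensure_regular_file(path: &Path) -> io::Result<u64> {
    let meta = fs::metadata(path)?; // stat only; the path is never opened here
    if !meta.file_type().is_file() {
        return Err(io::Error::new(
            io::ErrorKind::InvalidInput,
            format!("{} is not a regular file", path.display()),
        ));
    }
    Ok(meta.len())
}
```

A check like this leaves a window between the stat and the later open; whether the PR narrows that window further is not stated here.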

Bot-review remediation (Qodo, PR #277): the streaming loop surfaced ErrorKind::Interrupted as a hash failure, a reliability regression vs the previous read_to_end which retried EINTR internally.
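The retry itself is a single match arm. The following sketch shows the pattern on a generic reader; it is not the crate's `sha256_file_bounded`. To stay dependency-free it folds bytes into an FNV-1a checksum as a stand-in for SHA-256, and `InterruptOnce` is a hypothetical helper that simulates one EINTR.

```rust
use std::io::{self, Read};

/// Stream `reader` in 64 KiB chunks, retrying interrupted reads, and return
/// (byte count, checksum). The checksum is a stand-in (FNV-1a), NOT SHA-256.
fn stream_with_retry<R: Read>(mut reader: R) -> io::Result<(u64, u64)> {
    let mut buf = vec![0u8; 64 * 1024];
    let mut size_bytes = 0u64;
    let mut fnv: u64 = 0xcbf2_9ce4_8422_2325;
    loop {
        let n = match reader.read(&mut buf) {
            Ok(0) => break, // EOF
            Ok(n) => n,
            // EINTR: the read was interrupted by a signal; retry it.
            Err(e) if e.kind() == io::ErrorKind::Interrupted => continue,
            Err(e) => return Err(e),
        };
        size_bytes += n as u64;
        for &b in &buf[..n] {
            fnv = (fnv ^ u64::from(b)).wrapping_mul(0x0000_0100_0000_01b3);
        }
    }
    Ok((size_bytes, fnv))
}

/// Demo helper: fails the first read with `Interrupted`, then delegates.
struct InterruptOnce<R> {
    inner: R,
    fired: bool,
}

impl<R: Read> Read for InterruptOnce<R> {
    fn read(&mut self, buf: &mut [u8]) -> io::Result<usize> {
        if !self.fired {
            self.fired = true;
            return Err(io::Error::new(io::ErrorKind::Interrupted, "simulated EINTR"));
        }
        self.inner.read(buf)
    }
}
```

Wrapping a reader in `InterruptOnce` and comparing the result against the unwrapped reader gives a cheap regression test for the retry path.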

Fieldnote-Echo · 2026-07-04T00:10:04Z

Qodo triage: (1) EINTR retry in the streaming hash — fixed in b18ebaf. (2) Create-path primary hashed unbounded — already fixed in 41c3c03 (bounded by observed size), which post-dates the review snapshot; no further change.

qodo-code-review · 2026-07-04T00:10:20Z

triage: (1) EINTR retry in the streaming hash — fixed in b18ebaf. (2) Create-path primary hashed unbounded — already fixed in 41c3c03 (bounded by observed size), which post-dates the review snapshot; no further change.

That triage looks correct.

Finding 1 is already marked implemented, and the snapshot reflects the later fix in 41c3c03.
Finding 2 is also already implemented, with the EINTR retry covered by b18ebaf.

So there’s nothing further to change in this PR snapshot.

CI caught both halves: the retry from the Qodo remediation had landed on sha256_file (where the io::Error conversion is a useless-conversion lint under current clippy) instead of sha256_file_bounded, the function actually flagged — the retry now lives on both, with the conversion only where the error type changes. And the release-publish invariant correctly refuses [Unreleased] changelog content at an already-released version: this stack is the 0.6.0 work, so ordvec and ordvec-manifest now say 0.6.0 (minor: limit-semantics change, additive APIs, behavioral perf changes).

The release-publish SBOM invariant requires member package versions in lockstep with the root; ordvec-ffi and both python bindings follow the 0.6.0 bump.

Closes the remaining release-publish invariant layers, verified by running tests/release_publish_invariants.py locally to a clean exit: pyproject + __init__ versions in lockstep, the changelog cut as a dated 0.6.0 section (invariant convention: the current version always has a dated section; [Unreleased] stays empty), THREAT_MODEL status line at v0.6.0, and the README quickstart installing 0.6.

…imary shape check

Bot-review remediation (Qodo, #283 inline):

- create_manifest_for_index_with_options observed the index size twice (probe, then a separate stat for the hash bound), so a concurrent writer could produce a manifest whose size and digest describe different bytes. The hash is now bounded by the probe's size, the manifest records the byte count actually hashed, and any disagreement fails loudly.
- sha256_file_bounded could read (not hash) up to one 64 KiB chunk past the bound; reads now clamp to max_bytes + 1, mirroring read_bounded_file's take() pattern.
- validate_manifest_shape gains artifact_file_size_zero for the primary artifact, matching the profile artifacts' explicit zero rejection instead of surfacing a confusing artifact_file_too_large.
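The read clamp described above can be illustrated with `Read::take`. This is a sketch of the pattern only: `count_bounded` is a hypothetical function that counts bytes instead of hashing them, and the `saturating_add` is an assumption made here so an unbounded (`u64::MAX`) ceiling does not overflow.

```rust
use std::io::{self, Read};

/// Count the bytes in `reader`, failing once more than `max_bytes` are seen.
/// `take(max_bytes + 1)` ensures the underlying reader is never asked for
/// more than one byte past the bound, whatever the chunk size.
fn count_bounded<R: Read>(reader: R, max_bytes: u64) -> io::Result<u64> {
    let mut limited = reader.take(max_bytes.saturating_add(1));
    let mut buf = [0u8; 64 * 1024];
    let mut seen = 0u64;
    loop {
        let n = match limited.read(&mut buf) {
            Ok(0) => return Ok(seen), // EOF at or under the bound
            Ok(n) => n as u64,
            Err(e) if e.kind() == io::ErrorKind::Interrupted => continue,
            Err(e) => return Err(e),
        };
        seen += n;
        if seen > max_bytes {
            return Err(io::Error::new(io::ErrorKind::InvalidData, "exceeds max_bytes"));
        }
    }
}
```

One extra byte is enough to distinguish "exactly at the bound" from "over the bound", which is why the clamp is `max_bytes + 1` rather than `max_bytes`.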

Bot-review remediation (Qodo, #282): --max-index-artifact-bytes was wired into ResourceLimits, but the create path bounded the primary hash by the probed size alone, so the opt-in ceiling was ineffective for create, unlike auxiliary artifacts. Create now mirrors verify: the read bound is the smaller of the declared or observed size and the explicit ceiling.

* fix: derive artifact read bounds from declared sizes in ordvec-manifest Verification now bounds every artifact read by its manifest-declared file_size_bytes (manifest hard-capped at 1 MiB; SHA-256 pins content); creation bounds reads by the observed file size. Flat ResourceLimits byte caps become opt-in ceilings (default unbounded). The primary artifact read, previously unbounded, gains a declared-size bound and the artifact_file_too_large reason code. sha256_file_bounded now streams with constant memory instead of materialising files. Fixes the undocumented 64 MiB auxiliary cap that made sign-sidecar bundles impossible to write past 524,288 rows at dim=1024 (measured on a 1,258,135-row corpus). * test: pin sign candidate-generation contract ahead of tiled internals Independent oracle (score_all + full lexicographic sort by hamming asc, doc_id asc) pins top_m_candidates and top_m_candidates_batched_serial_csr exactly: random corpora across block boundaries, massive-tie and duplicate-run corpora exercising boundary tie-breaks, edge geometries (m >= n, single doc, empty batch), and the dim=1024 shape. Must pass bit-identically before and after the tiling swap. * fix: give the primary index artifact an opt-in read ceiling Security-audit remediation (CIPHER-02): the primary artifact read was bounded only by the attacker-declared size with no configurable ceiling, unlike the auxiliary/profile classes — the documented THREAT-QUERY-003 mitigation was silently incomplete for the largest artifact in a bundle. Adds ResourceLimits::max_index_artifact_bytes (default unbounded) applied as a min() with the declared size, and bounds the create-path primary read by its observed size so the code matches the CHANGELOG claim. CLI flag parity follows separately. 
* fix: bound sqlite cache-key hashes by declared sizes Code-review remediation: the sqlite-feature cache-key path duplicated the hashing logic and missed the derived-bound change — the primary artifact hash was fully unbounded, and the calibration/encoder profile hashes were bounded only by the flat caps, which the default flip turned into effectively unbounded reads. All three now use the same declared-size .min(opt-in ceiling) derivation as the verify path; a bound violation is a cache miss. Adds default-limits grown-profile coverage for the calibration and encoder-distortion call sites, closing the per-site test gap that let this slip. * perf: stream the corpus once per call in sign candidate generation top_m_candidates_batched_serial_csr previously looped the single-query path, re-streaming the full sign bitmap per query (documented-naive Track-1). The internals now scan the corpus once per call in L2-sized doc blocks, score every query of the call against each hot block in query tiles via the existing batched kernel, and select per-query top-m with bounded (hamming, doc_id) min-collectors — bit-identical to a full sort by construction, independent of processing order (the key IS the contract's sort key). top_m_candidates routes through the same core, dropping its per-call n-row Hamming materialisation. Per-query corpus traffic drops by the call's query count: at 1.26M rows x 1024 dims, a 2048-query call reads the 161MB sidecar once instead of 2048 times. Serial contract preserved (no rayon); the oracle suite (tests/tiled_candgen.rs) pins bit-identical outputs across random, tie-heavy, duplicate-run, and edge geometries. 
* perf: keep the dense partition path for single-query candidates Audit remediation: routing top_m_candidates through the streamed core measured +50-90% at small/medium n with m in the hundreds (bounded heap O(n log m) vs select_nth_unstable_by O(n)); with one query there is no scan to share, so nq=1 stays on the dense path (parity-or-better at every measured size). Also per audit: the block-boundary oracle test now genuinely spans three blocks (the dim=128 shape fit one block), and adds the dim=768 AVX-512 tail-residue x multi-block case to the permanent suite. * perf: parallel finite validation and scratch-based rank encode assert_all_finite paid a full serial pass per add/search batch — measured ~0.1s per GiB, twice per ingest batch counting the caller layer. Scans of 1M+ floats now split across the rayon pool (4.4x measured). RankQuant::add's per-row closure allocated a fresh ranks Vec per vector inside the parallel loop; for_each_init now reuses a per-worker scratch via rank_transform_into. Measured on the 1.26M x 1024 corpus slice: encode-path attribution 0.097s serial scan -> 0.022s parallel; alloc churn removed from the hot loop. * perf: reduce collector boundary test to a cached worst-bound compare Doc ids visit each per-query heap strictly ascending, so a candidate tying the worst kept hamming always loses the (hamming, doc_id) tie-break — once the collector is full, the accept test is exactly 'hamming < worst kept hamming'. Cache that bound in a register-friendly u32 (u32::MAX while filling) and skip the heap peek + tuple compare on the ~99.8% reject path. Bit-identical by construction; pinned by the tie-heavy and duplicate-run oracle suites. * perf: LUT + parallel constant-composition check on RankQuant load load_rankquant's forged-buffer defense histogrammed every packed code serially — 1.29 billion shift/mask ops at 1.26M x 1024, ~1s of the 1.27s verified open. 
A 4KB per-byte bucket-count LUT replaces the per-code inner loop and rows validate in parallel; find_first keeps the lowest-offending-row error contract, with a scalar recheck producing the identical message. The security property is unchanged: every row still proves uniform composition before the index is usable. * docs: changelog perf entries and 0.6.0 downstream un-patch checklist CHANGELOG Unreleased gains the measured perf work merged to integration/full-stack: tiled streaming sign candidate generation + cached collector worst-bound (bit-identical internals swap; downstream batched search 220 -> 10.2k q/s at 1.26M x 1024), parallel finite validation + scratch rank encode (0.097s -> 0.022s attribution), and the LUT + parallel constant-composition load check (verified open 1.27s -> 0.38s). RELEASING gains a one-time pre-publish item: remove OrdinalDB's [patch.crates-io] block pointing at integration/full-stack when 0.6.0 publishes. * feat: index-ceiling CLI parity and zero-size shape checks (CIPHER-04) Expose --max-index-artifact-bytes on the ordvec-manifest CLI LimitArgs, wiring it to ResourceLimits::max_index_artifact_bytes so the opt-in primary-artifact read ceiling reaches feature parity with the existing --max-auxiliary-artifact-bytes flag. Close the deferred CIPHER-04 reason-code symmetry: validate_manifest_shape now rejects a zero manifest-declared artifact.file_size_bytes (artifact_file_size_zero) and validate_auxiliary_artifact_shape rejects zero-size declarations on required auxiliary artifacts (auxiliary_artifact_file_size_zero), mirroring the calibration and encoder-distortion *_file_size_zero checks. Optional artifacts keep the established zero-size absent-placeholder convention. 
* fix: refuse non-regular artifact files before hashing Security-review remediation (fleet CIPHER-001): with derived read bounds, a FIFO inside the bundle directory would block verification forever — File::open on a reader-less FIFO blocks, and a device node would stream without EOF under a large declared size. Stat the path before opening and refuse anything that is not a regular file, at every hashing call site (create, verify, sqlite cache). Regular files terminate at EOF and remain post-checked against the declaration. Unix regression test uses a real FIFO. * docs: scope the serial CSR contract to scan and selection Security-review note (fleet CIPHER-002): parallel finite validation introduced in the encode train transitively touches the global rayon pool from inside the 'serial' CSR primitive. The serial guarantee is about candidate scan/selection ownership, not input validation; say so explicitly. * fix: retry interrupted reads in bounded streaming hash Bot-review remediation (Qodo, PR #277): the streaming loop surfaced ErrorKind::Interrupted as a hash failure, a reliability regression vs the previous read_to_end which retried EINTR internally. * fix: assert whole-row query buffers in the streamed core Bot-review remediation (Qodo, PR #278): the shared core derived nq by integer division; a ragged buffer from a future caller would silently truncate. All current callers validate upstream — this is the cheap in-core invariant. * perf: transpose-tree horizontal reduction in the batched sign kernel The AVX-512 batched scan paid eight serial _mm512_reduce_add_epi64 expansions per doc-chunk — roughly a third of per-doc cycles at dim=1024 (2 lanes) going to reduction rather than XOR+POPCNT work. An unpack/permute/shuffle tree folds all eight accumulators into one vector of sums (~25 ops replacing ~50), stored via one stack spill. Tail path (batch % 8) keeps the per-accumulator reduce. Bit-identical: pinned by the AVX-512-vs-scalar parity tests and the oracle suites. 
* docs: scope the serial-contract claim in the tiled candgen entry External-audit remediation: the entry claimed 'no rayon' unqualified; finite validation on large buffers may use the global pool (documented on the method), and top_m_candidates_batched is explicitly out of scope of the internals swap. * bench: regenerate committed synthetic results at the 0.6.0 heads two_stage_caller_owned_dim1024: stage-1 candidate generation 159.60 -> 94.60 us/query (1.69x), full two-stage 172.42 -> 103.75 us/query (1.66x) — same command, host, core pinning, and toolchain family; verified code-only by an A/B against main on the same day/machine (main reproduced the old numbers within 3%). rank_modes: single-query latency rows are intentionally unchanged by the batch rework (verified identical-within-noise main vs heads) and carry a refresh note saying so; encode columns reflect the parallel validation + scratch encode work. Quality columns bit-identical throughout. * docs: refresh README benchmarks at the 0.6.0 heads All figures and numbers regenerated by the committed make benchmark-beir pipeline on the same host class (9950X). Quality: nDCG within bootstrap noise of exact on both datasets, sign-rq2 trec-covid 0.7638 unchanged (deterministic selection held bit-identical through the perf train). Single-query hero effectively unchanged (52.4 ms flat vs 0.52 ms sign-rq2, ~101x) — that lane was intentionally untouched. Batched 1-thread view improves to ~10-12x over batched flat (once-per-call corpus streaming); threaded view: HNSW still leads, margin narrowed from ~2.3x to 1.6x over sign-rq2 (1.2x over bitmap-rq2). Build 47.1s vs 0.21s. No larger-corpus claims added. 
* docs: transcribe the refreshed hnsw nDCG in the tradeoff table * fix: land the EINTR retry on the bounded hash and bump to 0.6.0 CI caught both halves: the retry from the Qodo remediation had landed on sha256_file (where the io::Error conversion is a useless-conversion lint under current clippy) instead of sha256_file_bounded, the function actually flagged — the retry now lives on both, with the conversion only where the error type changes. And the release-publish invariant correctly refuses [Unreleased] changelog content at an already-released version: this stack is the 0.6.0 work, so ordvec and ordvec-manifest now say 0.6.0 (minor: limit-semantics change, additive APIs, behavioral perf changes). * chore: track 0.6.0 in the fuzz workspace lock * chore: lockstep all workspace member versions at 0.6.0 The release-publish SBOM invariant requires member package versions in lockstep with the root; ordvec-ffi and both python bindings follow the 0.6.0 bump. * chore: complete the 0.6.0 release shape Closes the remaining release-publish invariant layers, verified by running tests/release_publish_invariants.py locally to a clean exit: pyproject + __init__ versions in lockstep, the changelog cut as a dated 0.6.0 section (invariant convention: the current version always has a dated section; [Unreleased] stays empty), THREAT_MODEL status line at v0.6.0, and the README quickstart installing 0.6. * fix: checked selection-state bounds in the streamed candidate core Bot-review remediation (Qodo, PR #278): nq * m_eff can overflow usize on 32-bit/wasm32 targets, and the CSR wrapper's saturating_mul would attempt a usize::MAX allocation. Both sites now use checked multiplication with a clear tile-the-batch message, matching the crate's checked-allocation discipline. The exact m_eff + 1 heap reservation is kept deliberately: gradual growth double-allocates to the next power of two (~2x peak per query) — the reservation is the memory-optimal choice, now documented. 
* fix: single-snapshot create hashing, strict read bounds, zero-size primary shape check
  Bot-review remediation (Qodo, #283 inline):
  - create_manifest_for_index_with_options observed the index size twice (probe, then a separate stat for the hash bound), so a concurrent writer could produce a manifest whose size and digest describe different bytes. The hash is now bounded by the probe's size, the manifest records the byte count actually hashed, and any disagreement fails loudly.
  - sha256_file_bounded could read (not hash) up to one 64KiB chunk past the bound; reads now clamp to max_bytes + 1, mirroring read_bounded_file's take() pattern.
  - validate_manifest_shape gains artifact_file_size_zero for the primary artifact, matching the profile artifacts' explicit zero rejection instead of surfacing a confusing artifact_file_too_large.
* perf: build query bitmaps in place in the streamed core
  Bot-review remediation (Qodo, #283 inline): build_query_bitmap allocated a fresh Vec and re-validated finiteness per query; the entry points already validate the whole buffer and the destination is preallocated. Oracle suites pin bit-identical output.
* fix: apply the index ceiling on the create path
  Bot-review remediation (Qodo, #282): --max-index-artifact-bytes wired into ResourceLimits but the create path bounded the primary hash by the probed size alone, so the opt-in ceiling was ineffective for create, unlike auxiliary artifacts. Create now mirrors verify: declared/observed size min explicit ceiling.
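
The checked selection-state sizing described in the commit list above (nq * m_eff overflowing usize on 32-bit targets) can be sketched roughly as follows. This is a minimal illustration, not the crate's code: `selection_state_len` and `new_query_heap` are hypothetical names, and the error text is only an assumed shape for the "tile-the-batch" message.

```rust
use std::collections::BinaryHeap;

/// Hypothetical helper: number of selection-state slots needed for `nq`
/// queries that each keep `m_eff` candidates. Uses checked multiplication so
/// an overflow becomes an error instead of a wrapped or saturated length.
pub fn selection_state_len(nq: usize, m_eff: usize) -> Result<usize, String> {
    nq.checked_mul(m_eff).ok_or_else(|| {
        format!(
            "selection state for {nq} queries x {m_eff} candidates overflows usize; \
             tile the batch into smaller calls"
        )
    })
}

/// Hypothetical helper: a per-query heap reserved for exactly `m_eff + 1`
/// entries, so pushing one candidate beyond `m_eff` before popping does not
/// trigger a growth reallocation. Returns None if `m_eff + 1` overflows.
pub fn new_query_heap(m_eff: usize) -> Option<BinaryHeap<u32>> {
    m_eff.checked_add(1).map(BinaryHeap::with_capacity)
}
```

By contrast, `saturating_mul` would clamp an overflowing product to `usize::MAX` and hand that value to the allocator, which is the failure mode the commit message describes.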

…#282) Expose --max-index-artifact-bytes on the ordvec-manifest CLI LimitArgs, wiring it to ResourceLimits::max_index_artifact_bytes so the opt-in primary-artifact read ceiling reaches feature parity with the existing --max-auxiliary-artifact-bytes flag. Close the deferred CIPHER-04 reason-code symmetry: validate_manifest_shape now rejects a zero manifest-declared artifact.file_size_bytes (artifact_file_size_zero) and validate_auxiliary_artifact_shape rejects zero-size declarations on required auxiliary artifacts (auxiliary_artifact_file_size_zero), mirroring the calibration and encoder-distortion *_file_size_zero checks. Optional artifacts keep the established zero-size absent-placeholder convention.
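
As a rough sketch of how a declared size, an opt-in ceiling, and the zero-size rejection could combine into one read bound: the reason-code strings below are taken from the description above, but `primary_read_bound` and `check_observed_len` are hypothetical names, and whether the real crate rejects an oversized declaration at shape time or only at read time is not stated here, so the example assumes the latter.

```rust
/// Hypothetical helper: derive the read bound for the primary artifact.
/// `declared_size` stands for the manifest's file_size_bytes (or the observed
/// size on the create path); `ceiling` stands for an opt-in limit such as
/// max_index_artifact_bytes, where None means unbounded.
pub fn primary_read_bound(
    declared_size: u64,
    ceiling: Option<u64>,
) -> Result<u64, &'static str> {
    if declared_size == 0 {
        // A zero-size declaration gets its own reason code instead of
        // surfacing later as a confusing "too large" failure.
        return Err("artifact_file_size_zero");
    }
    // "declared/observed size min explicit ceiling"
    Ok(declared_size.min(ceiling.unwrap_or(u64::MAX)))
}

/// Hypothetical helper: fail fast when the bytes actually present exceed the
/// derived bound (assumed behavior for a grown or over-ceiling artifact).
pub fn check_observed_len(observed: u64, bound: u64) -> Result<(), &'static str> {
    if observed > bound {
        Err("artifact_file_too_large")
    } else {
        Ok(())
    }
}
```

With no ceiling configured the bound is simply the declared size, which matches the "default unbounded" behavior of the opt-in limits.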

Fieldnote-Echo · 2026-07-04T16:15:48Z

Merged to main via #283's squash (bee2fad) — main's tree is byte-identical to the final collapse tip (verified: git rev-parse origin/main^{tree} == docs/release-hygiene^{tree}). Squash-merging broke the head-reachability that would have auto-closed this PR; closing manually. All review findings on this PR were remediated in-branch before the collapse; the commits and review trail remain linked here.

Nelson Spence (Fieldnote-Echo) requested review from Navi Bot (project-navi-bot) and Todd Baur (toadkicker) as code owners July 3, 2026 17:52

qodo-code-review Bot reviewed Jul 3, 2026

Comment thread ordvec-manifest/src/lib.rs

Comment thread ordvec-manifest/src/lib.rs

Nelson Spence (Fieldnote-Echo) added 2 commits July 3, 2026 13:04

This was referenced Jul 3, 2026

perf: stream the corpus once per call in sign candidate generation #278

Closed

perf: parallel finite validation and scratch-based rank encode #279

Closed

Nelson Spence (Fieldnote-Echo) added 2 commits July 3, 2026 18:04

fix: retry interrupted reads in bounded streaming hash

b18ebaf

Bot-review remediation (Qodo, PR #277): the streaming loop surfaced ErrorKind::Interrupted as a hash failure, a reliability regression vs the previous read_to_end which retried EINTR internally.
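
A minimal sketch of a bounded streaming loop that retries interrupted reads, under stated assumptions: `stream_bounded` is a hypothetical stand-in for the bounded hash (the `sink` closure takes the place of the SHA-256 update so the example needs only the standard library), and `InterruptedOnce` is a test double that simulates one EINTR.

```rust
use std::io::{self, ErrorKind, Read};

/// Chunk size for the streaming buffer (64 KiB, as in the PR summary).
const CHUNK: usize = 64 * 1024;

/// Hypothetical helper: stream at most `max_bytes` from `reader` into `sink`
/// in constant memory. Returns the byte count consumed, or an InvalidData
/// error if the source holds more than `max_bytes`.
pub fn stream_bounded<R: Read>(
    reader: R,
    max_bytes: u64,
    mut sink: impl FnMut(&[u8]),
) -> io::Result<u64> {
    // Clamp reads to max_bytes + 1: one extra byte is enough to detect an
    // oversized source without reading a whole chunk past the bound.
    let mut limited = reader.take(max_bytes.saturating_add(1));
    let mut buf = vec![0u8; CHUNK];
    let mut total: u64 = 0;
    loop {
        let n = match limited.read(&mut buf) {
            Ok(0) => break, // end of input (or clamp reached)
            Ok(n) => n,
            // EINTR is transient: retry instead of reporting a failure.
            Err(e) if e.kind() == ErrorKind::Interrupted => continue,
            Err(e) => return Err(e),
        };
        total += n as u64;
        if total > max_bytes {
            return Err(io::Error::new(
                ErrorKind::InvalidData,
                "artifact_file_too_large",
            ));
        }
        sink(&buf[..n]);
    }
    Ok(total)
}

/// Test double: fails the first read with Interrupted, then delegates.
pub struct InterruptedOnce<R> {
    pub inner: R,
    pub fired: bool,
}

impl<R: Read> Read for InterruptedOnce<R> {
    fn read(&mut self, buf: &mut [u8]) -> io::Result<usize> {
        if !self.fired {
            self.fired = true;
            return Err(io::Error::new(ErrorKind::Interrupted, "simulated EINTR"));
        }
        self.inner.read(buf)
    }
}
```

`Read::read_to_end` and `Read::read_exact` in the standard library retry `ErrorKind::Interrupted` internally, which is why a hand-written `read` loop has to handle it explicitly to keep the same reliability.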

This was referenced Jul 4, 2026

feat: index-ceiling CLI parity + zero-size reason codes #282

Merged

docs: changelog perf entries + 0.6.0 un-patch checklist #283

Merged

Nelson Spence (Fieldnote-Echo) added 4 commits July 3, 2026 21:58

chore: track 0.6.0 in the fuzz workspace lock

0bcfb93

chore: lockstep all workspace member versions at 0.6.0

661529a

The release-publish SBOM invariant requires member package versions in lockstep with the root; ordvec-ffi and both python bindings follow the 0.6.0 bump.

project-navi-bot previously approved these changes Jul 4, 2026

Nelson Spence (Fieldnote-Echo) dismissed Navi Bot (project-navi-bot)’s stale review via 4b35a34 July 4, 2026 04:12

Navi Bot (project-navi-bot) self-requested a review July 4, 2026 16:10

project-navi-bot previously approved these changes Jul 4, 2026

Navi Bot (project-navi-bot) dismissed their stale review via 87fc55b July 4, 2026 16:11

Nelson Spence (Fieldnote-Echo) closed this Jul 4, 2026

Nelson Spence (Fieldnote-Echo) deleted the fix/manifest-derived-limits branch July 4, 2026 16:15

fix: derive artifact read bounds from declared sizes in ordvec-manifest #277
Nelson Spence (Fieldnote-Echo) wants to merge 12 commits into main from fix/manifest-derived-limits

Conversation

Fieldnote-Echo commented Jul 3, 2026 (edited)

Summary

Why

Hostile-input posture (what precisely changed)

Tests

chatgpt-codex-connector Bot commented Jul 3, 2026

Fieldnote-Echo commented Jul 3, 2026

chatgpt-codex-connector Bot commented Jul 3, 2026

qodo-code-review Bot commented Jul 3, 2026

PR Summary by Qodo

codecov Bot commented Jul 3, 2026

Codecov Report

qodo-code-review Bot commented Jul 3, 2026 (edited)

Code Review by Qodo

Fieldnote-Echo commented Jul 3, 2026

Adversarial audit triage (cipher security agent + code-reviewer agent)

Fieldnote-Echo commented Jul 4, 2026

qodo-code-review Bot commented Jul 4, 2026

Fieldnote-Echo commented Jul 4, 2026

2 participants