Doc-major kernel layout for batched sign scan (2-3x)

Post-0.6.0 perf track. The batched AVX-512 sign kernel measures ~1.0 ns/(query·doc) against a theoretical ALU floor of ~0.3-0.35 ns/(query·doc), i.e. roughly 3x above the floor. The transpose-tree reduction experiment measured neutral (+3%, within noise; branch perf/kernel-transpose-reduce, parked), so the remaining win requires two things:

- a doc-major score layout (contiguous stores per doc-chunk), paired with either a tile transpose or strided collector reads;
- caller-owned parallel single-query execution via top_m_candidates_range (greenlit design: boring exact partitioning, deterministic merge).

Evidence and analysis are in ordinaldb's perf-train PR trail.
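To make the layout change concrete, here is a minimal Python sketch of the two score layouts and the strided collector read. It is an illustration under stated assumptions, not the actual kernel: it assumes the sign kernel scores a (query, doc) pair by counting matching sign bits (XOR plus popcount), which this issue does not spell out, and every function name below is hypothetical. In particular, nothing here reflects the real signature of top_m_candidates_range.

```python
from typing import List


def agreement(query_bits: int, doc_bits: int, dim: int) -> int:
    """Count dimensions whose sign bits match (assumed scoring rule)."""
    mask = (1 << dim) - 1
    return dim - bin((query_bits ^ doc_bits) & mask).count("1")


def score_query_major(queries: List[int], docs: List[int], dim: int) -> List[int]:
    """Layout scores[q * n_docs + d]: one doc's scores land n_docs apart."""
    n_docs = len(docs)
    scores = [0] * (len(queries) * n_docs)
    for d, doc in enumerate(docs):
        for q, query in enumerate(queries):
            scores[q * n_docs + d] = agreement(query, doc, dim)  # strided store
    return scores


def score_doc_major(queries: List[int], docs: List[int], dim: int) -> List[int]:
    """Layout scores[d * n_queries + q]: one doc's scores are adjacent."""
    n_queries = len(queries)
    scores = [0] * (len(docs) * n_queries)
    for d, doc in enumerate(docs):
        base = d * n_queries
        for q, query in enumerate(queries):
            scores[base + q] = agreement(query, doc, dim)  # contiguous store
    return scores


def top_m_query_major(scores: List[int], n_docs: int, q: int, m: int) -> List[int]:
    """Collector over the query-major layout: a contiguous row read."""
    row = scores[q * n_docs:(q + 1) * n_docs]
    # Score descending, then doc id ascending, so ties resolve deterministically.
    return sorted(range(n_docs), key=lambda d: (-row[d], d))[:m]


def top_m_doc_major(scores: List[int], n_queries: int, q: int, m: int) -> List[int]:
    """Collector over the doc-major layout: a read with stride n_queries."""
    column = scores[q::n_queries]
    return sorted(range(len(column)), key=lambda d: (-column[d], d))[:m]
```

Both collectors return the same doc ids for a given query; only the memory access pattern moves. In the doc-major version the stride shifts from the kernel's stores to the collector's reads, which is the trade this issue proposes. A tile transpose would be the alternative to the strided read and is not shown here.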

Doc-major kernel layout for batched sign scan (2-3x) #284
