
Doc-major kernel layout for batched sign scan (2-3x) #284

Description

Post-0.6.0 perf track. The batched AVX-512 sign kernel measures ~1.0 ns/(query·doc), against a theoretical ALU floor of ~0.3-0.35 ns/(query·doc). A transpose-tree reduction experiment measured neutral (+3%, within noise) and is parked on branch perf/kernel-transpose-reduce.

The remaining win requires two changes:

- Doc-major score layout (contiguous stores per doc-chunk), paired with either a tile transpose or strided reads in the collector.
- Caller-owned parallel single-query via top_m_candidates_range. The design is greenlit: boring exact partitioning and a deterministic merge.

Evidence and analysis are in ordinaldb's perf-train PR trail.
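To make the layout change concrete, here is a small sketch of the index math only. It is not the AVX-512 kernel. It assumes sign vectors packed one bit per dimension into an int and a score equal to the number of agreeing sign bits; both are illustrative assumptions, as are all function names below. `scan_doc_major` writes each doc-chunk into one contiguous block of `chunk * n_queries` slots. The two collector options from the issue appear as `strided_scores` (strided reads) and `transpose_tiles` (tile transpose back to query-major).

```python
# Hypothetical illustration: the issue does not specify the score function or
# buffer shapes. Sign vectors are packed into Python ints, one bit per dimension.

def sign_score(q: int, d: int, dims: int) -> int:
    """Assumed score: count of dimensions whose sign bits agree."""
    mask = (1 << dims) - 1
    return dims - bin((q ^ d) & mask).count("1")


def scan_query_major(queries, docs, dims):
    """Reference layout: scores[qi * n_docs + d]; one query's row is contiguous."""
    nq, nd = len(queries), len(docs)
    out = [0] * (nq * nd)
    for qi, q in enumerate(queries):
        for d, doc in enumerate(docs):
            out[qi * nd + d] = sign_score(q, doc, dims)
    return out


def scan_doc_major(queries, docs, dims, chunk):
    """Doc-major layout: scores[d * n_queries + qi].

    Each chunk of `chunk` docs fills one contiguous block of chunk * n_queries
    slots, so stores for a doc-chunk never jump between query rows.
    """
    nq, nd = len(queries), len(docs)
    out = [0] * (nq * nd)
    for start in range(0, nd, chunk):
        for d in range(start, min(start + chunk, nd)):
            doc = docs[d]
            for qi, q in enumerate(queries):
                out[d * nq + qi] = sign_score(q, doc, dims)
    return out


def strided_scores(doc_major, nq, qi):
    """Collector option A: read query qi's scores with stride nq."""
    return doc_major[qi::nq]


def transpose_tiles(doc_major, nq, nd, tile):
    """Collector option B: transpose tile-by-tile back to query-major."""
    query_major = [0] * (nq * nd)
    for d0 in range(0, nd, tile):
        for d in range(d0, min(d0 + tile, nd)):
            row = doc_major[d * nq:(d + 1) * nq]  # all queries' scores for doc d
            for qi, s in enumerate(row):
                query_major[qi * nd + d] = s
    return query_major
```

For example, with `queries = [0b1010, 0b1111]`, `docs = [0b1010, 0b0000, 0b1011]` and `dims = 4`, the doc-major buffer is `[4, 2, 2, 0, 3, 3]` (pairs per doc), while the query-major buffer is `[4, 2, 3, 2, 0, 3]` (rows per query). In a SIMD kernel the point of the first layout is that each doc-chunk's results land in one contiguous store region; the cost moves to the collector, which either strides or pays for a transpose.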

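The "boring exact partitioning, deterministic merge" design for caller-owned parallel single-query can be sketched as below. `top_m_candidates_range` is named in the issue, but the signature and body here are hypothetical stand-ins, as are `partition`, `rank_key` and `parallel_top_m`. The idea: split `[0, n)` into fixed contiguous ranges, run an exact top-m per range, then merge under the same total order (score descending, doc id ascending). Because every range uses that order, the global top-m is a subset of the union of per-range results, so the merge matches a single full scan regardless of thread scheduling. Python threads do not speed up this pure-Python scoring; the sketch only demonstrates the partition and merge contract.

```python
import heapq
from concurrent.futures import ThreadPoolExecutor

# Hypothetical illustration: top_m_candidates_range is named in the issue, but
# this signature and the scoring are assumptions, not ordinaldb's API.

def sign_score(q: int, d: int, dims: int) -> int:
    """Assumed score: count of dimensions whose sign bits agree."""
    mask = (1 << dims) - 1
    return dims - bin((q ^ d) & mask).count("1")


def rank_key(candidate):
    """Total order: higher score first, then lower doc id breaks ties."""
    score, doc_id = candidate
    return (-score, doc_id)


def top_m_candidates_range(query, docs, dims, start, end, m):
    """Stand-in: exact top-m (score, doc_id) pairs over docs[start:end]."""
    scored = ((sign_score(query, docs[i], dims), i) for i in range(start, end))
    return heapq.nsmallest(m, scored, key=rank_key)


def partition(n, parts):
    """Split [0, n) into `parts` contiguous, disjoint ranges covering it exactly."""
    base, extra = divmod(n, parts)
    ranges, start = [], 0
    for p in range(parts):
        end = start + base + (1 if p < extra else 0)
        ranges.append((start, end))
        start = end
    return ranges


def parallel_top_m(query, docs, dims, m, workers):
    """Caller-owned parallelism: one range per worker, then a deterministic merge."""
    ranges = partition(len(docs), workers)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [
            pool.submit(top_m_candidates_range, query, docs, dims, s, e, m)
            for s, e in ranges
        ]
        # Collect in range order, not completion order.
        partials = [f.result() for f in futures]
    merged = [c for part in partials for c in part]
    # Same total order as each range, so the result equals a single full scan.
    return sorted(merged, key=rank_key)[:m]
```

The explicit doc-id tie-break is what makes the merge deterministic: without it, equal scores could come back in a different order depending on which range finished first.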
Labels: enhancement (New feature or request)

    Type

    No type

    Fields

    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions