
perf: single-pass drift metric computation (#447) #464

Merged
tcconnally merged 1 commit into main from perf/447-drift-single-pass on Jun 26, 2026

@tcconnally
Collaborator

Addresses #447 item 2b (drift). _compute_drift built two lists (recent/baseline), then iterated each one three times (rate(), tokens(), avg_len()), and the tokens pass re-ran the _extract_recommendation_tokens regex on every entry.
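For contrast, a rough sketch of that pre-change shape. Only the structure (two window lists, three passes per list) comes from the description; the entry fields (ts, accepted, response), the 7-day recent window, the backtick-token regex standing in for _extract_recommendation_tokens, and the empty-set Jaccard convention are hypothetical.

```python
import re
from datetime import datetime, timedelta

# Hypothetical stand-in for _extract_recommendation_tokens: backtick-quoted identifiers.
_TOKEN_RE = re.compile(r"`([A-Za-z_][\w.]*)`")


def extract_tokens(text: str) -> set[str]:
    return set(_TOKEN_RE.findall(text))


def compute_drift_multipass(entries, now: datetime, recent_days: int = 7) -> dict:
    """Pre-change shape (sketch): split into two lists, then walk each list three times."""
    cutoff = now - timedelta(days=recent_days)
    recent, baseline = [], []
    for e in entries:
        (recent if e["ts"] >= cutoff else baseline).append(e)

    def rate(xs):  # pass 1 over xs: share of accepted entries
        return sum(1 for e in xs if e["accepted"]) / len(xs) if xs else 0.0

    def tokens(xs):  # pass 2 over xs: runs the token regex on every entry
        out: set[str] = set()
        for e in xs:
            out |= extract_tokens(e["response"])
        return out

    def avg_len(xs):  # pass 3 over xs: mean response length
        return sum(len(e["response"]) for e in xs) / len(xs) if xs else 0.0

    r_tok, b_tok = tokens(recent), tokens(baseline)
    union = r_tok | b_tok
    return {
        "recent_count": len(recent),
        "baseline_count": len(baseline),
        "acceptance_rate": {"recent": rate(recent), "baseline": rate(baseline)},
        # Assumption: two empty token sets count as identical (Jaccard 1.0).
        "jaccard": len(r_tok & b_tok) / len(union) if union else 1.0,
        "avg_len": {"recent": avg_len(recent), "baseline": avg_len(baseline)},
    }
```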

Collapse this to a single pass: classify each entry into its window bucket and accumulate the count, positive count, response-length sum, and recommendation-token set inline, so each entry is visited once and its tokens are extracted once. The metrics and the result dict are byte-identical (the existing drift tests pin them).

Deferred (same item): the time-windowed tail-read, i.e. reading only the last N days instead of the whole capped log. It would require assuming the JSONL log is in chronological order, all for a marginal CLI-only gain, so it is left out; the full read is already bounded by pythia.max_entries.

Tests: full oracle suite green (56), including the 8 drift tests that pin the acceptance-rate / jaccard / avg-length / count outputs.

🤖 Generated with Claude Code

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
tcconnally merged commit ef3e502 into main on Jun 26, 2026
5 checks passed