
fix(core): derive score from assertions when score absent in code-grader #1212

Merged
christso merged 3 commits into main from fix/1211-derive-score-from-assertions
May 4, 2026

christso commented May 4, 2026

Summary

  • When a code-grader script returns { assertions } without an explicit score, the harness now derives score as passing / total instead of defaulting to 0
  • Removed the redundant manual score computation from 6 example scripts that computed passing / total themselves and returned { score, assertions }; they now return { assertions } only

What changed

packages/core/src/evaluation/graders/code-grader.ts — Reordered so assertions is built first, then score is derived from them when parsed.score is absent.
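
A minimal sketch of the derivation described above, not the actual code in code-grader.ts. The `Assertion` and `ParsedGraderOutput` shapes, the `passed` field name, and the `resolveScore` helper are assumptions made for illustration; only the behavior (an explicit score wins, otherwise passing / total, with empty assertions yielding 0) comes from this PR.

```typescript
// Hypothetical shapes; the real types in code-grader.ts may differ.
interface Assertion {
  text: string;
  passed: boolean;
}

interface ParsedGraderOutput {
  score?: number;
  assertions?: Assertion[];
}

// Build the assertion list first, then derive the score only when none was supplied.
function resolveScore(parsed: ParsedGraderOutput): number {
  const assertions = parsed.assertions ?? [];
  if (typeof parsed.score === "number") {
    return parsed.score; // an explicit score is used as-is
  }
  if (assertions.length === 0) {
    return 0; // empty guard: avoids computing 0 / 0
  }
  const passing = assertions.filter((a) => a.passed).length;
  return passing / assertions.length;
}
```

Under these assumptions, one passing and one failing assertion with no `score` field would resolve to 0.5 rather than the previous default of 0.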

6 example scripts simplified (drop redundant score field):

  • copilot-log-eval/graders/transcript-quality.ts
  • import-claude/graders/transcript-quality.ts
  • code-grader-sdk/scripts/verify-attachments.ts
  • execution-metrics/scripts/check-metrics-present.ts
  • workspace-artifact/scripts/check-csv-artifact.ts
  • file-changes-with-repos/scripts/check-file-changes.ts
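
The simplification in these scripts has roughly the following shape. This is a hypothetical grader, not one of the six files: the function name, the two checks, and the `Assertion` field names are invented for illustration.

```typescript
// Hypothetical assertion shape; the real scripts may use different field names.
interface Assertion {
  text: string;
  passed: boolean;
}

// Hypothetical grader body: runs a couple of checks against a transcript.
function gradeTranscript(transcript: string): { assertions: Assertion[] } {
  const assertions: Assertion[] = [
    { text: "transcript is non-empty", passed: transcript.trim().length > 0 },
    { text: "transcript mentions a tool call", passed: transcript.includes("tool_call") },
  ];

  // Before this PR, a script like this would also compute and return the ratio:
  //   const passing = assertions.filter((a) => a.passed).length;
  //   return { score: passing / assertions.length, assertions };
  // Now the harness derives that ratio, so the script returns assertions only.
  return { assertions };
}
```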

Not simplified (intentional custom scores):

  • execution-metrics/check-efficiency.ts: rounds the score and slices the assertion list, so a score derived from the slice would differ from the one the script reports
  • trial-output-consistency: uses a custom floating-point similarity score
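
To see why a sliced assertion list cannot rely on the derived score, consider this sketch. It is not the code in check-efficiency.ts: the helper names, the two-decimal rounding, and the slice length are assumptions chosen only to show the mismatch.

```typescript
// Hypothetical assertion shape, for illustration only.
interface Assertion {
  text: string;
  passed: boolean;
}

// Ratio of passing assertions, with 0 for an empty list.
const passRatio = (items: Assertion[]): number =>
  items.length === 0 ? 0 : items.filter((a) => a.passed).length / items.length;

// Hypothetical grader that scores over all checks but reports only the first few.
function gradeWithCustomScore(all: Assertion[], keep: number) {
  const score = Math.round(passRatio(all) * 100) / 100; // rounded to 2 decimals
  return { score, assertions: all.slice(0, keep) };
}

const checks: Assertion[] = [true, true, false, true, true].map((passed, i) => ({
  text: `check ${i + 1}`,
  passed,
}));

const result = gradeWithCustomScore(checks, 3);
// What the harness would compute if the explicit score were dropped.
const derivedFromSlice = passRatio(result.assertions);
console.log(result.score, derivedFromSlice);
```

Here the explicit score reflects 4 of 5 checks passing, while the slice contains only 2 passing checks out of 3, so dropping the `score` field would change the reported result.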

Test plan

  • 3 new unit tests: assertions without score → derived as passing/total, all passing → score 1, all failing → score 0
  • All existing tests pass (2318 total)
  • Manual e2e: see results below

Closes #1211

christso and others added 2 commits May 4, 2026 05:30
…grader

When a code-grader script returns `{ assertions }` without an explicit
`score`, the harness now computes score as passing/total instead of
defaulting to 0. Also removes redundant manual score computations from
six example scripts that already had assertions covering the same logic.

Closes #1211

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

cloudflare-workers-and-pages Bot commented May 4, 2026

Deploying agentv with Cloudflare Pages

Latest commit: 87c61fa
Status: ✅  Deploy successful!
Preview URL: https://66859700.agentv.pages.dev
Branch Preview URL: https://fix-1211-derive-score-from-a.agentv.pages.dev

… test

Addresses code review feedback:
- Drop redundant passing/total score computation from functional-check.ts,
  validate-sync.ts, keyword-check.ts, and length-check.ts — same pattern
  as the 6 scripts updated in the previous commit
- Add test for `{"assertions":[]}` without score → score 0 (empty guard)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
christso merged commit d33285c into main May 4, 2026
4 checks passed
christso deleted the fix/1211-derive-score-from-assertions branch May 4, 2026 04:09

Development

Successfully merging this pull request may close these issues.

fix: derive code-grader score from assertions when score field is absent
