feat(eval): validate trace (behavioral) expectations (expect.trace) by kunalkushwaha · Pull Request #16 · AgenticGoKit/agk

kunalkushwaha · 2026-06-21T15:14:02Z

What

Implements the previously stubbed expect.trace assertions in the eval runner. Today the runner contains only a literal // TODO: Validate trace expectations comment, and the fully typed TraceExpectation struct goes unused. This PR wires it up so eval tests can check how an agent reached its answer, not just the output text.

This is feature B2 (behavioral assertions) from the FEATURES.md roadmap.

How it works

After the content match passes, the runner fetches the run's trace from the EvalServer (GET /traces/{id}) and evaluates the assertions. The tool-call check also reads the tools_called field from the /invoke response, so tool_calls is still checked even if the trace fetch fails.

tests:
  - name: "Answers about Paris using search, efficiently"
    input: "What's the weather in Paris?"
    expect:
      type: contains
      values: ["Paris"]
      trace:
        tool_calls: ["search"]                  # each listed tool must have been called
        llm_calls: 2                             # exact LLM-call count
        execution_path: ["research", "format"]   # ordered subsequence of span names
        min_steps: 2
        max_steps: 8
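
The fallback described above can be sketched roughly as follows. This is an illustration only, not the PR's buildObservedTrace: the Span type, its Kind values ("llm", "tool"), and the choice to count one step per span are assumptions made for the example.

```go
package main

import (
	"fmt"
	"sort"
)

// Span is a hypothetical trace span. The Kind values "llm" and "tool"
// are assumptions; the real EvalServer payload is not shown in this PR.
type Span struct {
	Name string
	Kind string
	Tool string
}

// ObservedTrace is a normalized view of one run (field names assumed).
type ObservedTrace struct {
	Tools    []string // distinct tool names, sorted
	LLMCalls int
	Path     []string // span names in order
	Steps    int      // assumed here: one step per span
}

// buildObserved merges tools_called from the invoke response with the
// spans of a fetched trace. spans may be nil when the fetch failed, in
// which case only the tool list is populated.
func buildObserved(toolsCalled []string, spans []Span) ObservedTrace {
	seen := map[string]bool{}
	for _, t := range toolsCalled {
		seen[t] = true
	}
	var obs ObservedTrace
	for _, s := range spans {
		obs.Path = append(obs.Path, s.Name)
		obs.Steps++
		switch s.Kind {
		case "llm":
			obs.LLMCalls++
		case "tool":
			if s.Tool != "" {
				seen[s.Tool] = true
			}
		}
	}
	for t := range seen {
		obs.Tools = append(obs.Tools, t)
	}
	sort.Strings(obs.Tools)
	return obs
}

func main() {
	// Trace fetch failed: only the invoke response is available.
	fmt.Printf("%+v\n", buildObserved([]string{"search"}, nil))
}
```

With a nil span list, the tool set is still filled from the invoke response, which is what keeps tool_calls checkable when the trace is unavailable.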

| Field | Check |
| --- | --- |
| `tool_calls` | every listed tool must appear (subset) |
| `llm_calls` | exact match when > 0 |
| `execution_path` | listed names appear in order (gaps allowed) |
| `min_steps` / `max_steps` | observed step count within bounds |
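
A minimal sketch of the four checks in the table, for reviewers who want the semantics in code. The struct and function names here are stand-ins (the canonical schema is in internal/eval/types.go), and treating a zero min_steps or max_steps as "unset" is an assumption of this sketch.

```go
package main

import "fmt"

// TraceExpectation mirrors the YAML fields above. Field names are
// assumptions; the canonical struct lives in internal/eval/types.go.
type TraceExpectation struct {
	ToolCalls     []string
	LLMCalls      int
	ExecutionPath []string
	MinSteps      int
	MaxSteps      int
}

// ObservedTrace is a hypothetical normalized view of one run.
type ObservedTrace struct {
	Tools    []string
	LLMCalls int
	Path     []string
	Steps    int
}

// isOrderedSubsequence reports whether want appears in got in order,
// with gaps allowed between matches.
func isOrderedSubsequence(want, got []string) bool {
	i := 0
	for _, name := range got {
		if i < len(want) && name == want[i] {
			i++
		}
	}
	return i == len(want)
}

// validateTrace returns one message per failed check; an empty result
// means every assertion passed. It has no side effects.
func validateTrace(exp TraceExpectation, obs ObservedTrace) []string {
	var failures []string
	called := make(map[string]bool, len(obs.Tools))
	for _, t := range obs.Tools {
		called[t] = true
	}
	for _, t := range exp.ToolCalls { // subset check
		if !called[t] {
			failures = append(failures, fmt.Sprintf("tool %q was not called", t))
		}
	}
	if exp.LLMCalls > 0 && obs.LLMCalls != exp.LLMCalls { // exact when > 0
		failures = append(failures, fmt.Sprintf("llm_calls: want %d, got %d", exp.LLMCalls, obs.LLMCalls))
	}
	if !isOrderedSubsequence(exp.ExecutionPath, obs.Path) {
		failures = append(failures, fmt.Sprintf("execution_path %v not found in order in %v", exp.ExecutionPath, obs.Path))
	}
	// Assumption: zero means the bound is unset.
	if exp.MinSteps > 0 && obs.Steps < exp.MinSteps {
		failures = append(failures, fmt.Sprintf("steps %d below min_steps %d", obs.Steps, exp.MinSteps))
	}
	if exp.MaxSteps > 0 && obs.Steps > exp.MaxSteps {
		failures = append(failures, fmt.Sprintf("steps %d above max_steps %d", obs.Steps, exp.MaxSteps))
	}
	return failures
}

func main() {
	exp := TraceExpectation{ToolCalls: []string{"search"}, LLMCalls: 2,
		ExecutionPath: []string{"research", "format"}, MinSteps: 2, MaxSteps: 8}
	obs := ObservedTrace{Tools: []string{"search"}, LLMCalls: 2,
		Path: []string{"research", "plan", "format"}, Steps: 3}
	fmt.Println(len(validateTrace(exp, obs)), "failures")
}
```

Returning a list of failures instead of stopping at the first one lets a single test run report every violated assertion.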

Changes

internal/eval/trace_validator.go — ValidateTrace (pure) + buildObservedTrace normalizer + minimal decode types.
internal/eval/http_target.go — FetchTrace(traceID) against GET /traces/{id}.
internal/eval/runner.go — wires validation in after the content match (replaces the TODO).
docs/EVAL.md — new "Trace (Behavioral) Assertions" section documenting the real expect.trace schema.

Testing

go build, go vet, go test ./..., gofmt all green.
First unit tests for the eval package: ValidateTrace (11 cases), buildObservedTrace (with/without a trace), isOrderedSubsequence, and an httptest-backed FetchTrace (success + error paths) — so the behavior is verified without a live LLM/EvalServer.
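
For context, an httptest-backed check of a trace fetch might look roughly like the sketch below. It is not the code in this PR: fetchTrace, tracePayload, the JSON field names, and the stub server are all hypothetical, and only GET /traces/{id} comes from the description above.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
	"net/http/httptest"
	"net/url"
	"strings"
)

// tracePayload is a hypothetical decode type; the real response shape
// of GET /traces/{id} is not shown in this PR.
type tracePayload struct {
	ID    string   `json:"id"`
	Spans []string `json:"spans"`
}

// fetchTrace issues GET {baseURL}/traces/{id} and decodes the body.
// Any non-200 status is reported as an error.
func fetchTrace(client *http.Client, baseURL, traceID string) (*tracePayload, error) {
	resp, err := client.Get(baseURL + "/traces/" + url.PathEscape(traceID))
	if err != nil {
		return nil, fmt.Errorf("fetch trace %s: %w", traceID, err)
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return nil, fmt.Errorf("fetch trace %s: unexpected status %d", traceID, resp.StatusCode)
	}
	var tr tracePayload
	if err := json.NewDecoder(resp.Body).Decode(&tr); err != nil {
		return nil, fmt.Errorf("decode trace %s: %w", traceID, err)
	}
	return &tr, nil
}

// newStubServer stands in for the EvalServer: it serves a single known
// trace ("run-1") and answers 404 for every other ID.
func newStubServer() *httptest.Server {
	mux := http.NewServeMux()
	mux.HandleFunc("/traces/", func(w http.ResponseWriter, r *http.Request) {
		if strings.TrimPrefix(r.URL.Path, "/traces/") != "run-1" {
			http.NotFound(w, r)
			return
		}
		w.Header().Set("Content-Type", "application/json")
		json.NewEncoder(w).Encode(tracePayload{ID: "run-1", Spans: []string{"research", "format"}})
	})
	return httptest.NewServer(mux)
}

func main() {
	srv := newStubServer()
	defer srv.Close()

	// Success path: the stub knows this trace ID.
	tr, err := fetchTrace(srv.Client(), srv.URL, "run-1")
	if err != nil {
		fmt.Println("unexpected error:", err)
		return
	}
	fmt.Println("spans:", tr.Spans)

	// Error path: an unknown ID yields a 404, surfaced as an error.
	_, err = fetchTrace(srv.Client(), srv.URL, "missing")
	fmt.Println("error for unknown trace:", err != nil)
}
```

Because the stub server runs in-process, both the success and error paths can be exercised without a live LLM or EvalServer.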

Independent branch off main (alongside #13/#14/#15). The canonical expectation schema lives in internal/eval/types.go.

🤖 Generated with Claude Code

Implements the previously stubbed `expect.trace` assertions in the eval runner, so tests can check *how* an agent produced an answer, not just the content.

- ValidateTrace: pure validator for tool_calls (subset), llm_calls (exact), execution_path (ordered subsequence), and min/max_steps.
- buildObservedTrace: normalizes the EvalServer trace + invoke `tools_called` into an ObservedTrace (distinct tools, LLM-call count, path, step count).
- HTTPTarget.FetchTrace: fetches a run's trace from GET /traces/{id}.
- Runner: after the content match, fetches the trace and validates it; tool calls fall back to the invoke response when the trace can't be fetched.
- Docs: new "Trace (Behavioral) Assertions" section in docs/EVAL.md.

Adds the eval package's first unit tests (validator, normalizer, and an httptest-backed FetchTrace), so this is verified without a live LLM.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

kunalkushwaha mentioned this pull request Jun 21, 2026

feat(trace): add agk trace diff to compare two runs #17

Open

kunalkushwaha wants to merge 1 commit into main from feat/eval-trace-assertions
