fix(prompts): auto-append JUDGE_CONFIDENCE directive in buildJudgeSys… by marceloceccon · Pull Request #4 · entropyvortex/ai-consensus-core

marceloceccon · 2026-05-26T00:03:57Z

`extractJudgeConfidence` requires a trailing `JUDGE_CONFIDENCE: N` line and silently defaults to 50 when absent. `buildJudgeSystemPrompt` was passing custom judge prompts through untouched, unlike `buildParticipantSystemPrompt`, which auto-appends the matching `CONFIDENCE: N` directive. Any caller overriding the default `JUDGE_PERSONA` prompt silently received 50 on every run, a measurement-shaped value that polluted downstream statistics.
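To make the failure mode concrete, here is a minimal sketch of a parser with the behavior described above. It is not the library's source: the function name `extractJudgeConfidenceSketch`, the regular expression, and the clamp to 100 are assumptions for illustration. Only the trailing-line requirement and the default of 50 come from this description.

```typescript
// Hypothetical sketch, NOT the actual ai-consensus-core implementation.
// The fallback value 50 is the default named in the PR description.
const DEFAULT_JUDGE_CONFIDENCE = 50;

function extractJudgeConfidenceSketch(judgeOutput: string): number {
  // Only the last non-empty line is inspected (the "trailing line" requirement).
  const lines = judgeOutput.trimEnd().split("\n");
  const lastLine = (lines[lines.length - 1] ?? "").trim();

  // Assumed format: "JUDGE_CONFIDENCE: N" with N written as 1 to 3 digits.
  const match = /^JUDGE_CONFIDENCE:\s*(\d{1,3})$/.exec(lastLine);
  if (!match) {
    // Silent fallback: the caller cannot tell this apart from a real 50.
    return DEFAULT_JUDGE_CONFIDENCE;
  }
  // Clamp to 100 (an assumed upper bound for the scale).
  return Math.min(100, Number(match[1]));
}
```

With a parser of this shape, a judge that was never told to emit the line yields 50 on every run, which is indistinguishable from a genuine mid-scale score.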

This change makes buildJudgeSystemPrompt mirror the participant builder: idempotently append the directive, skipping when the input already contains the marker (so JUDGE_PERSONA's inline directive — and any diligent custom caller — is not duplicated).
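The idempotent append described above can be sketched roughly as follows. The helper name `appendJudgeDirectiveSketch`, the marker constant, and the directive wording are hypothetical; the real builder in ai-consensus-core may use different text and a different marker check.

```typescript
// Hypothetical sketch of an idempotent directive append.
// The marker and directive wording are assumptions, not the library's text.
const JUDGE_MARKER = "JUDGE_CONFIDENCE:";
const JUDGE_DIRECTIVE =
  "End your response with a final line of the form JUDGE_CONFIDENCE: N, " +
  "where N is an integer from 0 to 100.";

function appendJudgeDirectiveSketch(prompt: string): string {
  // Skip when the prompt already carries the marker, so a default persona
  // with an inline directive (or a diligent custom prompt) is left untouched.
  if (prompt.includes(JUDGE_MARKER)) {
    return prompt;
  }
  // Otherwise append the directive as a separate trailing paragraph.
  return `${prompt.trimEnd()}\n\n${JUDGE_DIRECTIVE}`;
}
```

Because the appended directive itself contains the marker, calling the helper on its own output returns the string unchanged, which is what makes the operation idempotent.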

Discovered by a 12-run bench in ai-consensus-mcp where judge confidence was reported as exactly 50.0 ± 0.0 across every run.

No public API change. buildJudgeSystemPrompt's output is longer when the input prompt lacks the marker; callers that snapshot-test that output need to regenerate snapshots.

…temPrompt (0.11.1)

marceloceccon merged commit f65dec7 into main May 26, 2026
6 checks passed

This was referenced May 26, 2026

feat(bench): held-out rubric evaluator + dep bump to ai-consensus-core 0.11.1 entropyvortex/ai-consensus-mcp#4

Merged

feat(bench): held-out rubric evaluator + dep bump to ai-consensus-core 0.11.1 entropyvortex/ai-consensus-mcp#5

Merged

marceloceccon merged 1 commit into main from fix/judge-confidence-contract
