Promotion canary + auto-rollback for auto-promote (#484) #500
Merged
Make the previously-deferred `[skillopt].auto_promote_canary` knob real: when auto-promote fires AND canary mode is configured with a valid `auto_promote_canary_sample` in (0,1], a guardrails-pass candidate is promoted to a new `canary` version state behind the live champion (the champion stays the current version, so non-sampled resolutions are byte-identical) and a sampled fraction of job resolutions route to the canary. A bounded daemon regression window reuses the #465 Mode A harvested outcomes (no new evaluator) to compare the canary vs the prior champion and graduates it (-> current, candidate.auto_promoted) or auto-rolls-back on a material regression (champion stays current, canary rejected, candidate.rolled_back).

OFF BY DEFAULT and additive (ContractVersion stays 1): with the knob off (the default) promotion is the unchanged #471 direct promote and no canary state row is ever written; template resolution is byte-identical (one indexed miss, no rng draw).

Fail-safe throughout: canary on with the sample unset/invalid is notify-only; the comparator holds (never rolls back, never graduates) on too few canary samples, no champion baseline, or unread feedback.
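The hold / graduate / rollback rule described above can be sketched as a pure function. This is a minimal illustration, not the PR's `EvaluateCanaryRegression`: the `Outcome` type, the `compareCanary` name, and the `minSamples` / `maxDrop` thresholds are all hypothetical, and it assumes a single unread feedback row is enough to hold.

```go
package main

// Verdict is the outcome of one canary regression check.
type Verdict int

const (
	Hold     Verdict = iota // not enough evidence; change nothing
	Graduate                // no material regression; canary becomes current
	Rollback                // material regression; canary rejected, champion stays current
)

// Outcome is a hypothetical harvested result. Score is in [0,1]; Read reports
// whether the feedback row is available for scoring.
type Outcome struct {
	Score float64
	Read  bool
}

// compareCanary is a hypothetical stand-in for a pure regression comparator.
// minSamples and maxDrop are illustrative thresholds, not values from the PR.
func compareCanary(canary, champion []Outcome, minSamples int, maxDrop float64) Verdict {
	cMean, cN, cOK := mean(canary)
	bMean, bN, bOK := mean(champion)
	// Fail-safe: unread feedback, too few canary samples, or no champion
	// baseline all hold (never roll back, never graduate).
	if !cOK || !bOK || cN < minSamples || bN == 0 {
		return Hold
	}
	if bMean-cMean > maxDrop {
		return Rollback
	}
	return Graduate
}

// mean returns the average score and sample count; ok is false if any
// outcome is unread (assumption: one unread row is enough to hold).
func mean(xs []Outcome) (avg float64, n int, ok bool) {
	sum := 0.0
	for _, x := range xs {
		if !x.Read {
			return 0, 0, false
		}
		sum += x.Score
	}
	if len(xs) == 0 {
		return 0, 0, true
	}
	return sum / float64(len(xs)), len(xs), true
}
```

Keeping the comparator free of I/O means every branch can be covered by a plain table test, which matches the "comparator table" the PR's test list mentions.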
- config: new AutoPromoteCanarySample *float64 (validated (0,1]) + CanaryEnabled()
- db: new `canary` state + canary_sample/canary_started_at columns (appended, idempotent migration); CanaryPromoteAgentTemplateVersion, GetActiveCanaryVersion; PromoteAgentTemplateVersion/RejectAgentTemplateVersion extended to accept a canary target (graduate / rollback), reject idempotent on already-rejected
- workflow: concurrency-safe sampled routing in Mailbox.templateSnapshot (injectable CanaryRand; champion is always the valid fallback)
- skillopt: EvaluateAutoPromote returns additive Decision.Canary; new pure EvaluateCanaryRegression comparator
- cli: runCandidateNotify canary branch + candidate.canary_started; daemon canaryRegressionHarvester wraps the harvester (reuses RevertAgentTemplateVersion defensively + RejectAgentTemplateVersion to retire the canary)
- events: candidate.canary_started / candidate.rolled_back
- docs in both trees (events + skillopt-exchange-contract)

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
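A minimal sketch of the sample validation and sampled routing listed above, assuming a simplified seam. `Router`, `Resolve`, `Lookup`, and `errNoCanary` are hypothetical names used only for illustration; the real routing lives in `Mailbox.templateSnapshot` and may differ. The intent is to show the two properties the commit claims: the off path performs no lookup and no rng draw, and the champion is returned on every miss or error.

```go
package main

import (
	"errors"
	"math/rand"
)

// errNoCanary is a hypothetical sentinel for "no active canary version".
var errNoCanary = errors.New("no active canary")

// validSample reports whether s is a usable canary fraction in (0,1].
// A nil pointer models the knob being unset.
func validSample(s *float64) bool {
	return s != nil && *s > 0 && *s <= 1
}

// Router is a hypothetical stand-in for the template-resolution seam.
type Router struct {
	Enabled bool                   // policy gate; false skips the canary lookup entirely
	Sample  *float64               // fraction of resolutions routed to the canary
	Rand    func() float64         // injectable draw in [0,1); nil uses math/rand
	Lookup  func() (string, error) // returns the active canary version ID
}

// Resolve returns the canary for a sampled fraction of calls and the
// champion in every other case, including any lookup miss or error.
func (r Router) Resolve(champion string) string {
	if !r.Enabled || !validSample(r.Sample) || r.Lookup == nil {
		return champion // off path: no lookup, no rng draw
	}
	canary, err := r.Lookup()
	if err != nil || canary == "" {
		return champion
	}
	draw := rand.Float64 // top-level math/rand functions are safe for concurrent use
	if r.Rand != nil {
		draw = r.Rand
	}
	if draw() < *r.Sample {
		return canary
	}
	return champion
}
```

Because the draw is in [0,1), a sample of 1.0 always routes to the canary. Injecting a fixed `Rand` in tests makes the hit and miss cases deterministic.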
…pe rollback (#484)

Apply review fixes to the #484 promotion-canary auto-rollback:

- Gate canary ROUTING on the same policy.CanaryEnabled() the daemon's regression comparator uses. Add Mailbox.CanaryEnabled (returns before the GetActiveCanaryVersion query when off, byte-identical) and Engine.CanaryEnabled, wired from the resolved [skillopt] policy at every production construction site. Turning the knob off and restarting now disables BOTH seams, so a stranded canary can never keep serving sampled traffic with no auto-rollback.
- Bound the regression comparator to the canary window: filter BOTH the canary and champion feedback lists to CreatedAt >= canary_started_at before scoring, so the baseline is the champion's CONCURRENT outcomes, not its lifetime mean. Tolerant timestamp parsing (RFC3339 + SQLite datetime); fail-open when the window is empty/unparseable.
- Emit candidate.rolled_back exactly once: RejectAgentTemplateVersion now returns a changed bool (false on the idempotent already-rejected branch) and the daemon evaluator only emits on a real transition, so concurrent harvests do not double-fire the rollback event.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
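The window bounding in the second fix can be sketched as follows. This is an illustration under stated assumptions, not the PR's code: `Feedback`, `parseTimestamp`, and `withinWindow` are hypothetical names, SQLite datetime text is assumed to be UTC in the `YYYY-MM-DD HH:MM:SS` form, and rows whose own `CreatedAt` cannot be parsed are kept rather than dropped.

```go
package main

import "time"

// Feedback is a hypothetical harvested feedback row.
type Feedback struct {
	VersionID string
	CreatedAt string
	Score     float64
}

// parseTimestamp accepts RFC3339 and the SQLite datetime text form
// ("2006-01-02 15:04:05"), which time.Parse reads as UTC.
func parseTimestamp(s string) (time.Time, bool) {
	for _, layout := range []string{time.RFC3339, "2006-01-02 15:04:05"} {
		if t, err := time.Parse(layout, s); err == nil {
			return t.UTC(), true
		}
	}
	return time.Time{}, false
}

// withinWindow keeps rows with CreatedAt >= startedAt. If startedAt is empty
// or unparseable the rows are returned unfiltered (fail-open on the window).
// Assumption: rows with an unparseable CreatedAt are also kept.
func withinWindow(rows []Feedback, startedAt string) []Feedback {
	start, ok := parseTimestamp(startedAt)
	if !ok {
		return rows
	}
	var out []Feedback
	for _, r := range rows {
		t, ok := parseTimestamp(r.CreatedAt)
		if !ok || !t.Before(start) {
			out = append(out, r)
		}
	}
	return out
}
```

Applying the same filter to both the canary and the champion lists is what makes the baseline the champion's concurrent outcomes rather than its lifetime mean.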
Makes the previously-deferred `[skillopt].auto_promote_canary` knob real (follow-on to #471). When auto-promote fires and canary mode is configured with a valid `auto_promote_canary_sample` in `(0,1]`, a guardrails-pass candidate is promoted to a new `canary` version state behind the live champion (the champion stays `current`) and a sampled fraction of job resolutions route to the canary. A bounded daemon regression window reuses the #465 Mode A harvested outcomes (no new evaluator) to compare canary-vs-prior-champion and graduates it (-> current, `candidate.auto_promoted`) or auto-rolls-back on a material regression (champion stays current, canary rejected, `candidate.rolled_back`).

Invariants

- With `auto_promote` off, nothing auto-promotes (no canary is ever created).
- … (`Mailbox.templateSnapshot`); the resolved champion is the always-valid fallback for any miss/error/no-canary/concurrent case, so a mid-canary / missing / half-promoted canary can never return no-template or a broken version. The draw is injectable (`CanaryRand`) and concurrency-safe (global `rand`); covered by a concurrent routing test.
- … `resolved_commit`, so the #465 harvester attributes canary-routed outcomes to `auto-trace:<canaryVersionID>` with zero harvester change.
- The champion stays `current` throughout the canary, so rollback retires the canary via the EXISTING `RejectAgentTemplateVersion` (idempotent on already-rejected) and reuses the EXISTING `RevertAgentTemplateVersion` defensively to guarantee the champion is current. Graduate reuses the EXISTING `PromoteAgentTemplateVersion` (extended to accept a canary target).

Changes

- config: `AutoPromoteCanarySample *float64` (validated `(0,1]`) + `CanaryEnabled()`; config stub in both the init template and docs.
- db: `canary` state + `canary_sample`/`canary_started_at` columns via an appended, idempotent migration (DEFAULTs keep old rows identical, partial index for the active-canary lookup); `CanaryPromoteAgentTemplateVersion`, `GetActiveCanaryVersion`; `Promote`/`Reject` extended for the canary target.
- workflow: sampled routing in `templateSnapshot`.
- skillopt: `AutoPromoteDecision.Canary`; new pure `EvaluateCanaryRegression` comparator (reuses the #465 harvest vocabulary).
- cli: `runCandidateNotify` canary branch + `candidate.canary_started`; daemon `canaryRegressionHarvester` decorator (off-by-default gate).
- events: `candidate.canary_started` / `candidate.rolled_back`.

Tests

Comparator table (rollback / hold-thin / hold-unread / graduate); store canary transitions (champion stays current, graduate, rollback-keeps-champion, validation, migration on a pre-existing DB); sampled routing (hit at sample=1.0, miss above sample, off-by-default byte-identical, concurrent always-valid); `runCandidateNotify` canary path + off-by-default direct promote; daemon graduate/rollback/hold.

Gate

`go build ./...`, `go vet ./...`, `go test ./...`, and `go test -race ./internal/workflow/ ./internal/skillopt/ ./internal/db/` all green. The full `go test -race ./internal/cli/` is a known-slow suite that exceeds the sandbox time limit (CI only races `internal/workflow`); the new cli canary tests pass under `-race`.

Closes #484
🤖 Generated with Claude Code