
Promotion canary + auto-rollback for auto-promote (#484) #500

Merged
jerryfane merged 2 commits into main from feat/484-canary-rollback on Jun 27, 2026

@jerryfane

Owner

Makes the previously deferred [skillopt].auto_promote_canary knob real (follow-on to #471). When auto-promote fires and canary mode is configured with a valid auto_promote_canary_sample in (0,1], a candidate that passes the guardrails is promoted to a new canary version state behind the live champion (the champion stays current), and a sampled fraction of job resolutions is routed to the canary. A bounded daemon regression window reuses the #465 Mode A harvested outcomes (no new evaluator) to compare the canary against the prior champion. It then either graduates the canary (-> current, candidate.auto_promoted) or, on a material regression, auto-rolls it back (champion stays current, canary rejected, candidate.rolled_back).
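The promotion decision described above can be sketched as one small pure function. This is a minimal sketch, not the PR's EvaluateAutoPromote: the names decide and Action and its constants are hypothetical, and treating a guardrails failure as notify-only is an assumption. It encodes the rules this PR states: knob off means the unchanged #471 direct promote, auto_promote off means nothing auto-promotes, and canary on with an unset or invalid sample falls back to notify-only.

```go
package main

import "fmt"

// Action is a hypothetical outcome of the auto-promote decision.
type Action int

const (
	NotifyOnly    Action = iota // no automatic promotion; only notify
	DirectPromote               // unchanged #471 path: candidate becomes current
	CanaryPromote               // candidate becomes a canary behind the live champion
)

func (a Action) String() string {
	switch a {
	case DirectPromote:
		return "direct-promote"
	case CanaryPromote:
		return "canary-promote"
	default:
		return "notify-only"
	}
}

// decide is a hypothetical sketch of the promotion decision for one candidate.
func decide(autoPromote, guardrailsPass, canaryKnob bool, sample *float64) Action {
	if !autoPromote || !guardrailsPass {
		return NotifyOnly // assumption: a failing candidate is only notified about
	}
	if !canaryKnob {
		return DirectPromote // knob off (the default): unchanged behaviour
	}
	if sample == nil || !(*sample > 0 && *sample <= 1) {
		return NotifyOnly // canary requested but sample unset/invalid: fail safe
	}
	return CanaryPromote
}

func main() {
	s := 0.25
	fmt.Println(decide(true, true, false, nil)) // direct-promote
	fmt.Println(decide(true, true, true, &s))   // canary-promote
	fmt.Println(decide(true, true, true, nil))  // notify-only
}
```

Writing the sample check as `!(x > 0 && x <= 1)` also rejects NaN, since every comparison with NaN is false.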

Invariants

  • OFF BY DEFAULT / additive. ContractVersion stays 1; no wire/contract struct changes. With the knob off (the default), promotion is the unchanged #471 direct promote, no canary state row is ever written, and template resolution is byte-identical (one indexed miss, no rng draw), as proven by an off-by-default golden test. With the knob on but auto_promote off, nothing auto-promotes (no canary is ever created).
  • Template-resolution safety. Sampled routing lives in the single resolution seam (Mailbox.templateSnapshot); the resolved champion is the always-valid fallback for any miss/error/no-canary/concurrent case, so a mid-canary / missing / half-promoted canary can never return no-template or a broken version. The draw is injectable (CanaryRand) and concurrency-safe (global rand); covered by a concurrent routing test.
  • Routing always resolves a valid version. A canary version carries a distinct resolved_commit, so the #465 Mode A harvester attributes canary-routed outcomes to auto-trace:<canaryVersionID> with zero harvester change.
  • Auto-rollback integrity. Rollback never leaves the template without a current version or with a dangling canary, and it is idempotent. The prior champion stays the live current version throughout the canary, so rollback retires the canary via the EXISTING RejectAgentTemplateVersion (idempotent on an already-rejected version) and reuses the EXISTING RevertAgentTemplateVersion defensively to guarantee the champion is current. Graduation reuses the EXISTING PromoteAgentTemplateVersion (extended to accept a canary target).
  • Fail-safe. Canary on with the sample unset/invalid => notify-only. The comparator holds (never rolls back, never graduates) on too few canary samples, no champion baseline, or unread feedback — uncertainty never rolls back on unread evidence and never graduates without confirming non-regression.

Changes

  • config: AutoPromoteCanarySample *float64 (validated (0,1]) + CanaryEnabled(); config stub in both the init template and docs.
  • db: new canary state + canary_sample/canary_started_at columns via an appended, idempotent migration (DEFAULTs keep old rows identical, partial index for the active-canary lookup); CanaryPromoteAgentTemplateVersion, GetActiveCanaryVersion; Promote/Reject extended for the canary target.
  • workflow: concurrency-safe sampled routing in templateSnapshot.
  • skillopt: additive AutoPromoteDecision.Canary; new pure EvaluateCanaryRegression comparator (reuses the #465 Mode A harvest vocabulary).
  • cli: runCandidateNotify canary branch + candidate.canary_started; daemon canaryRegressionHarvester decorator (off-by-default gate).
  • events: candidate.canary_started / candidate.rolled_back.
  • docs: events + skillopt-exchange-contract in both the in-repo and website trees.
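The config change can be sketched as a validated optional float plus a single enable predicate, which (per the review commit) gates both the routing seam and the daemon comparator. AutoPromoteCanarySample and CanaryEnabled are names from this PR; the SkillOptPolicy struct, the AutoPromoteCanary field, and Validate are hypothetical, and whether the real CanaryEnabled also consults auto_promote is not stated here.

```go
package main

import "fmt"

// SkillOptPolicy is a hypothetical subset of the [skillopt] config section.
type SkillOptPolicy struct {
	AutoPromoteCanary       bool     // the auto_promote_canary knob
	AutoPromoteCanarySample *float64 // nil means unset
}

// Validate rejects a sample outside (0,1]; an unset sample is allowed.
func (p SkillOptPolicy) Validate() error {
	if s := p.AutoPromoteCanarySample; s != nil && !(*s > 0 && *s <= 1) {
		return fmt.Errorf("auto_promote_canary_sample must be in (0,1], got %v", *s)
	}
	return nil
}

// CanaryEnabled reports whether canary mode can actually run: the knob is on
// and the sample is both set and valid.
func (p SkillOptPolicy) CanaryEnabled() bool {
	return p.AutoPromoteCanary && p.AutoPromoteCanarySample != nil && p.Validate() == nil
}

func main() {
	half, zero := 0.5, 0.0
	fmt.Println(SkillOptPolicy{AutoPromoteCanary: true, AutoPromoteCanarySample: &half}.CanaryEnabled()) // true
	fmt.Println(SkillOptPolicy{AutoPromoteCanary: true, AutoPromoteCanarySample: &zero}.Validate() != nil) // true
}
```

Using a pointer keeps "unset" distinguishable from an explicit 0, which is itself invalid.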

Tests

Comparator table (rollback / hold-thin / hold-unread / graduate); store canary transitions (champion stays current, graduate, rollback-keeps-champion, validation, migration on a pre-existing DB); sampled routing (hit at sample=1.0, miss above sample, off-by-default byte-identical, concurrent always-valid); runCandidateNotify canary path + off-by-default direct promote; daemon graduate/rollback/hold.
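The comparator table above (rollback / hold-thin / hold-unread / graduate) can be mirrored by a small pure function. This is a hypothetical sketch, not EvaluateCanaryRegression: the success-rate metric, the Thresholds values, and modelling "unread feedback" as a per-outcome Read flag are all assumptions. It keeps the stated ordering: hold on thin or missing evidence first, and only then compare canary against champion.

```go
package main

import "fmt"

// Decision is a hypothetical comparator verdict.
type Decision int

const (
	Hold Decision = iota
	Graduate
	Rollback
)

func (d Decision) String() string { return [...]string{"hold", "graduate", "rollback"}[d] }

// Outcome is a hypothetical harvested result; Read models "feedback has been read".
type Outcome struct {
	Success bool
	Read    bool
}

// Thresholds are illustrative knobs; the PR does not state its real values.
type Thresholds struct {
	MinCanary int     // fewer canary samples than this => hold
	MaxDrop   float64 // success-rate drop that counts as a material regression
}

func successRate(xs []Outcome) float64 {
	n := 0
	for _, o := range xs {
		if o.Success {
			n++
		}
	}
	return float64(n) / float64(len(xs))
}

func allRead(xs []Outcome) bool {
	for _, o := range xs {
		if !o.Read {
			return false
		}
	}
	return true
}

// evaluateCanary holds under uncertainty and only then compares rates.
func evaluateCanary(canary, champion []Outcome, th Thresholds) Decision {
	if len(canary) == 0 || len(canary) < th.MinCanary || len(champion) == 0 {
		return Hold // too few canary samples or no champion baseline
	}
	if !allRead(canary) || !allRead(champion) {
		return Hold // never act on unread evidence
	}
	if successRate(canary) < successRate(champion)-th.MaxDrop {
		return Rollback
	}
	return Graduate
}

// outcomes builds n outcomes, the first `wins` of which succeed.
func outcomes(n, wins int, read bool) []Outcome {
	xs := make([]Outcome, n)
	for i := range xs {
		xs[i] = Outcome{Success: i < wins, Read: read}
	}
	return xs
}

func main() {
	th := Thresholds{MinCanary: 5, MaxDrop: 0.1}
	fmt.Println(evaluateCanary(outcomes(10, 5, true), outcomes(10, 9, true), th)) // rollback
}
```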

Gate

go build ./..., go vet ./..., go test ./..., and go test -race ./internal/workflow/ ./internal/skillopt/ ./internal/db/ are all green. The full go test -race ./internal/cli/ is a known-slow suite that exceeds the sandbox time limit (CI only runs -race on internal/workflow); the new cli canary tests pass under -race.

Closes #484

🤖 Generated with Claude Code

jerryfane and others added 2 commits June 27, 2026 09:33
Make the previously-deferred `[skillopt].auto_promote_canary` knob real: when
auto-promote fires AND canary mode is configured with a valid
`auto_promote_canary_sample` in (0,1], a guardrails-pass candidate is promoted to
a new `canary` version state behind the live champion (the champion stays the
current version, so non-sampled resolutions are byte-identical) and a sampled
fraction of job resolutions route to the canary. A bounded daemon regression
window reuses the #465 Mode A harvested outcomes (no new evaluator) to compare the
canary vs the prior champion and graduates it (-> current, candidate.auto_promoted)
or auto-rolls-back on a material regression (champion stays current, canary
rejected, candidate.rolled_back).

OFF BY DEFAULT and additive (ContractVersion stays 1): with the knob off (the
default) promotion is the unchanged #471 direct promote and no canary state row is
ever written; template resolution is byte-identical (one indexed miss, no rng
draw). Fail-safe throughout: canary on with the sample unset/invalid is
notify-only; the comparator holds (never rolls back, never graduates) on too few
canary samples, no champion baseline, or unread feedback.

- config: new AutoPromoteCanarySample *float64 (validated (0,1]) + CanaryEnabled()
- db: new `canary` state + canary_sample/canary_started_at columns (appended,
  idempotent migration); CanaryPromoteAgentTemplateVersion, GetActiveCanaryVersion;
  PromoteAgentTemplateVersion/RejectAgentTemplateVersion extended to accept a
  canary target (graduate / rollback), reject idempotent on already-rejected
- workflow: concurrency-safe sampled routing in Mailbox.templateSnapshot (injectable
  CanaryRand; champion is always the valid fallback)
- skillopt: EvaluateAutoPromote returns additive Decision.Canary; new pure
  EvaluateCanaryRegression comparator
- cli: runCandidateNotify canary branch + candidate.canary_started; daemon
  canaryRegressionHarvester wraps the harvester (reuses RevertAgentTemplateVersion
  defensively + RejectAgentTemplateVersion to retire the canary)
- events: candidate.canary_started / candidate.rolled_back
- docs in both trees (events + skillopt-exchange-contract)

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
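The store transitions in this commit (canary promote, graduate, idempotent reject, defensive revert) can be sketched with an in-memory stand-in. Everything here is hypothetical: Store, State, the Retired label for a demoted champion, and the method names are illustrative, not the PR's db API. Reject returning a changed bool follows the later review commit, and the rollback path uses it to emit candidate.rolled_back only on a real transition.

```go
package main

import (
	"fmt"
	"sync"
)

// State mirrors the version states named in this PR; Retired is a
// hypothetical label for a demoted former champion.
type State string

const (
	Candidate State = "candidate"
	Canary    State = "canary"
	Current   State = "current"
	Rejected  State = "rejected"
	Retired   State = "retired"
)

// Store is a hypothetical in-memory stand-in for the version table.
type Store struct {
	mu     sync.Mutex
	states map[string]State
}

func NewStore() *Store                     { return &Store{states: map[string]State{}} }
func (s *Store) Add(id string, st State)   { s.mu.Lock(); s.states[id] = st; s.mu.Unlock() }
func (s *Store) StateOf(id string) State   { s.mu.Lock(); defer s.mu.Unlock(); return s.states[id] }
func (s *Store) setCurrentLocked(id string) { // caller holds s.mu
	for vid, st := range s.states {
		if st == Current {
			s.states[vid] = Retired
		}
	}
	s.states[id] = Current
}

// CanaryPromote moves a candidate to canary; the champion stays current.
func (s *Store) CanaryPromote(id string) error {
	s.mu.Lock()
	defer s.mu.Unlock()
	if s.states[id] != Candidate {
		return fmt.Errorf("%s is not a candidate", id)
	}
	s.states[id] = Canary
	return nil
}

// Graduate promotes the canary to current and retires the previous champion.
func (s *Store) Graduate(id string) error {
	s.mu.Lock()
	defer s.mu.Unlock()
	if s.states[id] != Canary {
		return fmt.Errorf("%s is not a canary", id)
	}
	s.setCurrentLocked(id)
	return nil
}

// Reject is idempotent and reports whether the state actually changed.
func (s *Store) Reject(id string) (changed bool, err error) {
	s.mu.Lock()
	defer s.mu.Unlock()
	switch s.states[id] {
	case Rejected:
		return false, nil
	case Canary, Candidate:
		s.states[id] = Rejected
		return true, nil
	}
	return false, fmt.Errorf("cannot reject %s in state %q", id, s.states[id])
}

// Rollback rejects the canary, defensively re-asserts the champion as
// current, and emits the event only on a real transition.
func (s *Store) Rollback(canaryID, championID string, emit func(event, id string)) error {
	changed, err := s.Reject(canaryID)
	if err != nil {
		return err
	}
	s.mu.Lock()
	if s.states[championID] != Current { // normally a no-op
		s.setCurrentLocked(championID)
	}
	s.mu.Unlock()
	if changed {
		emit("candidate.rolled_back", canaryID)
	}
	return nil
}

func main() {
	demo := NewStore()
	demo.Add("champion", Current)
	demo.Add("cand", Candidate)
	_ = demo.CanaryPromote("cand")
	logEvent := func(event, id string) { fmt.Println(event, id) }
	_ = demo.Rollback("cand", "champion", logEvent)
	_ = demo.Rollback("cand", "champion", logEvent) // idempotent: no second event
}
```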
…pe rollback (#484)

Apply review fixes to the #484 promotion-canary auto-rollback:

- Gate canary ROUTING on the same policy.CanaryEnabled() the daemon's
  regression comparator uses. Add Mailbox.CanaryEnabled (returns before the
  GetActiveCanaryVersion query when off, byte-identical) and Engine.CanaryEnabled,
  wired from the resolved [skillopt] policy at every production construction site.
  Turning the knob off and restarting now disables BOTH seams, so a stranded
  canary can never keep serving sampled traffic with no auto-rollback.
- Bound the regression comparator to the canary window: filter BOTH the canary
  and champion feedback lists to CreatedAt >= canary_started_at before scoring,
  so the baseline is the champion's CONCURRENT outcomes, not its lifetime mean.
  Tolerant timestamp parsing (RFC3339 + SQLite datetime); fail-open when the
  window is empty/unparseable.
- Emit candidate.rolled_back exactly once: RejectAgentTemplateVersion now returns
  a changed bool (false on the idempotent already-rejected branch) and the daemon
  evaluator only emits on a real transition, so concurrent harvests do not
  double-fire the rollback event.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
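The canary-window fix can be sketched as one filter applied to both the canary and the champion feedback lists before scoring. Feedback, parseTS, and windowed are hypothetical names. The sketch reads "fail-open when the window is empty/unparseable" as: if canary_started_at is empty or cannot be parsed, the lists pass through unfiltered. Dropping individual rows whose CreatedAt cannot be parsed is a further assumption of this sketch.

```go
package main

import (
	"fmt"
	"time"
)

// Feedback is a hypothetical harvested feedback row.
type Feedback struct {
	VersionID string
	CreatedAt string
}

// sqliteDatetime is the layout produced by SQLite's datetime() (UTC, no zone).
const sqliteDatetime = "2006-01-02 15:04:05"

// parseTS accepts RFC3339 first, then the SQLite datetime layout.
func parseTS(s string) (time.Time, bool) {
	if t, err := time.Parse(time.RFC3339, s); err == nil {
		return t, true
	}
	if t, err := time.Parse(sqliteDatetime, s); err == nil {
		return t, true
	}
	return time.Time{}, false
}

// windowed keeps rows with CreatedAt >= startedAt. An empty or unparseable
// startedAt fails open and returns the rows unchanged.
func windowed(rows []Feedback, startedAt string) []Feedback {
	start, ok := parseTS(startedAt)
	if !ok {
		return rows
	}
	out := make([]Feedback, 0, len(rows))
	for _, r := range rows {
		if t, ok := parseTS(r.CreatedAt); ok && !t.Before(start) {
			out = append(out, r) // assumption: unparseable rows are dropped
		}
	}
	return out
}

func main() {
	rows := []Feedback{{"v2", "2026-06-27 08:00:00"}, {"v2", "2026-06-27T09:30:00Z"}}
	fmt.Println(len(windowed(rows, "2026-06-27 09:00:00"))) // 1
}
```

Applying the same start time to both lists is what makes the champion baseline concurrent with the canary rather than a lifetime mean.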
jerryfane merged commit 3c3824f into main on Jun 27, 2026
1 check passed
jerryfane deleted the feat/484-canary-rollback branch on June 27, 2026 08:26


Development

Linked issue: Promotion follow-on: canary + auto-rollback for auto-promote

1 participant