Foundational scaffolding for the Fleet PR Remediation feature: 13 ADRs, DDD bounded-context/aggregate model, implementation plan, phase runner + orchestrator, and research/design docs. Establishes the long-running integration branch (develop) for the autonomous phase rollout.
…y test) (#33)

* fix(ci): green the develop baseline (Node bump, quinn-proto CVE, flaky test)

  Pre-existing CI failures, unrelated to the remediation feature, were blocking the autonomous phase rollout. Three independent fixes:

  - i18n-validation.yml: bump Node 20 -> 26 (pnpm 11.1.3 requires Node >=22.13, and Node 20 lacks node:sqlite). Matches ci.yml. Fixes the 4 i18n jobs.
  - Cargo.lock: bump quinn-proto 0.11.14 -> 0.11.15 (RUSTSEC-2026-0185, high 7.5: remote memory exhaustion via unbounded out-of-order stream reassembly; transitive via reqwest). Fixes Backend Security Audit. Also syncs workspace member versions 0.4.0 -> 0.5.1 to match Cargo.toml.
  - config.rs: serialize the three find_config_file tests behind a poison-tolerant mutex. They mutate the process-global cwd and AMPEL_I18N_CONFIG, and raced under the llvm-cov runner, failing Backend Test Coverage.

* fix(ci): skip Playwright browser install when no visual specs exist

  The Test RTL Support job ran 'playwright install --with-deps chromium firefox' unconditionally, after which the test step skipped because no tests/visual/*.spec.ts files exist (only README/report markdown). The pointless browser + apt install hung for 49 min with no timeout. Gate the install behind the same spec-exists check the test step uses, and add timeout-minutes: 15 as a backstop against future hangs.
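The config.rs fix relies on a standard Rust testing pattern: `cargo test` runs tests on parallel threads by default, while the working directory and environment variables are process-wide, so tests that mutate them need a shared lock. A minimal sketch of a poison-tolerant lock, assuming a module-level static; `ENV_LOCK`, `env_lock`, and `with_env_lock` are hypothetical names, not the ones used in config.rs:

```rust
use std::sync::{Mutex, MutexGuard};

/// Hypothetical lock shared by every test that touches process-global state
/// (current working directory, environment variables).
static ENV_LOCK: Mutex<()> = Mutex::new(());

/// Acquire the lock even if a previous holder panicked.
/// A test that panics while holding the guard poisons the mutex; recovering
/// the guard with `into_inner` keeps one failure from cascading into every
/// later test that needs the same lock.
fn env_lock() -> MutexGuard<'static, ()> {
    ENV_LOCK.lock().unwrap_or_else(|poisoned| poisoned.into_inner())
}

/// Run `f` while holding the lock, so no other guarded test can change the
/// shared process state underneath it.
fn with_env_lock<T>(f: impl FnOnce() -> T) -> T {
    let _guard = env_lock();
    f()
}
```

Ignoring poisoning is reasonable here because the mutex protects `()`: there is no guarded data a panicking test could leave half-updated, and each test is assumed to set up the cwd and variables it needs.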
Provider write primitives (RemediationCapable supertrait + GitHub/GitLab/Bitbucket/Mock impls). ADR-002, ADR-013. CI green on rebased develop.
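ADR-002 describes the write primitives as a `RemediationCapable` supertrait layered over the existing providers. A simplified, synchronous sketch of that shape (the real trait is async per ADR-013); `GitProvider`, the method names, `MockProvider`, and `consolidate` are hypothetical. In line with the integration PR's note that there is no force-push primitive, the sketch exposes none:

```rust
use std::cell::RefCell;

/// Hypothetical read-side trait every provider already implements.
trait GitProvider {
    fn name(&self) -> &str;
}

/// Hypothetical write surface as a supertrait: only providers implementing it
/// can be targeted by remediation. Deliberately no force-push method.
trait RemediationCapable: GitProvider {
    fn create_branch(&self, repo: &str, branch: &str, from_sha: &str) -> Result<(), String>;
    fn open_pull_request(&self, repo: &str, head: &str, base: &str, title: &str)
        -> Result<u64, String>;
}

/// Mock provider that records write calls instead of touching a network.
#[derive(Default)]
struct MockProvider {
    calls: RefCell<Vec<String>>,
    next_pr: RefCell<u64>,
}

impl GitProvider for MockProvider {
    fn name(&self) -> &str {
        "mock"
    }
}

impl RemediationCapable for MockProvider {
    fn create_branch(&self, repo: &str, branch: &str, from_sha: &str) -> Result<(), String> {
        self.calls.borrow_mut().push(format!("branch {repo} {branch}@{from_sha}"));
        Ok(())
    }

    fn open_pull_request(&self, repo: &str, head: &str, base: &str, title: &str)
        -> Result<u64, String> {
        self.calls.borrow_mut().push(format!("pr {repo} {head}->{base}: {title}"));
        let mut n = self.next_pr.borrow_mut();
        *n += 1; // hand out sequential PR numbers
        Ok(*n)
    }
}

/// Generic code states the write requirement in its bounds, so read-only
/// providers are rejected at compile time.
fn consolidate<P: RemediationCapable>(p: &P, repo: &str) -> Result<u64, String> {
    p.create_branch(repo, "remediation/batch", "abc123")?;
    p.open_pull_request(repo, "remediation/batch", "main", "Consolidated dependency updates")
}
```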
Data model + Policy CRUD + Dry-Run. DB (6 entities + migrations) + PolicyResolver + RemediationService (ADR-004, ADR-014) + API (8 endpoints) + frontend (Fleet/Policies UI) + i18n (27 locales). CI green.
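The resolution order `PolicyResolver` uses is not spelled out here. A minimal sketch assuming most-specific-wins (repository, then organization, then a global default); `Policy`, its fields, and the `"org/repo"` map key are hypothetical:

```rust
use std::collections::HashMap;

/// Hypothetical policy shape; the real entities live in the DB layer.
#[derive(Debug, Clone, PartialEq)]
struct Policy {
    name: String,
    auto_merge: bool,
}

/// Hypothetical resolver: a repository-level policy beats an
/// organization-level one, which beats the global default.
struct PolicyResolver {
    default: Policy,
    by_org: HashMap<String, Policy>,
    by_repo: HashMap<String, Policy>, // keyed by "org/repo"
}

impl PolicyResolver {
    fn resolve(&self, org: &str, repo: &str) -> &Policy {
        let repo_key = format!("{org}/{repo}");
        self.by_repo
            .get(&repo_key)
            .or_else(|| self.by_org.get(org))
            .unwrap_or(&self.default)
    }
}
```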
Mechanical consolidation + verification + Apalis jobs. ADR-003/005/010. Adversarial QE (mutation/pentest/chaos) hardened: error→terminal, guard-before-side-effect, fail-closed merge gate. CI green.
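A fail-closed merge gate and the guard-before-side-effect rule combine naturally with the TOCTOU re-check of ADR-010: missing, pending, failed, or unrecognized checks all block, and the branch head must still be the commit that was verified. A sketch under those assumptions; `CheckState`, `Verification`, `Gate`, and `merge_gate` are hypothetical:

```rust
/// Hypothetical CI check states as reported by a provider.
#[derive(Debug, Clone, Copy, PartialEq)]
enum CheckState {
    Success,
    Failure,
    Pending,
    Unknown, // anything the gate does not recognize
}

/// Hypothetical snapshot captured when CI verification ran.
struct Verification<'a> {
    head_sha: &'a str,
    checks: &'a [CheckState],
}

#[derive(Debug, PartialEq)]
enum Gate {
    Merge,
    Block(String),
}

/// Fail-closed: merge only on positive evidence, and only if the branch head
/// is still the verified commit (time-of-check/time-of-use re-check).
fn merge_gate(v: &Verification<'_>, current_head: &str) -> Gate {
    // No checks at all counts as failure, not as "nothing failed".
    if v.checks.is_empty() {
        return Gate::Block("no CI checks reported".into());
    }
    // Pending, failed, and unrecognized states all block.
    if let Some(bad) = v.checks.iter().find(|s| **s != CheckState::Success) {
        return Gate::Block(format!("check not successful: {bad:?}"));
    }
    // A push after verification invalidates the result.
    if current_head != v.head_sha {
        return Gate::Block(format!("head moved from {} to {current_head}", v.head_sha));
    }
    Gate::Merge
}
```

The caller would perform the merge only when `merge_gate` returns `Gate::Merge`. A small window remains between the re-check and the merge call; where the provider's merge API accepts an expected head SHA (GitHub's REST merge-pull-request endpoint takes a `sha` parameter for this), passing it closes that window.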
Observability & UX: run API + SSE live updates (ADR-011), awaiting_approval human gate, 5 Prometheus metrics + notifications + Grafana, live React timeline/audit/CI-matrix/kill-switch. CI green.
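The `awaiting_approval` human gate and the kill switch fit an explicit transition function, with every error treated as terminal. A sketch in which only `awaiting_approval` and the kill switch come from the text above; the other state and event names are hypothetical:

```rust
/// Hypothetical run states; the real machine is persisted per ADR-004.
#[derive(Debug, Clone, Copy, PartialEq)]
enum RunState {
    Pending,
    Running,
    AwaitingApproval,
    Merged,
    Failed,
    Cancelled,
}

#[derive(Debug, Clone, Copy)]
enum Event {
    Start,
    ChecksPassed,
    Approved,
    Rejected,
    Error,
    KillSwitch,
}

/// Returns the next state, or None for a transition the machine forbids.
fn transition(from: RunState, ev: Event) -> Option<RunState> {
    use Event::*;
    use RunState::*;
    match (from, ev) {
        // Terminal states never move again.
        (Merged | Failed | Cancelled, _) => None,
        // Any error ends a live run; the kill switch cancels it.
        (_, Error) => Some(Failed),
        (_, KillSwitch) => Some(Cancelled),
        (Pending, Start) => Some(Running),
        // Passing checks does not merge directly: a human must approve.
        (Running, ChecksPassed) => Some(AwaitingApproval),
        (AwaitingApproval, Approved) => Some(Merged),
        (AwaitingApproval, Rejected) => Some(Cancelled),
        _ => None,
    }
}
```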
Agentic remediation tier: ModelProvider trait + Claude/Gemini/Ollama/ONNX, fix-loop harness, playbooks, model-account/playbook API, Tier-2 integration. ADR-006/007/008/009/012/013/014. SSRF + authz issues fixed; adversarial QE (mutation + pentest) found 0 exploits. CI green.
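For the SSRF fix, one common approach to a user-supplied endpoint (such as a model account's endpoint URL) is to resolve the host and reject it if any resolved address is non-public. A sketch of that check using only `std::net`; the function names are hypothetical, the blocked ranges are not exhaustive (for example, CGNAT 100.64.0.0/10 is not covered), and a real guard must also connect to the address it checked so DNS rebinding cannot swap it afterwards:

```rust
use std::net::{IpAddr, Ipv4Addr, Ipv6Addr};

fn is_forbidden_v4(ip: Ipv4Addr) -> bool {
    ip.is_loopback()          // 127.0.0.0/8
        || ip.is_private()    // 10/8, 172.16/12, 192.168/16
        || ip.is_link_local() // 169.254/16, includes cloud metadata 169.254.169.254
        || ip.is_unspecified()
        || ip.is_broadcast()
}

fn is_forbidden_v6(ip: Ipv6Addr) -> bool {
    // ::ffff:a.b.c.d must be judged as the IPv4 address it wraps.
    if let Some(v4) = ip.to_ipv4_mapped() {
        return is_forbidden_v4(v4);
    }
    let first = ip.segments()[0];
    ip.is_loopback()
        || ip.is_unspecified()
        || (first & 0xfe00) == 0xfc00 // unique local fc00::/7
        || (first & 0xffc0) == 0xfe80 // link-local fe80::/10
}

/// Hypothetical guard: allow the endpoint only if it resolved to at least one
/// address and every resolved address is publicly routable.
fn endpoint_allowed(resolved: &[IpAddr]) -> bool {
    !resolved.is_empty()
        && resolved.iter().all(|ip| match ip {
            IpAddr::V4(v4) => !is_forbidden_v4(*v4),
            IpAddr::V6(v6) => !is_forbidden_v6(*v6),
        })
}
```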
Phase 5: strategy learning, feature-flagged reflexion memory, provider parity fallbacks, repo fingerprinting. All CI green.
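Provider-order bias driven by a learning signal can be as simple as sorting providers by a smoothed success rate. A sketch assuming per-provider success and attempt counts are available; `Stats`, `score`, and `biased_order` are hypothetical, and Laplace smoothing is one choice among several:

```rust
/// Hypothetical per-provider outcome counts fed by the learning signal.
#[derive(Debug, Clone)]
struct Stats {
    name: &'static str,
    successes: u32,
    attempts: u32,
}

/// Laplace-smoothed success rate: (s + 1) / (n + 2). An unseen provider
/// starts at 0.5 rather than 0 or 1, so one outcome cannot dominate.
fn score(s: &Stats) -> f64 {
    (s.successes as f64 + 1.0) / (s.attempts as f64 + 2.0)
}

/// Reorder providers by score, highest first. `sort_by` is stable, so
/// ties keep the configured order.
fn biased_order(mut stats: Vec<Stats>) -> Vec<&'static str> {
    stats.sort_by(|a, b| score(b).total_cmp(&score(a)));
    stats.into_iter().map(|s| s.name).collect()
}
```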
Cross-phase review of phases 0–5 (read-only): no dead code, no blocking TODOs, fail-closed verification consistent (ADR-010), lockfile-class and failure-classification logic delegated core→worker with no duplication. No code changes warranted; optional enum-serialization macro deferred as risky churn with zero behavior change. develop HEAD == CI-green PR #38 tree.
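Delegating lockfile classification to one place means core and worker call the same function instead of keeping parallel lists. A sketch of such a classifier; the `LockfileClass` variants and the file names matched are assumptions (Cargo and pnpm as used in this repo, plus two common npm-family lockfiles):

```rust
/// Hypothetical lockfile classes.
#[derive(Debug, PartialEq)]
enum LockfileClass {
    Cargo,
    Pnpm,
    Npm,
    Yarn,
    NotLockfile,
}

/// Classify a repository path by its file name only, since lockfiles can sit
/// in any workspace member directory.
fn classify_lockfile(path: &str) -> LockfileClass {
    let file = path.rsplit('/').next().unwrap_or(path);
    match file {
        "Cargo.lock" => LockfileClass::Cargo,
        "pnpm-lock.yaml" => LockfileClass::Pnpm,
        "package-lock.json" => LockfileClass::Npm,
        "yarn.lock" => LockfileClass::Yarn,
        _ => LockfileClass::NotLockfile,
    }
}
```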
✅ Snyk checks have passed. No issues have been found so far.
📊 Coverage Report
Coverage Thresholds
Coverage reports generated by CI workflow.
🌐 Translation Coverage Report

Threshold: 95%

Backend Coverage Details
- Overall Coverage: 100.0%
- Source Keys: 864

Frontend Coverage Details
- Generated: 2026-06-29T18:44:08.626Z

Missing Translations: ✅ No missing translations

Updated: 2026-06-29T18:44:27.908Z
…atible

Frontend (frontend/, pnpm):
- Bump all deps to latest within-major; clears 22 pnpm-audit findings.
- Security: axios 1.15.2->1.18.1 (6 GHSA), react-router-dom 7.14.2->7.18.0 (GHSA-8x6r-g9mw-2r78), vite 8.0.13->8.1.0 (GHSA-fx2h-pf6j-xcff, GHSA-v6wh-96g9-6wx3), undici->7.28.0, form-data->4.0.6, esbuild patched (all transitive).
- Frontend audit: 0 vulnerabilities.

Root tooling (pnpm-only):
- Move overrides to pnpm-workspace.yaml (pnpm 11 no longer reads package.json pnpm.overrides); force markdown-it ^14.2.0 (GHSA-6v5v-wf23-fmfq) and js-yaml ^4.3.0; linkify-it patched to 5.0.1 via update. Bump prettier floor.
- Remove redundant package-lock.json: the repo is pnpm-managed (packageManager pnpm@11.1.3, all CI uses pnpm); the legacy npm lockfile contradicted the toolchain, could not be cleanly maintained by npm (overrides ignored, divergent tree), and Dependabot was flagging it. pnpm-lock.yaml is now the single source of truth.
- Root audit: 0 vulnerabilities.

Backend (Cargo):
- cargo update to latest semver-compatible; openssl 0.10.79->0.10.81 clears RUSTSEC/GHSA-phqj-4mhp-q6mq. cargo audit: 0 vulnerabilities (4 informational unmaintained-crate warnings remain; transitive, non-blocking).
- Includes a pre-existing working-tree change: utoipa-swagger-ui "vendored" feature (offline Swagger assets).

Verification: cargo fmt/clippy (-D warnings)/test all green; cargo audit exit 0. Frontend lint/type-check/test (897 passed)/build green; frozen-lockfile install reproduces. Fixes apply to develop.
🔒 Dependency Remediation & Latest-Compatible Updates (commit
| Package | From → To | Severity | Advisory | Notes |
|---|---|---|---|---|
| axios | 1.15.2 → 1.18.1 | High×5 / Med / Low | GHSA-35jp-ww65-95wh, GHSA-hfxv-24rg-xrqf, GHSA-j5f8-grm9-p9fc, GHSA-p92q-9vqr-4j8v, GHSA-pjwm-pj3p-43mv, GHSA-777c-7fjr-54vf, GHSA-898c-q2cr-xwhg, GHSA-654m-c8p4-x5fp | Direct dep |
| react-router / react-router-dom | 7.14.2 → 7.18.0 | High / Low | GHSA-8x6r-g9mw-2r78, GHSA-84g9-w2xq-vcv6 | Direct dep |
| vite | 8.0.13 → 8.1.0 | High / Med | GHSA-fx2h-pf6j-xcff, GHSA-v6wh-96g9-6wx3 | Dev dep |
| undici | <7.28.0 → 7.28.0 | High / Med / Low ×5 | GHSA-hm92-r4w5-c3mj, GHSA-vmh5-mc38-953g, GHSA-p88m-4jfj-68fv, GHSA-pr7r-676h-xcf6, GHSA-35p6-xmwp-9g52, GHSA-g8m3-5g58-fq7m | Transitive (jsdom/vitest) |
| form-data | <4.0.6 → 4.0.6 | High | GHSA-hmw2-7cc7-3qxx | Transitive |
| esbuild | <0.28.1 → patched | Low | GHSA-g7r4-m6w7-qqqr | Transitive (vite) |
Result: pnpm audit → 0 vulnerabilities (was 22: 12 high / 4 moderate / 6 low).
2. Frontend non-security bumps (latest within-major)
All remaining deps were lifted to their newest compatible release so the manifest
floors match the regenerated lockfile. Notable: @tanstack/react-query 5.99.2→5.101.2,
react/react-dom 19.2.6→19.2.7, react-hook-form 7.73.1→7.80.0, zod 4.3.6→4.4.3,
@radix-ui/* (14 pkgs), eslint 10.4.0→10.6.0, @typescript-eslint/* 8.59.4→8.62.1,
vitest/@vitest/coverage-v8 4.1.6→4.1.9, msw 2.13.4→2.14.6, @playwright/test
1.59.1→1.61.1, prettier 3.8.3→3.9.3, tailwindcss 4.3.0→4.3.2. No major upgrades; no API breakage.
3. Root tooling (pnpm-only) — security + structural
| Change | Why |
|---|---|
| markdown-it → ^14.2.0 (via override) | GHSA-6v5v-wf23-fmfq (quadratic DoS). markdownlint-cli2@0.22.1 pins it to exactly 14.1.1, so an override is required to pull in the patched release. |
| js-yaml → ^4.3.0 (via override) | Quadratic-complexity DoS in merge-key handling (patched ≥4.1.2). |
| linkify-it → 5.0.1 | GHSA quadratic scan-loop (High); patched by lockfile regen. |
| Overrides moved to pnpm-workspace.yaml | pnpm 11 no longer reads pnpm.overrides from package.json. |
| Removed package-lock.json | Repo is pnpm-managed (packageManager: pnpm@11.1.3; every workflow uses pnpm). The legacy npm lockfile contradicted the toolchain, could not be cleanly maintained by npm (overrides silently ignored, divergent tree, introduced a new picomatch vuln), and Dependabot was raising alerts against it. pnpm-lock.yaml is now the single source of truth. (User-approved decision.) |
| prettier 3.8.3 → 3.9.1 floor | Latest compatible. |
Result: root pnpm audit → 0 vulnerabilities (was 3: 1 high / 2 moderate).
4. Backend (Cargo)
| Package | From → To | Why |
|---|---|---|
| openssl | 0.10.79 → 0.10.81 | RUSTSEC / GHSA-phqj-4mhp-q6mq (transitive). |
| openssl-sys | 0.9.115 → 0.9.117 | Follows openssl. |
| ~60 transitive crates | latest semver-compatible | cargo update (e.g. axum 0.8.8→0.8.9, hyper 1.8.1→1.10.1, chrono 0.4.43→0.4.45, clap 4.5.58→4.6.1, reqwest/h2/tokio stack). No manifest constraint changes needed. |
| utoipa-swagger-ui | +vendored feature | Pre-existing working-tree change incorporated; bundles Swagger UI assets for offline/air-gapped serving. |
Result: cargo audit → 0 vulnerabilities (exit 0). 4 informational unmaintained-crate warnings remain (including bincode, number_prefix, and proc-macro-error2); all are transitive, not CVEs, and non-blocking, and clearing them would require upstream major upgrades.
5. Verification (mirrors CI gates)
| Gate | Command | Result |
|---|---|---|
| Backend format | cargo fmt --all -- --check | ✅ |
| Backend lint | cargo clippy --all-targets --all-features -- -D warnings -D clippy::all | ✅ |
| Backend tests | cargo test --all-features (SQLite) | ✅ all green |
| Backend audit | cargo audit | ✅ exit 0 |
| Frontend install | pnpm install --frozen-lockfile | ✅ reproduces lockfile |
| Frontend lint | pnpm run lint | ✅ |
| Frontend types | pnpm run type-check | ✅ |
| Frontend tests | pnpm test -- --run | ✅ 897 passed, 6 skipped |
| Frontend build | pnpm run build | ✅ |
Note: Postgres integration tests and the release/Docker builds run in CI; the release build was not run locally (all changes are semver-compatible patch/minor bumps already compiled by the clippy `--all-targets` gate).
Move docs/planning/autonomous-remediation into docs/.archives/2026/06/remediation following archive conventions (UPPERCASE-WITH-DASHES filenames, topic-named category folder). Fix internal cross-links, update the domain-events.md reference to the new archived path, and link the new section from the archives README. Trim project CLAUDE.md from 1064 to 197 lines by removing Claude Flow V3, swarm-execution, and instruction-reminder content already covered by the global ~/.claude/CLAUDE.md; keep project-specific guidance and a slim AQE config block. Also remove the stale CLAUDE.md.pre-ruflo backup.
Autonomous Remediation: Integration PR (`develop` → `main`)

This PR integrates the complete autonomous PR remediation subsystem, built and gated phase by phase on `develop`. Every phase landed via its own PR with green remote CI as the merge gate (no `main` merges were performed autonomously; this PR is the single human decision point).

Phases (all gated, CI-green at merge)
- `RemediationCapable` supertrait + caps + provider impls + Mock · 0066105 · 277a785
- … `--no-ff`), TOCTOU verification, sandbox runner trait, Apalis sweep/run jobs · 1061529 · d2d370c · bb5be77
- … `learning_signal` + provider-order bias), feature-flagged vector reflexion memory, provider parity fallbacks, repo fingerprinting · 65fbd5e · 944e2fd

Quality signals
- `cargo fmt --check`, `clippy --all-targets --all-features -D warnings`, workspace build, backend tests (Postgres + SQLite), frontend build/lint, `cargo audit`, i18n type-parity (27 locales). 875+ backend tests green on Postgres.
- … `endpoint_url`. Air-gapped orgs block External egress. No force-push primitive. External content framed as DATA, never instructions.

ADRs realized
ADR-002 (RemediationCapable supertrait), 003 (Podman/worktree sandbox), 004 (state-machine persistence), 005 (octopus merge), 006 (playbook format), 007 (ModelProvider trait), 008 (model credentials), 009 (model v1 scope), 010 (CI TOCTOU guard), 011 (SSE live updates), 012 (failure classification), 013 (async-trait strategy), 014 (air-gapped governance).
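ADR-007 (ModelProvider trait) and ADR-014 (air-gapped governance) meet in provider selection: air-gapped orgs must not reach External providers, and a failing provider should fall through to the next one. A simplified, synchronous sketch (the real trait is async per ADR-013); `Locality`, the method names, `complete_with_fallback`, and `Fake` are hypothetical:

```rust
/// Hypothetical locality flag used for egress governance.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Locality {
    Local,    // e.g. a model served on the org's own infrastructure
    External, // e.g. a hosted third-party API
}

/// Hypothetical synchronous stand-in for the ModelProvider trait.
trait ModelProvider {
    fn name(&self) -> &str;
    fn locality(&self) -> Locality;
    fn complete(&self, prompt: &str) -> Result<String, String>;
}

/// Try providers in order. External providers are skipped for air-gapped
/// orgs; an error falls through to the next provider. Returns the winning
/// provider's name and output, or every reason nothing succeeded.
fn complete_with_fallback(
    providers: &[Box<dyn ModelProvider>],
    air_gapped: bool,
    prompt: &str,
) -> Result<(String, String), Vec<String>> {
    let mut errors = Vec::new();
    for p in providers {
        if air_gapped && p.locality() == Locality::External {
            errors.push(format!("{}: blocked (air-gapped)", p.name()));
            continue;
        }
        match p.complete(prompt) {
            Ok(text) => return Ok((p.name().to_string(), text)),
            Err(e) => errors.push(format!("{}: {e}", p.name())),
        }
    }
    Err(errors)
}

/// Test double with a fixed locality and outcome.
struct Fake {
    name: &'static str,
    locality: Locality,
    ok: bool,
}

impl ModelProvider for Fake {
    fn name(&self) -> &str {
        self.name
    }
    fn locality(&self) -> Locality {
        self.locality
    }
    fn complete(&self, prompt: &str) -> Result<String, String> {
        if self.ok {
            Ok(format!("{} says: {prompt}", self.name))
        } else {
            Err("unavailable".into())
        }
    }
}
```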
DEFERRED — must be wired before enabling the live agentic auto-merge path
These are intentionally stubbed/feature-gated for safety; the mechanical path is complete and tested, the live agentic path is not yet production-wired:
- `AgentWorktree`/`CiVerifier` real container invocation in `jobs/remediation_run.rs` (currently a Fake runner in tests; the Podman path is `#[allow(dead_code)]`).
- `select_account` tenancy: resolve run → repository owner/org before account selection (currently picks the first enabled account; guarded, not yet org-scoped).
- `tools.allowed` enforcement at edit-apply time (clamp computed; enforcement lands with sandbox wiring).
- `VectorReflexionMemory` production construction (behind the `reflexion` feature, OFF by default; only Noop/InMemory wired).
- `onnx` feature (OFF in CI; heuristic+Unknown fallback validated).

🤖 Generated with Claude Code