v0.4.0 — release hardening for wireframes + embedded agents#50

Merged: kurtstohrer merged 47 commits into main from feat/embedded-agents on Jun 22, 2026
@kurtstohrer

Cuts v0.4.0. Hardens the two flagship features for release, scopes the wireframe surface down to what's ready, sweeps the docs for accuracy, and clears the dependency advisories.

Merging triggers release.yml and publish.yml (tag + npm publish) and re-scans Dependabot on main (the dashboard's 42 alerts are from these same deps, fixed on this branch).

Fixes — apply/undo lifecycle + embedded agents

  • Lifecycle no longer wedges. A crash, abort, restart, orphan reconcile, needs_info, or denied used to strand the snapshot batch as running + entries as applying — disabling undo, discard, AND re-apply at once and locking the wireframe canvas at building forever. Every terminal transition (HTTP + server-side orphan/boot sweep) now routes through one shared closure that seals the batch, releases entries, and unlocks the canvas.
  • Byte-exact undo across the agent's whole footprint (git projects): git stash create baseline + git show auto-extend of every touched file; agent-deleted files are recreated. Non-git keeps anchor-only coverage.
  • Anchorless block no longer crashes the apply; retry re-verifies; needs_info answers reach the agent on resume; cross-tab 409 is a benign no-op; wireframe.json migration shim guards against silent wipes; dead BudgetCap removed; loud non-loopback startup warning.
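
A rough sketch of the single terminal closure described in the first bullet, for readers who want the shape of the fix. The type names, field names, and `sealTerminal` are hypothetical; only the status strings (`running`, `applying`, `building`) come from the description, and releasing entries back to `pending` is an assumption.

```typescript
// Hypothetical shapes; only the status strings mirror the PR description.
type EntryStatus = "pending" | "applying" | "written" | "failed";

interface Entry {
  id: string;
  status: EntryStatus;
}

interface ApplyState {
  batchStatus: "running" | "sealed";
  entries: Entry[];
  canvasStatus: "idle" | "building";
}

// One exit path for every terminal transition (crash, abort, restart,
// orphan reconcile, needs_info, denied). Doing all three steps in one
// place means no caller can skip one and leave the session wedged.
function sealTerminal(state: ApplyState): ApplyState {
  const entries: Entry[] = state.entries.map((e): Entry =>
    // Assumption: in-flight entries go back to "pending" so they can be re-applied.
    e.status === "applying" ? { id: e.id, status: "pending" } : e
  );
  return { batchStatus: "sealed", entries, canvasStatus: "idle" };
}
```

Because the function only depends on its input, calling it twice (for example once from the HTTP path and once from the boot sweep) yields the same sealed state.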

Wireframe scope-down

  • Data binding + explode gated off (src/shell/wireframeFeatures.ts) while the binding shape-confidence ladder is reworked; existing persisted bindings render read-only. Capture/sketch/directions + the implement loop are unchanged.

Docs + security + CI

  • SECURITY.md states the out-of-box posture (no permission ceiling by default); README gains a Security section.
  • Docs-accuracy sweep: removed dangling refs to deleted modules/routes, fixed counts (28 MCP tools, 62 CSS vars), re-synced AGENTS.md.
  • New secret-gated, schedule/dispatch-only live-cli CI job exercising the real apply→write→verify loop.

Dependencies

  • 47 → 1 advisory (1 critical + all 13 high + all 24 moderate cleared). Bumped the one shipped dep (ws); pnpm.overrides for transitive/dev/playground vulns (in-major caps); astro playground 5→6. The 1 remaining low is esbuild in build tooling only (not shipped; forcing the patch breaks the shell build).

Verification

Build ✓ · typecheck ✓ · 999 unit tests ✓ · CI skill-drift gate ✓. New tests cover the git auto-extend, deleted-file recreation, anchorless apply, cross-tab 409, and wireframe migration.

🤖 Generated with Claude Code

kurtstohrer and others added 30 commits May 12, 2026 17:00
Storage and guardrail layer for the embedded chat (ANN-8 / M4). Owner:
AICoder. Designed so the DesignEngineer settings-sheet pass can bind to
zod-validated shapes without churn.

New modules:
- src/embedded/provider-config.ts — zod schema and helpers for five
  providers (Anthropic, OpenAI/Codex, OpenAI-compatible, Copilot,
  Paperclip), per-conversation USD cap, redaction and event-log toggles,
  plus structured validation and safe-logging redactor.
- src/embedded/redaction.ts — secret-pattern scrubber for prompts and
  tool results. Catches provider keys, GitHub PATs, AWS credentials,
  JWTs, PEM blocks, and env-style assignments. Idempotent; returns match
  offsets for telemetry without leaking redacted bytes.
- src/embedded/budget-cap.ts — per-conversation BudgetCap with injectable
  pricer, sticky shouldStop() once breached, and BudgetCapExceeded
  exception path. pricerFromRateCard helper matches Anthropic's 10%
  cache-read / 125% cache-write convention.
- src/embedded/event-log.ts — local-only ring buffer for provider turn,
  tool_call, tool_result, and error events. Bounded capacity, optional
  persistence sink, subscribe() for UI live updates, totalsByConversation
  for the composer's token meter. No fetch surface.
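
To make the budget-cap arithmetic concrete, here is a minimal sketch of a rate-card pricer and a sticky cap. The names `BudgetCap`, `BudgetCapExceeded`, `pricerFromRateCard`, and `shouldStop()` come from the commit message; the field names, the per-million-token rate card, `record()`, `assertWithinCap()`, and the `>=` breach rule are assumptions made for illustration.

```typescript
interface Usage {
  inputTokens: number;
  outputTokens: number;
  cacheReadTokens?: number;
  cacheWriteTokens?: number;
}

// Assumed rate-card shape: USD per million tokens.
interface RateCard {
  inputPerMTok: number;
  outputPerMTok: number;
}

type Pricer = (usage: Usage) => number;

// Cache reads are billed at 10% of the input rate, cache writes at 125%.
function pricerFromRateCard(card: RateCard): Pricer {
  return (u) =>
    (u.inputTokens / 1_000_000) * card.inputPerMTok +
    ((u.cacheReadTokens ?? 0) / 1_000_000) * card.inputPerMTok * 0.1 +
    ((u.cacheWriteTokens ?? 0) / 1_000_000) * card.inputPerMTok * 1.25 +
    (u.outputTokens / 1_000_000) * card.outputPerMTok;
}

class BudgetCapExceeded extends Error {
  constructor(spentUsd: number, capUsd: number) {
    super(`budget cap exceeded: spent $${spentUsd.toFixed(4)} of $${capUsd.toFixed(4)}`);
    this.name = "BudgetCapExceeded";
  }
}

class BudgetCap {
  private spentUsd = 0;
  private breached = false; // latched: never reset once set
  private readonly capUsd: number;
  private readonly pricer: Pricer;

  constructor(capUsd: number, pricer: Pricer) {
    this.capUsd = capUsd;
    this.pricer = pricer;
  }

  record(usage: Usage): void {
    this.spentUsd += this.pricer(usage);
    // Assumption: reaching the cap exactly counts as a breach.
    if (this.spentUsd >= this.capUsd) this.breached = true;
  }

  shouldStop(): boolean {
    return this.breached;
  }

  assertWithinCap(): void {
    if (this.breached) throw new BudgetCapExceeded(this.spentUsd, this.capUsd);
  }
}
```

Injecting the pricer keeps the cap testable with a fake price function and lets each provider supply its own rates.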

Shell surface:
- src/shell/composables/useProviderSettings.ts — reactive singleton
  backed by localStorage, exposes makeConversationBudget() and
  usageForConversation() for the composer cap chip + cost meter.
- src/shell/components/ProviderSettingsPanel.vue — functional starter
  panel. DesignEngineer owns the visual polish; field names map 1:1 to
  the schema so renames here do not silently break storage.

Tests: 49 new tests across redaction, budget-cap, event-log,
provider-config, and useProviderSettings. All 116 vitest tests pass.

Co-Authored-By: Paperclip <noreply@paperclip.ing>
Mount AISettingsPanel and ProviderSettingsPanel inside a sidebar-tabbed
Settings overlay, with Providers placed between AI and Appearance per
the M4 sheet UX brief.

Visual polish on ProviderSettingsPanel:
- Flat horizontal provider pill strip, same affordance as the AI tab's
  model picker — single screen with branch selection, not a vertical list.
- Visual parity with AISettingsPanel via a shared _ai-settings.css module
  (`.ai-section`, `.ai-input-row`, `.ai-validation`, etc.). AISettingsPanel's
  scoped style block is removed; both panels now read the same tokens.
- Inline per-field validation: errors surface beneath the originating field
  with a danger-toned input border, replacing the bottom-of-panel list.
  Validation runs reactively on every edit and on provider switch.
- Per-provider empty states: Copilot "Sign in to GitHub Copilot" (disabled
  until ANN-16 ships device-code OAuth) and Paperclip "Connect Paperclip
  company" CTA.
- Readiness chip reflects `isActiveProviderReady` so users see at a glance
  whether the active branch can run.
- Reveal toggles, focus-visible outlines, and `aria-selected` on pills.

Layout:
- Sidebar tabs on desktop (≥640px); the strip collapses to a horizontal
  pill row on mobile.
- Lead copy on AI tab points users to Providers for non-Anthropic keys so
  the relationship between the two panels is discoverable.

Carries forward AISettingsPanel.vue + useAIConfig.ts + pricing.ts that
ANN-4 left untracked in the workspace; useProviderSettings already imports
from pricing.ts, so without these the prior M4 commit didn't build.

Verified visually with Playwright at 1440x900 and 390x844:
- Each provider tab renders the right branch with inline error states.
- Settings persist across reload (`localStorage` key
  `annotask:ai:providerSettings`).
- `pnpm exec vitest run` — 367 pass / 2 skipped.
- `vue-tsc --noEmit -p src/shell/tsconfig.json` clean.

Co-Authored-By: Paperclip <noreply@paperclip.ing>
…wizard

Bring the embedded agent surface online end-to-end:

Providers
- Add CLI-local provider family (claude-local, codex-local, opencode-local,
  copilot-local) that reuses the user's existing CLI logins via a spawn
  endpoint — no API keys stored in the browser.
- Add OpenRouter provider and a generic OpenAI-compatible transport.
- Remove the GitHub Copilot BYOK/HTTP provider; copilot-local (CLI
  subprocess) is now the only Copilot path. parseProviderSettings migrates
  any persisted activeProvider='copilot' to 'copilot-local'.
- Provider factory dispatches to the active provider with credentials from
  the persisted settings blob.
- Live model-catalog discovery per provider with TTL cache, dedup, and an
  explicit "Cannot fetch models" blocked-state for providers with no
  headless model list.
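
The `copilot` to `copilot-local` migration mentioned above can be pictured as a small pure step run before validation. This is a sketch only: `migrateLegacyCopilot` and the untyped blob are hypothetical, and the real `parseProviderSettings` presumably does this inside its schema parsing.

```typescript
// Hypothetical: the persisted settings blob before schema validation.
type SettingsBlob = Record<string, unknown>;

// Rewrites the removed BYOK/HTTP provider id to the CLI-subprocess one.
// Returns the same object when nothing needs to change.
function migrateLegacyCopilot(blob: SettingsBlob): SettingsBlob {
  if (blob["activeProvider"] === "copilot") {
    return { ...blob, activeProvider: "copilot-local" };
  }
  return blob;
}
```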

Personas + per-agent overrides
- Persona resolver maps task types to personas; overrides live in
  .annotask/agents.json (provider, model, effort, projectDirections).
- Settings → Agents shows AgentDirectionsPanel with per-persona runtime
  config; switching providers auto-clears stale model ids so the picker
  always shows a valid catalog or defaults to Auto.

Init wizard + project state
- Multi-step InitWizard (agent/scan/review) drives first-run setup and
  re-init. Server-side init pipeline writes .annotask/design-spec.json,
  components, agent directions, and per-persona templates.
- WebSocket init:progress events keep the UI in sync with the running
  pipeline.

In-shell chat
- Conversation tab + composer with auto-run, working indicator, and a
  message/stream tokenizer.
- useTaskThread / useEmbeddedAgent composables wire the provider stream
  into the task lifecycle (transitions, agent_feedback, usage ledger).

Token usage + guardrails
- .annotask/usage.jsonl ledger aggregated per scope (task/init/apply/chat)
  and per provider, surfaced in Settings → Agents.
- Redaction toggle + local-only event log; budget cap enforced per turn.

UI cleanup
- Tabbed SettingsOverlay (Agents / Project / Appearance).
- ProviderSettingsPanel: flat pill switcher, live model picker with
  blocked-state warning, local-CLI detect banner, billing note for
  claude-local on Pro/Max plans.
- Drop AISettingsPanel and useAIConfig (superseded by ProviderSettingsPanel
  + useProviderSettings).
Marketing landing page as an AI would first-generate it — correct structure
and real content, but with slightly off copy, unnecessary sections (pricing,
trusted-by, newsletter), weak styling (purple accent, 4px border-radius,
no light mode, no responsive breakpoints), a11y gaps (low contrast, empty
alt, missing main landmark, heading skip), and no syntax highlighting in
code blocks. This is the "before" snapshot for the demo video.
Fixes tagline ("applies the change"), CTA ("View on GitHub"), adds <main>
landmark, restores light mode, responsive breakpoints, 12px card radius,
blue accent (#3b82f6), syntax highlighting, styled lifecycle badges,
dogfood banner. Removes pricing/trusted-by/newsletter sections. This is
the "after" snapshot for the demo video.
- scripts/demo-reset.sh: restores marketing page to AI first-draft state
  from the demo/marketing-before tag, clears tasks and sidecars
- scripts/demo-restore.sh: restores polished state from HEAD
- demo/transcript.md: 12-segment voiceover + action cues for ~4:25 hero video
- package.json: adds demo:reset and demo:restore npm scripts
- .gitignore: excludes demo recording artifacts (segments, voiceover, final)
- e2e/screenshots.test.ts: 12 Playwright tests capturing annotask shell
  features (pins, arrows, sections, highlights, tokens, inspector, etc.)
- e2e/demo-record.test.ts + config: automated video recording of demo
  segments (before scroll, setup, style editor, a11y scan, tokens, result)
- docs/media/screenshots/: 11 generated feature screenshots
- playgrounds/simple/marketing/public/screenshots/: copies for feature cards
- demo/segments/: recorded webm+mp4 clips (gitignored)
- demo/final/annotask-demo.mp4: assembled 74s demo video (gitignored)
…rd recording

- Screenshots now captured against marketing playground (port 5181) instead
  of vue-vite playground — feature cards show marketing page content in shell
- Hero screenshot (shell-overview.png) shows marketing page in dark mode with
  correct "Agent applies the change" tagline
- Init wizard recording now properly clicks through agent selection → scan,
  captures framework detection checkmarks and agent output
- playwright.config.ts: screenshots project points to port 5181
- e2e/screenshots.test.ts: saves to both docs/media and marketing public dirs
Screenshots now show the annotask shell over the Solar System Explorer
(vue-vite playground) instead of the marketing page. The planet orbital
canvas and card grid are more compelling showcase content. Shell overview
shows the /planets route with the 8-planet card grid.
Prompt file for a new session to record the init wizard deep-dive using
testreel (programmatic recordPage API for iframe support) and edge-tts
(AndrewNeural voice). Includes 10-segment outline covering all wizard
steps: agent selection, scan progress, and 6 review sub-steps (framework,
tokens, components, APIs, style guide, agent directions).
- Agent selection: cycle through Codex → OpenCode → Claude to show options
- Scan: record video first, write voiceover after to match what's on screen
- Style guide: scroll through content, show edit mode, mention loading
  existing docs from repo
- Agent directions: go deeper — show per-persona cards, change one agent's
  provider to a different LLM, explain task-type routing
- Save: don't say "shell", say "your project is ready for Annotask"
- All voiceover drafts marked as starting points to rewrite after recording
- Workflow section: video first → watch → rewrite scripts → generate audio
…ness, and live test coverage

Hardening + verification on top of the embedded-agents feature (run local AI CLIs
— claude, codex, opencode, copilot — inside Annotask to apply design tasks).

Security defaults & enforcement:
- Default permission mode is now Auto (per-CLI least-permissive headless mode via
  normalizeHeadlessMode) instead of blanket bypass: codex stays --full-auto
  sandboxed, copilot minimal, claude/opencode escalate to bypass. bypass is now an
  explicit opt-in; removed the default->bypass migration.
- Server-side ANNOTASK_MAX_PERMISSION clamp re-derives the level from argv and
  refuses over-cap spawns (per-task and init paths).
- Wire redactValue into tool_use inputs; add Google/Stripe/connection-string
  redaction patterns. Honest per-CLI plan-mode labels.

Robustness:
- agent-spawn: stdin EPIPE crash guard, concurrency cap, and PWD=cwd so CLIs that
  read $PWD (opencode) resolve the spawn workspace, not the launch dir.
- init: codex --skip-git-repo-check so it runs in non-git projects; extracted
  buildInitCliInvocation for testability. Surface usage-ledger write failures.
  SSE reconnect give-up + retry. Quote-aware extra-args tokenizer. Copilot
  input-token forward-compat. De-dup NATIVE_PLAN_PROVIDERS.
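
For illustration, a quote-aware extra-args tokenizer could look like the sketch below. `tokenizeArgs` is a hypothetical name, and this version handles only single and double quotes (no backslash escapes), which may be narrower than what the commit ships.

```typescript
// Splits a user-supplied extra-args string into argv tokens.
// Whitespace separates tokens unless it sits inside '...' or "...".
function tokenizeArgs(input: string): string[] {
  const tokens: string[] = [];
  let current = "";
  let quote: string | null = null; // the quote char we are inside, if any
  let inToken = false; // distinguishes an empty quoted token from no token

  for (let i = 0; i < input.length; i++) {
    const ch = input.charAt(i);
    if (quote !== null) {
      if (ch === quote) quote = null; // closing quote: stay in the same token
      else current += ch;
    } else if (ch === '"' || ch === "'") {
      quote = ch;
      inToken = true;
    } else if (/\s/.test(ch)) {
      if (inToken) {
        tokens.push(current);
        current = "";
        inToken = false;
      }
    } else {
      current += ch;
      inToken = true;
    }
  }

  if (quote !== null) throw new Error("unterminated quote in extra args");
  if (inToken) tokens.push(current);
  return tokens;
}
```

For example, `tokenizeArgs("--model \"gpt 5\" --flag='a b' plain")` yields four tokens: `--model`, `gpt 5`, `--flag=a b`, and `plain`.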

Tests:
- Hermetic: permission-flag matrix, provider apply-arg + clamp matrix, redaction
  wiring, abort propagation, SSE reconnect, init arg builder.
- Live (ANNOTASK_LIVE_CLI=1): new apply-cli-matrix e2e — all four CLIs apply a
  real style_update to the marketing playground and advance the task to review.
Phase 3 architecture work: move run coordination + task finalization to the
server, and fix the conversation-store flush cost.

task-thread O(n^2) -> O(1) flush:
- The store now mirrors each task's thread in memory (it is the sole writer;
  external agents only read/tail). Streaming a partial turn no longer re-reads,
  re-parses and rewrites the whole JSONL on every throttled flush — the common
  'update the last (streaming) line' case does a tail-rewrite; earlier-message
  edits fall back to an atomic full rewrite.

Server-side run registry keyed by task id (cross-tab dedup):
- Thread taskId from useEmbeddedAgent -> StreamOptions -> the spawn POST body.
- agent-spawn keeps a byTask map and refuses a second spawn for a task already
  running (409), closing the cross-tab double-spawn hole the per-tab client
  guard can't (two tabs auto-running one task would fork two CLIs on the same
  files). Added a spawnImpl test seam.
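
The cross-tab dedup above boils down to a map keyed by task id that refuses a second start. A minimal in-memory sketch follows; `RunRegistry`, `tryStart`, and `end` are hypothetical names, and the real registry also tracks the child process and reports run ends via `onRunEnd`.

```typescript
type StartResult =
  | { ok: true; runId: number }
  | { ok: false; status: number };

class RunRegistry {
  // task id -> run id of the spawn currently working on that task
  private readonly byTask = new Map<string, number>();
  private nextRunId = 1;

  // Refuses a second spawn for a task that is already running, so two
  // tabs auto-running one task cannot fork two CLIs on the same files.
  tryStart(taskId: string): StartResult {
    if (this.byTask.has(taskId)) return { ok: false, status: 409 };
    const runId = this.nextRunId++;
    this.byTask.set(taskId, runId);
    return { ok: true, runId };
  }

  // Called when the run ends for any reason, freeing the task id.
  end(taskId: string): void {
    this.byTask.delete(taskId);
  }
}
```

The per-tab client guard still avoids most duplicate requests; the server-side map is what makes the check authoritative.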

Orphaned-task finalization:
- The registry reports every run end via onRunEnd; index.ts grace-checks
  (~12s) and, if the task is still in_progress (client never transitioned it,
  e.g. the tab closed mid-run), marks it blocked instead of leaving it stuck.
  Race-safe: a normal completion's client review PATCH lands within the grace.

Tests: task-thread streaming durability + non-last fallback; agent-spawn
dedup/orphan-hook/taskId validation via a fake child; live apply-matrix now
threads taskId so the registry path is exercised end to end.
…rounding, wireframe lifecycle, docs sweep

Acts on a full architecture/product/security/agent review. Highlights:

Reproduced breakers
- transform: Svelte attribute-expression corruption (brace tracking) and
  JSX/TSX statement-scope string/comment/regex corruption (scope stack).
- webpack: devServer proxy was dead under webpack-dev-server v5; inject
  synchronously in apply() with v4/v5 forms + loud manual-config fallback.

Embedded agents — all four CLIs first-class
- codex/opencode/copilot now carry multi-turn history (rollupHistoryAsPrompt,
  oldest-first truncation under argv budget); claude keeps its stdin path.
- copilot included in every seed-gate list/message.
- seed-run error exits revert the task to pending + clear the run indicator
  (no more stuck in_progress); process-group kill so grandchildren can't orphan.
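
Oldest-first truncation under a size budget, as described for `rollupHistoryAsPrompt`, can be sketched as follows. `rollupHistory` and the `Turn` shape are hypothetical, and the budget is counted in characters here as a simplification of the real argv limit.

```typescript
interface Turn {
  role: "user" | "assistant";
  text: string;
}

// Flattens multi-turn history into one prompt string, dropping the
// oldest turns first until the result fits the budget. The newest turn
// is always kept, even if it alone exceeds the budget.
function rollupHistory(turns: Turn[], budgetChars: number): string {
  const lines = turns.map((t) => `${t.role}: ${t.text}`);
  let start = 0;
  while (
    start < lines.length - 1 &&
    lines.slice(start).join("\n").length > budgetChars
  ) {
    start++;
  }
  return lines.slice(start).join("\n");
}
```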

Security
- Host gating on all /__annotask routes (DNS-rebinding fix) with allowedHosts
  / bind-host / ANNOTASK_ALLOWED_HOSTS escape hatches.
- permission-cap synonym holes closed (bypassPermissions / danger-full-access /
  -a never); bridge postMessage origin validation (no app data posts to '*');
  draft-edits path containment; .annotask/ auto-gitignored; tasks.json quarantine
  on corrupt boot; transcript cleanup on accept/delete; transition TOCTOU fix.
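
As an illustration of the host-gating idea (rejecting requests whose `Host` header is not loopback or explicitly allowed, which is what defeats DNS rebinding), here is a minimal sketch. `isAllowedHost` and the `extraAllowed` parameter are hypothetical stand-ins for however `allowedHosts` / `ANNOTASK_ALLOWED_HOSTS` are actually wired.

```typescript
const LOOPBACK_HOSTS = new Set(["localhost", "127.0.0.1", "::1"]);

// Returns true when the Host header names loopback or an allowlisted host.
function isAllowedHost(
  hostHeader: string | undefined,
  extraAllowed: readonly string[] = []
): boolean {
  if (!hostHeader) return false;
  let host = hostHeader.trim().toLowerCase();

  if (host.startsWith("[")) {
    // Bracketed IPv6 literal, e.g. "[::1]:5173"
    const end = host.indexOf("]");
    if (end === -1) return false;
    host = host.slice(1, end);
  } else {
    // Strip an optional ":port" suffix
    const colon = host.lastIndexOf(":");
    if (colon !== -1) host = host.slice(0, colon);
  }

  if (LOOPBACK_HOSTS.has(host)) return true;
  return extraAllowed.some((allowed) => allowed.toLowerCase() === host);
}
```

A rebinding attack reaches the server over loopback but carries the attacker's hostname in `Host`, so a check like this rejects it before any route handler runs.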

Shell silent-failure pattern
- auto-run busy-loop hang fixed; res.ok checks + lastError; error-stack URL→repo
  path normalization (error_fix creation works again); pending draft survives
  failures; WS reconnect resyncs tasks; tab-order overlay rate-capped.

Agent grounding
- new MCP tools: get_source_excerpt, get_playbook, get_agent_directions (+CLI);
  get_tasks detail strips 200KB sidecars + paginates; structured --mcp errors;
  CLI runCli() test seam.

Wireframe Milestone 1 — the loop closes
- instance status/taskId/previewProps; idempotent Build; accepting a
  wireframe_apply task removes its placements server-side (delete reverts) —
  no duplicate renders during/after review; reapply mounts previewProps,
  isolates per-instance failures, surfaces stale anchors; placements panel with
  delete; rev-based 409 conflict on PUT; drop pipeline extracted to
  useWireframeCanvas.ts.

Docs/contract
- api_update purged (13 locations); wireframe_apply documented everywhere;
  missing routes/WS events/MCP tools added; PERF_FIX enums corrected; embedded
  layer added to architecture; SECURITY threat model; TODO.md removed (done);
  scripts/sync-skills.mjs + CI drift check keep skill mirrors in sync.

Verification: typecheck clean; 775 unit tests pass (+184); build clean; live
streaming + apply matrix pass 4/4 (claude, codex, opencode, copilot).
…n tool

Self-contained kickoff for the next phase: end-user live edits (props/text/
styles/move/resize/structure) + instant faithful previews that persist to real
source, with undo/discard. Captures M1-done state, the live-commit vs
preview-only architecture decision (recommends live-commit via the draft
engine), and milestones M2-M7 with acceptance criteria.
…-apply loop

The server half of the snapshot-wireframe apply loop. A design session is an
ordered journal of user edits (.annotask/design-session.json, CAS on rev);
'Apply now' snapshots every touched file (byte-exact undo/discard net), mints
ONE wireframe_apply task carrying context.wireframe + context.session, and
stamps statuses. When the agent flips the task to review, a verification pass
re-reads source per entry and marks each written|failed. Accept commits the
snapshots; delete releases entries back to pending. The file-snapshot engine
replaces draft-edits (one-time journal migration included), is hash-guarded
against user edits, and rehydrates across restarts — files revert only via
explicit undo/discard. binding-classify stays as agent tooling (MCP + CLI):
round-trip honesty classification of props/text before an agent rewrites them.
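
The "CAS on rev" rule for the design-session journal can be pictured as a compare-and-swap on a revision counter. The sketch below is an in-memory illustration with hypothetical names (`RevDoc`, `putWithRev`); the real store persists to `.annotask/design-session.json`.

```typescript
interface RevDoc<T> {
  rev: number;
  data: T;
}

type PutResult<T> =
  | { status: 200; doc: RevDoc<T> }
  | { status: 409; currentRev: number };

// Accepts the write only if the caller saw the latest revision;
// otherwise reports a conflict so the caller can refetch and retry.
function putWithRev<T>(current: RevDoc<T>, expectedRev: number, next: T): PutResult<T> {
  if (expectedRev !== current.rev) {
    return { status: 409, currentRev: current.rev };
  }
  return { status: 200, doc: { rev: current.rev + 1, data: next } };
}
```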
…placement identity

The shell half of the apply loop. useStyleEditor becomes a facade over the
useDesignSession journal (changes is a collapsed projection; the frozen
report/watch contract is unchanged); the DesignSessionPanel drives Apply now /
Undo last apply / Discard with per-entry status chips. Placements gain
durable identity end-to-end: clicking a mounted placement selects the
placement (instance_id), not its mounted internals, and the inspector
declines style edits on placements (no source anchor to snapshot or verify).
The project palette pins the Project library first, with sample-prop
preview widgets extracted to propWidgets. The design-apply e2e seeds the
journal via the API and proves the panel -> task -> snapshot -> release loop;
the react-vite /planets playground page backs capture/canvas e2e to come.
CLAUDE.md and the apply skill describe the journal-backed apply loop;
component_prop_update/text_update become documented legacy types (agents
still apply them from older tasks, the UI no longer emits them). The todo
handoff records the pivot: strip the live-edit surface, build snapshot
wireframing on the kept apply loop. Skill mirrors synced.
Wireframe mode freezes the current route into a manipulable image canvas.
A new wireframe:capture bridge message walks the rendered DOM (semantic
main + single-child unwrap; header/nav/footer/aside chrome pass; 24-block
cap), rasterizes each block sequentially with html2canvas (progress pushes,
per-block failure stays an honest hatched box), and ends with a scale-1
full-document pass — the 'before' truth for apply-time composites. Block
PNGs land in .annotask/wireframe-snapshots/ (id-addressed upload, contained
serve/delete); wireframe.json gains a per-route canvas node (blocks carry
source anchors + original rects; validated to block depth at the PUT
boundary). The canvas renders as an opaque overlay INSIDE #canvas-area —
the live iframe stays mounted underneath, so exit is lossless by
construction (e2e proves zero reloads and a PlanetsPage.vue:line anchor
chip on /planets).
W2 — the sketch becomes freely manipulable: drag (3px threshold, z-bump
on grab), 8-handle resize, soft-delete for captured blocks (the apply
diff needs deletion as a fact; an undelete popover restores them), hard
delete for sketch material with refcount-guarded snapshot cleanup,
Ctrl+D duplicate (shares the image file; duplicateOf roots at the
original), per-block notes (the user-said channel), palette drops as
honest preview:component snapshots (canvas-only — no live mount; a
data-bound component degrades visibly to a placeholder render), and a
drawn labeled placeholder tool. Mutations persist with a trailing
debounce; F5 restores the canvas as left. Two real bugs fixed en route:
palette-drop props were Vue reactive proxies postMessage refused
(DataCloneError — every drop silently degraded to placeholder), and
block discovery unwraps single-child wrappers by HEIGHT, not area, so a
centered max-width page column still yields per-section blocks at any
viewport. Captured blocks carry their first class name — the only
human-distinct label when every block shares one owning component.

W3 — 'Implement this wireframe' closes the loop: a pure diff engine
turns the sketch-vs-capture delta into ONE anchored direction per
changed block (move/resize/delete/add/note) where relational facts
computed from box geometry ('now above the filters (was below)') are
the contract and pixels are explicitly hints; tool-measured geometry
and the user's verbatim notes ride separate schema channels. Adds
anchor at the nearest surviving captured neighbor. A labeled
before/after composite (numbered badges matching the direction list)
rides the existing screenshot pipeline. Directions are journal entries,
so the apply loop is unchanged: route-scoped collection, file snapshots
before the agent runs, ONE wireframe_apply task, byte-exact undo. The
verifier trusts directions with no source read — spatial outcomes
verify visually, so resolutions must be specific (playbook rewritten).
The canvas locks to the task while building; delete unlocks the sketch
for tweak-and-re-implement; accept removes it and shows the real
implementation.
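
To show what a relational fact computed from box geometry might look like, here is a small vertical-only sketch. The `Box` shape and both function names are hypothetical; the actual diff engine covers more relations and picks neighbors itself.

```typescript
interface Box {
  y: number; // top edge
  h: number; // height
}

type Relation = "above" | "below" | "overlapping";

// Vertical relation of box a relative to box b.
function relation(a: Box, b: Box): Relation {
  if (a.y + a.h <= b.y) return "above";
  if (b.y + b.h <= a.y) return "below";
  return "overlapping";
}

// Emits a fact only when the relation actually changed between the
// captured layout and the sketch; pixel positions never enter the text.
function relationalFact(
  beforeBlock: Box,
  afterBlock: Box,
  beforeNeighbor: Box,
  afterNeighbor: Box,
  neighborLabel: string
): string | null {
  const was = relation(beforeBlock, beforeNeighbor);
  const now = relation(afterBlock, afterNeighbor);
  return was === now ? null : `now ${now} ${neighborLabel} (was ${was})`;
}
```

Stating the change as a relation rather than coordinates gives the agent something it can verify in source order or layout structure, which is why the commit treats pixels only as hints.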

Proven live: the apply-session matrix seeds the /planets rearrangement
(grid above filters + pagination placeholder + note) and a real claude
CLI implemented it in source, trust-verify flipped all directions
written, and revertApplyBatch restored the bytes exactly.
…created-file netting

Canvas: double-click (or the header button) EXPLODES a captured block one
level deeper — the durable anchor re-resolves the live element, a rootEid
capture rasterizes its children (no redundant full-page pass), and the
children land translated by the block's canvas delta with their own
file:line anchors and live-position diff baselines. Dragging snaps
edges/centers to neighbors with visible alignment guides (Alt disables;
pure computeSnap helper). Marquee + shift-click multi-select with group
drag, group delete/duplicate, and arrow-key nudging (Shift = 10px). The
chrome bar shows the capture viewport — pick a device preset before
capturing to wireframe mobile. Captured-block labels stay distinct via
cssClass, and the two vue wireframe e2e suites now live in ONE file:
Playwright runs separate spec files in parallel workers, and they share
one dev server's wireframe document.
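
A pure snap helper in the spirit of `computeSnap` might look like the sketch below, reduced to the horizontal axis. The signature, the 6px threshold, and the `disabled` flag (standing in for the Alt key) are assumptions, not the shipped API.

```typescript
interface Rect {
  x: number;
  y: number;
  w: number;
  h: number;
}

interface SnapResult {
  dx: number; // correction to add to the dragged block's x
  guide: number | null; // x coordinate of the alignment guide, if snapped
}

// Compares the dragged block's left edge, center, and right edge against
// the same three lines of every neighbor and snaps to the closest match.
function computeSnapX(
  moving: Rect,
  neighbors: readonly Rect[],
  threshold = 6,
  disabled = false
): SnapResult {
  let best: SnapResult = { dx: 0, guide: null };
  if (disabled) return best;

  const own = [moving.x, moving.x + moving.w / 2, moving.x + moving.w];
  let bestDistance = Number.POSITIVE_INFINITY;

  for (const n of neighbors) {
    const targets = [n.x, n.x + n.w / 2, n.x + n.w];
    for (const target of targets) {
      for (const line of own) {
        const distance = Math.abs(target - line);
        if (distance <= threshold && distance < bestDistance) {
          bestDistance = distance;
          best = { dx: target - line, guide: target };
        }
      }
    }
  }
  return best;
}
```

Keeping the helper free of DOM and pointer state is what makes it unit-testable without a browser.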

Server: boot-time GC sweeps .annotask/wireframe-snapshots/ for PNGs no
doc references (1h mtime guard; a malformed wireframe.json aborts the
sweep — an unreadable doc must never read as 'no references'). In git
projects, apply batches record the untracked set at snapshot time and
the seal nets agent-CREATED files into the batch (sha256 per file) —
undo/discard delete pristine copies and keep+report user-edited ones;
non-git projects degrade silently.
…drop

Double-clicking a block visually 'removed the styling': explode deleted
the parent block, and the children only carry their own pixels — the
container's background, padding, and the surface BETWEEN children (all
painted by the parent element) vanished, leaving naked fields floating
on the canvas grid.

The parent now survives as a SHELL backdrop: the rootEid capture takes
one extra pass of the root's rect with the captured child blocks
visibility-hidden in the html2canvas clone — the container's own pixels
with no ghost children, so dragging a child out reveals clean surface,
never a burned-in copy of itself. The shell uploads under a new file id
(duplicates keep referencing the original pixels), is marked shell:true
(not explodable again; no explode affordance), and stacks below the
children. When no shell can be captured the parent is removed as before
— a stale full image would ghost. Playbook notes that container vs
child directions mean whole-container moves vs restructuring inside it.
Wireframe mode is the only move-things surface now. Strips the tool across
shell (useRepositionMode deleted; useWireframeCanvas pointer handlers, App.vue
shield, toolbar buttons, 'reposition' interaction mode + 'r' key), bridge
(resolve:move-source + move:element handlers, ResolveMoveSourceResult/
MoveElementPayload types), journal (recordMove/MoveChangeRecord; the
component_move report arm), and theme (--mode-reposition: 64 -> 63 vars,
making CLAUDE.md's documented 63-var contract true; saved custom themes
tolerate the stale key by construction).

Kept shared dependencies the tool only rode on: dropPositionFor + the drop
indicator (live palette drop), data-annotask-instance stamping (selection,
drop-target refusal, capture exclusion, reapply idempotency), and the
containerEids map — hoisted into useWireframeCanvas because deletePlacement
unmounts live nodes through it and reapply repopulates it after F5/route
change. ComponentMoveChange/component_move stays in the schema + SKILL.md as
a documented legacy type (now with its previously-missing apply rule), no
longer emitted.
The offscreen preview container and the html2canvas backgroundColor were
hardcoded to a white card (#ffffff/#111111), which stuck out badly on dark
apps. resolveAppSurface() reads the computed background the app actually
paints (body first — content sits on it — then html; the opposite order of
detectColorScheme's viewport probe, which is unchanged) and derives a
contrast-safe text color from its BT.709 luminance, falling back to a
scheme-derived pair when both are transparent. readBg() is hoisted out of
detectColorScheme so both share it.
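
The surface resolution above can be sketched as: take the first non-transparent computed background, compute its BT.709 luminance, and pick a light or dark text color. Everything here is illustrative: `pickSurface`, the 0.5 threshold, and the specific text colors are assumptions, and the luminance is computed on gamma-encoded channel values for simplicity.

```typescript
interface Surface {
  background: string;
  text: string;
}

// Parses "rgb(r, g, b)" / "rgba(r, g, b, a)" as returned by computed styles.
function parseRgb(css: string): [number, number, number, number] | null {
  const m = /^rgba?\(\s*(\d+)\s*,\s*(\d+)\s*,\s*(\d+)\s*(?:,\s*([\d.]+)\s*)?\)$/.exec(css.trim());
  if (!m) return null;
  const alpha = m[4] === undefined ? 1 : Number(m[4]);
  return [Number(m[1]), Number(m[2]), Number(m[3]), alpha];
}

// BT.709 luma coefficients, normalized to 0..1.
function luminance709(r: number, g: number, b: number): number {
  return (0.2126 * r + 0.7152 * g + 0.0722 * b) / 255;
}

// candidates: computed backgrounds in probe order (body first, then html).
function pickSurface(candidates: readonly string[], fallback: Surface): Surface {
  for (const css of candidates) {
    const rgba = parseRgb(css);
    if (!rgba || rgba[3] === 0) continue; // unparseable or fully transparent
    const lum = luminance709(rgba[0], rgba[1], rgba[2]);
    return { background: css, text: lum < 0.5 ? "#ffffff" : "#111111" };
  }
  return fallback; // every candidate was transparent
}
```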

Fixes every existing consumer for free: wireframe palette drops and the
Components page thumbnails. Verified on the dark vue-vite playground — a
dropped Header snapshot now reads rgb(18,18,18) at every padding corner
(the app's #121212), confirmed by eye on the PNG.
…icker

WireframeDataBinding (shared/wireframe-types.ts): REAL scanner-catalog
identity (kind/name/module/endpoint) + user drill-down (path like
"planets[]", fields) + the shape_source honesty tag ('api-schema' |
'source-details' | 'none'). WireframeBlock gains an optional kind-agnostic
data field; isWireframeBlock rejects malformed bindings at the PUT boundary.

GET /__annotask/api/data-source-shape?name&kind&file resolves a catalog
entry down the honesty ladder (data-source-shape.ts): (1) the entry's
endpoint matched against discovered API schemas via resolveEndpoint — the
already-deref'd response_schema walks into a DataShapeNode tree
(schemaToShape: cycle refs and GraphQL $type degrade to named ref leaves,
allOf merges, oneOf/anyOf first object-ish variant, depth-capped);
(2) regex-inferred return-type hints verbatim with their confidence — never
expanded into a tree; (3) honestly nothing. Runtime-promoted entries are
excluded (no code identity, no shapes this round). Shell-only — agents
re-ground via the existing data-source-details / api-operation tools.

DataBindingPicker.vue (ships unmounted until D4/D5): catalog search →
source select → expandable shape tree with path pick + field checkboxes
under 'api-schema'; verbatim hints + free-text under 'source-details';
visibly-blind free-text under 'none'. Pure logic in utils/bindingShape.ts
(flattenShape/fieldCandidates/buildBinding, validated before emit).

Tests: shared validator depth (wireframe-types.test.ts), walker + ladder
against an OpenAPI fixture incl. cycle refs and ambiguity (offline via
explicit apiSchemaFiles), flatten/candidates/build units.
…tte drop

Dropping or clicking a component in wireframe mode now opens a right-docked
Generate panel instead of instantly minting a block: settings (real props vs
preview samples via the propWidgets widget table), an optional data binding
through the D3 picker (offered automatically when the component looks
data-driven, never required), generate (honest preview:component snapshot on
the app-true surface, fidelity pill, one-click regenerate), then place — at
the remembered drop point (the drag fast path) or by ghost-placing on the
canvas (image rides the cursor; Escape backs out; clicks over existing
blocks place instead of dragging). A gear button on placed palette blocks
reopens the panel seeded from the block for in-place reconfiguration
(props/binding/snapshot update under the same block id); bound blocks show
a data chip.

useWireframeMode grows addPaletteBlock/updatePaletteBlock (the old drop path
now routes through them, behavior-identical); useComponentGenerator owns the
session state machine with JSON-round-trip clone discipline at every
postMessage/persist boundary. html/layout-preset drops keep the instant
placeholder path; the live-iframe drop path is untouched.
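
The clone discipline mentioned above exists because proxy-wrapped objects (such as Vue reactive state) cannot pass through structured cloning, which is what `postMessage` uses. A minimal sketch of a JSON round-trip helper follows; `toPlain` is a hypothetical name, and a plain `Proxy` stands in for a reactive object.

```typescript
// Returns a plain, proxy-free deep copy that is safe to postMessage or
// persist. Limitation: anything JSON cannot represent (functions,
// undefined, Date instances, Map/Set) is dropped or converted.
function toPlain<T>(value: T): T {
  return JSON.parse(JSON.stringify(value)) as T;
}

// Stand-in for a reactive object: reads are forwarded to the target.
const reactiveProps = new Proxy({ name: "PlanetCard", props: { count: 8 } }, {});
const plainProps = toPlain(reactiveProps);
```

Cloning at every boundary, rather than once at the source, means a later code path that re-wraps the value cannot silently reintroduce a proxy.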

e2e: W2's drop section drives the panel flow; a new spec binds usePlanets
through the real api-schema shape tree (planets[] -> name, type), survives
F5, and regenerates in place. Unit: generator state machine incl.
clone-at-boundary and edit-mode routing.
…tate draw tool removed

The wireframe placeholder grows into the section concept (one concept, not
two): a drawn box keeps its cheap label-only flow, and gains an on-demand
markdown spec (Write/Preview popover via safeMd; Ctrl+Enter or Save commits,
plain Enter stays a newline) and a data binding through the D3 picker.
"Section" is a DERIVED affordance — a placeholder with md or data renders
the section tag, the clipped md hint, and the binding chip; the validator
gains the additive md check. setBlockMd/setBlockData persist per-route in
wireframe.json, so sections survive F5 (proven: 12/12 browser smoke incl.
the api-schema drill to planets[] name+type).

The Annotate tab's drawn-section tool is stripped per the live-edit/D1
pattern: DrawnSectionOverlay is deleted, and with it go DrawnSection state,
the draw-rect plumbing, onSectionSubmit (the ONLY section_request emitter),
the restore/eid-resolution/rect-tracking section branches, the 'draw'
interaction mode + 'd' key (stored value degrades to 'interact'), the toolbar
button, and the --mode-draw theme variable (leaving a 62-variable contract;
saved custom themes tolerate the stale key by construction). Legacy pending section_request
tasks stay visible and agent-applicable in the Tasks panel — with the tool
gone there is no overlay claiming editability. section_request stays in
TASK_TYPES + SKILL.md as a documented legacy type, now explicitly marked.
…live proof

WireframeDirectionChange.added gains md (the user's VERBATIM markdown spec
for a drawn section — never blended with measured geometry; the description
quotes only its first line) and data (the WireframeDataBinding, cloned plain).
computeWireframeDirections emits both for placeholder AND palette adds with
an honest description summary ('bind to the composable usePlanets →
planets[] (show name, type) [shape: api-schema]'); bare-placeholder output
is regression-locked byte-stable. The before/after composite renders the
section tag, md first line, and binding identity as text in the dashed box —
never fabricated pixels (verified by eye on the minted task's PNG).

WIREFRAME_APPLY.md: the add/component and add/placeholder rules split bare
vs with-md sections, and a new 'Data bindings on adds' section spells out
the shape_source honesty ladder and the re-grounding protocol
(annotask_get_data_source_details/_examples always; _api_operation when a
schema ref exists; needs_info on contradiction; never invent fields).
entryForTask passes the change whole — verified by a new apply-session
assertion; directions stay trust-verified.

Live proof (ANNOTASK_LIVE_CLI=1, claude): a third matrix scenario seeds a
fixture project with a REAL usePlanets composable + PlanetCard and an empty
content section (zero planet names anywhere), hands the agent the bound add
direction, and asserts the real import + call + v-for + :name/:type landed,
NO sample data was fabricated, the composable and card stayed byte-identical,
trust-verify flipped written, and undo restored byte-exact source — all 3
live scenarios pass. e2e: a section with md + binding implements into ONE
task carrying both verbatim (6/6 wireframe specs green).
…dversarial pass

Behavior:
- Canvas keydown gains an input-target guard: typing in the generate panel
  (or its embedded picker) no longer triggers block hotkeys — Backspace in a
  prop field was deleting the selected block. mdEditing resets on every
  selection change (select() + the marquee path); it previously stuck after
  the editor unmounted, swallowing all canvas keys or committing the draft
  to the wrong block.
- DataBindingPicker: monotonic pick token — a slow cold-scan response can no
  longer land on a newer selection. Root-object shapes: pickedPath starts
  null (the root's path IS ''), so nothing renders falsely picked and the
  whole-response pick now reaches the fields multi-select.
- Generator: placeAt/apply close the session synchronously before the upload
  await (double-click placed duplicate blocks); generate/place/apply respect
  the building lock and Implement cancels any open session; an edit-mode
  Apply without regenerating keeps the block's existing previewProps (they
  are 'what the user saw rendered', and nothing was rendered).
- Regenerate-in-place uploads under a FRESH filename and refcount-deletes
  the old one: duplicates share the old PNG (overwriting silently changed
  them) and the 1h snapshot cache masked same-name re-uploads after reload.
  imageSrc is strictly own-id keyed; duplicates seed their own liveImages
  entry at duplication time.
- schemaToShape composes allOf with an explicit object type / sibling
  properties instead of swallowing the inherited keys (unit-pinned).
- Directions: a drawn section's description no longer says 'keep it visibly
  a placeholder' (added.md is the contract); a duplicate whose original left
  the sketch says so honestly instead of claiming a user-sketched box.
- applyDesignSession numbers ONLY direction entries (badge order can't drift
  when placements/legacy edits ride the same task) and claims the composite
  screenshot only when one actually rode along.
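
The monotonic pick token from the DataBindingPicker bullet can be sketched roughly as follows. This is a minimal illustration, not the shipped code: createPicker, LoadShape, and the shape-as-string result are hypothetical names chosen for the example.

```typescript
// Hypothetical loader: resolves the shape for a data source, possibly slowly.
type LoadShape = (sourceId: string) => Promise<string>

interface Picked { sourceId: string; shape: string }

function createPicker(loadShape: LoadShape) {
  let token = 0                   // bumped on every pick, never reused
  let current: Picked | null = null

  async function pick(sourceId: string): Promise<boolean> {
    const mine = ++token          // claim a token before awaiting anything
    const shape = await loadShape(sourceId)
    if (mine !== token) return false   // a newer pick started: drop the stale response
    current = { sourceId, shape }
    return true
  }

  return {
    pick,
    get current(): Picked | null { return current },
  }
}
```

Because the comparison happens after the await, the order in which responses arrive does not matter; only the most recently started pick can commit.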

Strip leftovers and docs:
- stress-test annotate matrix no longer drives the removed tool-draw (it
  failed serially for all 7 apps); marketing page drops the Drawn-sections
  card + orphan screenshot; HelpOverlay/README say 62 CSS variables;
  SKILL.md intro and docs/api.md mark sections/component_move legacy;
  vue-webpack's installed skill copies refreshed; dead 'move' icon removed.

Not fixed (noted): scanApiSchemas' 60s cache ignores its options (pre-existing,
affects rung-1 freshness after an optionless scan warms it); the design-session
PUT validates only the change envelope, so a direct PUT could inject an
unvalidated added.data (every shell writer validates; agents re-ground anyway).

Block discovery had exactly two passes — semantic chrome (header/nav/
footer/aside) and children of the content root (which starts at <main>) —
so content mounted as a SIBLING of main that isn't semantic chrome (a
floating button, banner, or toast in layout flow) appeared in the honest
full-page 'before' image but got no block on the canvas: it couldn't be
moved, deleted, or annotated in the sketch.

A third straggler pass walks the capture-root → content-root path and
blocks every qualifying sibling with a real footprint at each level,
unless the chrome pass already covers it (containing a chrome block counts
as covered — blocking the wrapper would double-capture the header inside).
The playground's bare <Button> after <main> (App.vue) is the canonical
case, asserted in the W1 e2e (by anchor tag — anchor.file alone would
false-pass, the header chrome block also anchors to App.vue via attribute
fallthrough).
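
A rough sketch of the straggler pass described above, over a plain tree instead of the real DOM. The El type, the hasFootprint flag (standing in for "qualifying sibling with a real footprint"), and findStragglers are assumptions made for illustration only.

```typescript
// Semantic chrome tags named in the description above.
const CHROME_TAGS = new Set(['header', 'nav', 'footer', 'aside'])

// Hypothetical stand-in for a DOM element.
interface El { tag: string; children: El[]; hasFootprint: boolean }

// True when the element is chrome or wraps a chrome element (already covered).
function containsChrome(el: El): boolean {
  return CHROME_TAGS.has(el.tag) || el.children.some(containsChrome)
}

function contains(root: El, target: El): boolean {
  return root === target || root.children.some((c) => contains(c, target))
}

// Walk captureRoot -> contentRoot; at each level, collect siblings of the path
// node that have a footprint and are not covered by the chrome pass.
function findStragglers(captureRoot: El, contentRoot: El): El[] {
  const out: El[] = []
  let level = captureRoot
  while (level !== contentRoot) {
    const next = level.children.find((c) => contains(c, contentRoot))
    if (!next) break // contentRoot is not below captureRoot
    for (const sib of level.children) {
      if (sib === next) continue
      if (!sib.hasFootprint) continue
      if (containsChrome(sib)) continue // would double-capture the chrome inside
      out.push(sib)
    }
    level = next
  }
  return out
}
```

With a tree shaped like the playground's App.vue (header, main, then a bare button), only the button is returned.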

The D1-D6 plan this branch implemented — kept as the round's record, like
the snapshot-wireframe handoff before it.
…on removed

Applied end-to-end through the wireframe loop (sketch -> directions ->
embedded agent -> review -> accept): filters, sort, and the search field
regroup into an <aside class="controls"> beside the planet layout
(column-reverse under 900px), and the bare sibling-of-main <Button> stub
in App.vue is gone — the element the straggler-capture fix was proven on.
…tore the straggler fixture

The accepted /planets redesign collapsed the page to two content blocks
(page-header + planets-content wrapper), which broke the suite's old
header/toolbar/grid anatomy assumptions — W4's marquee aliased toolbarBlk
to headerBlk and could no longer select two blocks. The block picks and
expectations now target the two-block anatomy (explode covers the wrapper's
layout/controls children).

The deleted Button stub was also the suite's only sibling-of-main element;
a deliberate, styled 'Back to top' button after <main> replaces it as the
canonical straggler-pass fixture (commented as such in App.vue), keeping
W1's capture-coverage assertion honest. 6/6 wireframe specs green against
the new page.
…frame hero demo

- src/api.js: fetchStats() + fetchChangelog() against the shared FastAPI
  (/api/marketing/*); fetchChangelog is deliberately unused — it is the
  binding target for the hero's drawn 'What's new' section
- hero stats strip renders live installs/stars/contributors/frameworks
- vite proxy for /api and /openapi.json so the schema scanner resolves
  shapes (shape_source: 'api-schema', ChangelogEntry[] at confidence 1)
- justfile: 'just marketing' target with ensure-api
…ession state, keep init

demo:reset:hero restores the marketing playground to the wireframe-hero
'before' state: tracked sources back to HEAD, agent-created strays cleaned,
wireframe/design-session/file-snapshots/conversations/usage cleared, tasks
emptied — while server.json/design-spec.json/agents.json survive so the
InitWizard never hijacks a take. Verified idempotent.

Supersedes the never-recorded 'AI draft → polished page' hero with
'Freeze. Sketch. Real.': freeze the marketing page into the snapshot-wireframe
canvas, rearrange it, bind the drawn 'What's new' section to the real
/api/marketing/changelog schema, and let the embedded agent rewrite the
source — byte-exact undo as the safety net.

- demo_plan.md: suite design (10-segment hero, four directions, clips C1–C8,
  salvage map, VO tone)
- demo/transcript-hero.md: per-segment voiceover + on-screen directions
- demo/HERO_RECORDING_PROMPT.md: recording-session runbook — all selectors
  verified against src/shell + the e2e suite; provider seeding, wait chains,
  speed-ramp, pre-flight, risk register
- MARKETING_DEMO_PROMPT.md: superseded banner (its stage survives in C3/C8)
…dev server

A bare-array tasks.json (hand-written resets, older tooling) parsed fine but
crashed the first addTask() with 'Cannot read properties of undefined' —
taking the whole Vite dev server down. Normalize arrays and missing .tasks
keys at load. demo-reset-hero.sh now writes the canonical shape.
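
The load-time normalization could look something like the sketch below. The Task and TasksDocument shapes and the function name normalizeTasksDocument are hypothetical; the real task type carries more fields.

```typescript
// Hypothetical minimal shapes for illustration.
interface Task { id: string; [key: string]: unknown }
interface TasksDocument { tasks: Task[]; [key: string]: unknown }

// Coerce whatever JSON.parse returned into the canonical { tasks: [...] } shape.
function normalizeTasksDocument(parsed: unknown): TasksDocument {
  // Bare array (hand-written resets, older tooling): wrap it.
  if (Array.isArray(parsed)) return { tasks: parsed as Task[] }
  // Object: keep its other keys, repair a missing or non-array .tasks.
  if (parsed !== null && typeof parsed === 'object') {
    const doc = parsed as Record<string, unknown>
    return { ...doc, tasks: Array.isArray(doc.tasks) ? (doc.tasks as Task[]) : [] }
  }
  // Anything else (null, string, number): start from an empty document.
  return { tasks: [] }
}
```

After this step, code such as addTask() can push onto .tasks without checking the shape again.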
…ripted and proven

One continuous scripted take on the marketing playground: wireframe freeze,
drag/resize/note, drawn 'What's new' section bound to the real changelog
schema, Implement → live claude-local run (ramped 8× with caption) →
safety affordance → accept → reveal. Produces demo/final/annotask-hero.mp4
(140s; media dirs are gitignored).

- demo/lib/record-helpers.mjs: live DOM cursor overlay (testreel's post
  cursor can't track real-mouse drags — its standalone moveCursorToPoint
  feeds a global tracker recordPage never serializes), boot/enter helpers
  ported from the e2e suite, API poll gates
- demo/record-hero.mjs: the take script; DEMO_SMOKE=1 validates every
  selector/gate headless in ~13s without an agent run
- demo/generate-voiceover-hero.sh + demo/assemble-hero.sh: edge-tts VO,
  marker-driven assembly with input-side-seek speed ramp and
  collision-resolved narration placement
- transcript/runbook updated: safety beat moved BEFORE accept (accepting
  clears the session and its snapshot batches — undo only exists in review)
…ning

Land the place-first on-canvas component configuration and harden the
wireframe apply lifecycle ahead of v0.3.0.

Place-first config (replaces the generate-then-place panel):
- Dropping a palette component mints the block instantly; props, data
  binding, and loop repetition are configured in an inline popover with
  debounced live regeneration on the app-true surface.
- New: WireframeBlockPopover, PropWidgetRows, useCanvasHistory,
  useCanvasSelection (+ tests); remove GenerateComponentPanel.
- Rewrite the vue-vite wireframe e2e to drive the new wf-pop-* flow.

Apply-lifecycle data safety (review blockers + crash recovery):
- Gate Undo/Discard while an apply run is in flight; the snapshot engine
  refuses a still-'running' (unsealed) batch and the discard endpoint
  409s on one, so a one-click revert can no longer clobber the agent's
  in-progress bytes.
- (Re-)seal the batch by task on every review (sealBatchByTask), so
  re-apply-after-deny and pure-placement applies get a correct undo
  baseline instead of falsely reporting "edited outside Annotask".
- A crashed/aborted seed run seals its batch 'failed' (keeping it
  revertible) and returns its entries to 'pending' instead of stranding
  them in 'applying'.
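
As an illustration of the discard gate, here is a small sketch. Only the 'running' and 'failed' statuses and the 409 come from the notes above; the 'sealed' status name, the 404 branch, the error strings, and the function signature are assumptions.

```typescript
// 'sealed' is a hypothetical name for a successfully sealed batch.
type BatchStatus = 'running' | 'sealed' | 'failed'
interface SnapshotBatch { id: string; status: BatchStatus }
interface HttpResult { status: number; body: { ok: boolean; error?: string } }

// Refuse to revert while the agent may still be writing bytes.
function discardBatch(
  batch: SnapshotBatch | undefined,
  revert: (b: SnapshotBatch) => void,
): HttpResult {
  if (!batch) return { status: 404, body: { ok: false, error: 'batch_not_found' } }
  if (batch.status === 'running') {
    // Unsealed batch: a revert now would clobber in-progress output.
    return { status: 409, body: { ok: false, error: 'apply_in_flight' } }
  }
  revert(batch) // 'sealed' and 'failed' batches both stay revertible
  return { status: 200, body: { ok: true } }
}
```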

Embedded-agent + shell fixes:
- Render model/user markdown through safeMd (DOMPurify) instead of raw
  marked.parse + v-html (XSS hardening).
- Stop the cumulative token counter double-counting across turns.
- plain()-clone instanceProps before postMessage so a nested-object
  binding no longer degrades the live preview to a placeholder
  (DataCloneError).
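
A plain() helper in the spirit of the last bullet might be as small as the sketch below. It assumes the props are JSON-serializable; the Proxy in the usage lines merely imitates a reactive wrapper and is not taken from the codebase.

```typescript
// JSON round-trip clone: yields plain objects and arrays with no proxies or
// class instances. Functions and undefined values are dropped by
// JSON.stringify, which is acceptable for serializable props.
function plain<T>(value: T): T {
  return JSON.parse(JSON.stringify(value)) as T
}

// Usage: a proxy-wrapped nested object becomes a detached plain copy that is
// safe to hand to postMessage.
const reactiveLike = new Proxy({ planet: { name: 'Mars', moons: [1, 2] } }, {})
const safeProps = plain(reactiveLike)
```

The trade-off is that values such as Date or Map do not survive a JSON round trip, so this only suits data that is already JSON-shaped.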

Re-sync the .claude / .agents / vue-webpack skill mirrors from canonical.
Bump to 0.3.0 — embedded-agent mode and wireframing are additive,
non-breaking features, so a semver minor.

- Restructure CHANGELOG with the real 0.3.0 surface (snapshot wireframe,
  data-binding picker, design-session apply loop, embedded agents, the
  tool-strip pivot) and the data-safety fixes from the release review.
- Document embedded agents + wireframing in the README (it ships in the
  npm tarball; CLAUDE.md alone did not reach users).

These are internal handoff/recording material, not product — and they would
otherwise ride into the public repo and the PR to main. The npm tarball
(files: ["dist","skills"]) never shipped them; this removes them from git too.

- todo/: the 5 wireframe/embedded-agent handoff + kickoff planning docs.
- demo/: recording prompts, transcripts, and the record/assemble/voiceover
  scripts. The gitignored demo/{segments,voiceover,final} binaries are
  local-only and untouched.
- demo_plan.md: the root demo plan doc.

Kept: scripts/demo-*.sh + the demo:* package scripts (functional tooling that
resets the marketing playground off the demo/marketing-before git tag, which
the live apply-cli-matrix test also uses). Dangling doc-comment pointers in
e2e/helpers/design-tool.ts and scripts/demo-reset-hero.sh updated.
…bedded-agent loop

Apply/undo lifecycle no longer wedges. A crash, abort, restart, orphan
reconcile, or an agent that pauses at needs_info / is denied used to strand the
snapshot batch as 'running' and entries as 'applying' — disabling undo, discard
AND re-apply at once and locking the wireframe canvas at 'building' forever.
Every terminal transition (HTTP + the server-side orphan/boot sweep) now routes
through one shared closure (releaseApplyTask: seal batch + release entries +
unlock canvas); applyInFlight keys off the always-sealed batch status; retries
re-stamp entries so review re-verifies.
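
The single terminal-transition closure can be pictured as a pure function over the apply state. This is a sketch under assumptions: only releaseApplyTask, applyInFlight, 'running', 'failed', 'applying', and 'pending' appear in the notes; the 'sealed' and 'applied' names, the Outcome type, and the canvasLocked flag are hypothetical.

```typescript
type BatchStatus = 'running' | 'sealed' | 'failed'
type EntryStatus = 'pending' | 'applying' | 'applied'
interface Entry { id: string; status: EntryStatus }
interface ApplyState {
  batch: { status: BatchStatus }
  entries: Entry[]
  canvasLocked: boolean // true while the canvas shows 'building'
}
type Outcome = 'completed' | 'failed'

// One place that every terminal path (HTTP handler, orphan sweep, boot sweep) calls.
function releaseApplyTask(state: ApplyState, outcome: Outcome): ApplyState {
  const done = outcome === 'completed'
  return {
    // Seal the batch; an already-sealed batch is left alone (idempotent).
    batch: {
      status: state.batch.status === 'running' ? (done ? 'sealed' : 'failed') : state.batch.status,
    },
    // Release entries so none stay stranded in 'applying'.
    entries: state.entries.map((e): Entry =>
      e.status === 'applying' ? { ...e, status: done ? 'applied' : 'pending' } : e,
    ),
    // Unlock the canvas.
    canvasLocked: false,
  }
}

// In-flight is derived from the batch status alone.
function applyInFlight(state: ApplyState): boolean {
  return state.batch.status === 'running'
}
```

Routing every exit through one function means a new terminal path cannot forget one of the three steps.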

Undo is byte-exact across the agent's whole footprint. In a git project the
engine captures a pre-apply baseline (git stash create) and, at seal, folds
every tracked file the agent actually touched into the batch (pre-apply bytes
from git show). Agent-deleted files are recreated on undo (they used to vanish
despite a held copy). Non-git projects keep anchor-only coverage.
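
The git side of this could be sketched as below. Only git stash create and git show are named in the notes; using git diff --name-only to list touched tracked files, the HEAD fallback, and every function name here are assumptions. The RunGit runner is injected so the sketch stays testable; in practice it might wrap child_process.execFileSync('git', args), and a byte-exact version would carry Buffers rather than strings.

```typescript
// Hypothetical injected runner: executes `git <args>` and returns stdout.
type RunGit = (args: string[]) => string

// `git stash create` prints a commit id without touching the worktree or the
// stash list. It prints nothing when there are no local changes, so fall back
// to HEAD in that case.
function captureBaseline(git: RunGit): string {
  const stash = git(['stash', 'create']).trim()
  return stash !== '' ? stash : git(['rev-parse', 'HEAD']).trim()
}

// Tracked files whose worktree content differs from the baseline commit.
function touchedSince(git: RunGit, baseline: string): string[] {
  return git(['diff', '--name-only', baseline]).split('\n').filter((p) => p !== '')
}

// Pre-apply content of one file, or null when it did not exist at baseline.
function preApplyContent(git: RunGit, baseline: string, path: string): string | null {
  try {
    return git(['show', `${baseline}:${path}`])
  } catch {
    return null
  }
}

// At seal: fold every touched file into the batch with its pre-apply content.
function extendBatch(git: RunGit, baseline: string): Map<string, string | null> {
  const batch = new Map<string, string | null>()
  for (const path of touchedSince(git, baseline)) {
    batch.set(path, preApplyContent(git, baseline, path))
  }
  return batch
}
```

On undo, a string entry would be written back (recreating a file the agent deleted) and a null entry would mean the file should be removed.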

Other fixes:
- Anchorless wireframe block (file '') no longer crashes the apply after minting
  the task — empty anchors are filtered from the snapshot set.
- needs_info answers are injected into the agent's resume prompt (no-MCP loop).
- Cross-tab 409 (task_already_running) is a benign no-op, not a stream error
  that reverts the winning tab's live run.
- wireframe.json runs through a migration shim before validation, so a future
  schema bump (or a legacy version-less doc) upgrades instead of silently wiping.
- Removed the dead BudgetCap module (USD caps aren't portable across providers;
  the idle/duration watchdogs are the runaway guard).
- The dev server prints a loud warning when bound beyond loopback (vite --host).
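
A migration shim of the kind described for wireframe.json might be structured like this sketch. The version numbers, the individual migration steps, the routes field default, and the choice to return null for unknown documents are all hypothetical.

```typescript
type Doc = Record<string, unknown>

// Hypothetical current schema version.
const CURRENT_VERSION = 2

// One upgrade step per source version; each returns a new document.
const MIGRATIONS: Record<number, (doc: Doc) => Doc> = {
  0: (d) => ({ ...d, version: 1 }),                           // legacy version-less doc
  1: (d) => ({ ...d, version: 2, routes: d.routes ?? {} }),   // illustrative field default
}

// Run before validation. Returns the upgraded doc, or null when the document
// cannot be understood, so the caller can refuse to overwrite it.
function migrateWireframe(raw: unknown): Doc | null {
  if (raw === null || typeof raw !== 'object' || Array.isArray(raw)) return null
  let doc = raw as Doc
  let version = typeof doc.version === 'number' ? doc.version : 0
  if (version > CURRENT_VERSION) return null // written by a newer build
  while (version < CURRENT_VERSION) {
    const step = MIGRATIONS[version]
    if (!step) return null // gap in the migration chain
    doc = step(doc)
    version += 1
  }
  return doc
}
```

The point of running this ahead of the validator is that an old document is upgraded and kept, where a strict validator alone would reject it and the file would be reset.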

Tests added for the git auto-extend, deleted-file recreation, anchorless apply,
and the cross-tab 409 path.
…surface

This release ships the reduced wireframe surface — capture + sketch + the
implement-this-wireframe directions loop — and defers two built-but-not-ready
areas behind flags in src/shell/wireframeFeatures.ts:

- Data binding (palette/section binding picker, shape drill-down, prop→field
  map, loop repeat) — deferred while the shape-confidence honesty ladder is
  reworked.
- Explode-to-children.

The code paths stay in the tree and existing persisted bindings still render
read-only; only the UI entry points are gated (WireframeCanvas, BlockPopover).
The e2e specs that exercise binding/explode are skipped/trimmed to match.

Also: wireframe capture/enter errors now surface in an app-level banner instead
of flashing and vanishing when the canvas fails to mount.
…: gated live-CLI job

- SECURITY.md: state the out-of-box posture explicitly (no permission ceiling by
  default; same-origin app code can drive agents).
- README.md: new Security section; drop the non-existent "token budget cap"
  claim; trim the gated explode/data-binding wording from Wireframing.
- CHANGELOG.md: [Unreleased] section for this round; correct the 0.3.0
  "budget cap" / "confidence attached" claims that never shipped.
- CLAUDE.md + WIREFRAME_APPLY.md: note data-binding/explode are gated this
  release; fix the "confidence attached" claim (WireframeDataBinding carries no
  confidence field). Skill mirrors regenerated via sync:skills.
- ci.yml: secret-gated, schedule/dispatch-only live-cli job that runs the
  apply→write→verify loop against a real CLI.
…counts)

Audited every doc against the codebase and corrected stale/wrong claims:

- docs/api.md: drop the removed POST /wireframe/draft + /draft/revert routes and
  the ANNOTASK_RENDER_IN_PLACE flag; add the real design-session/* +
  wireframe-snapshots/* apply routes; add the session:updated WS event and the
  annotask_get_binding_classification MCP tool; clarify the default permission is
  no ceiling ('bypass'); broaden the wireframe_apply source description.
- docs/architecture.md: remove the deleted budget-cap.ts and draft-edits.ts
  bullets (replace with the real apply/snapshot modules + wall-clock watchdogs);
  drop "Libraries" as a peer Audit sub-section.
- AGENTS.md (a stale copy of CLAUDE.md): remove the wireframe/draft endpoints,
  fix "Modes (4)/--mode-draw" → 3, "63 CSS variables" → 62, add the
  binding-classification MCP row, mark section_request legacy, refresh the
  wireframe_apply row, and add the data-binding/explode gating note.
- README.md: 27 → 28 MCP tools; qualify byte-exact undo as git-project (non-git
  = anchor files only).
- CONTRIBUTING.md: prop_update → component_prop_update (+ text_update,
  wireframe_direction).
- docs/cli.md: document the binding-classify command; docs/distribution.md: list
  the ERROR_FIX/PERF_FIX/WIREFRAME_APPLY playbooks; docs/setup.md: drop stale
  "libraries" audit section.
- SECURITY.md / CHANGELOG.md: the Host gate covers ALL methods (present tense),
  not just GET.
- WIREFRAME_APPLY.md: note data-binding + explode are gated off this release.
  Skill mirrors regenerated via sync:skills.

Audited the workspace and patched the vulnerable dependencies — 47 advisories
(1 critical, 13 high, 24 moderate, 9 low) down to 1 low.

- Bump the one SHIPPED runtime dep: ws ^8.20.0 → ^8.21.0 (the rest of the
  published package's deps were already clean).
- pnpm.overrides force patched versions for the transitive/dev/playground vulns,
  capped to stay in-major so nothing breaks: shell-quote (critical, webpack
  tooling), fast-uri, devalue, undici, react-router(+dom), vite, postcss, svelte,
  js-yaml, qs, uuid, launch-editor, webpack-dev-server, http-proxy-middleware,
  @babel/core; dompurify override raised to >=3.4.9.
- Bump the astro playground 5 → 6 (the only fix for its advisories was a major).

Remaining: 1 low — esbuild 0.27.x in BUILD tooling only (tsup / webpack-terser),
not a runtime dep of the published package. Forcing esbuild >=0.28.1 breaks the
shell build (0.28 dropped transforming destructuring to the old browser targets),
so it's deliberately left until tsup/vite move their pinned esbuild forward.

Build + typecheck + 999 tests green; published-package prod deps are clean.
@kurtstohrer kurtstohrer merged commit febd1fe into main Jun 22, 2026
3 of 5 checks passed
@kurtstohrer kurtstohrer deleted the feat/embedded-agents branch June 22, 2026 22:49
Comment thread src/mcp/tools.ts
clearTimeout(t)
resolve(m)
})
const t = setTimeout(() => { unsubscribe(); resolve(null) }, timeoutMs)
}

function escapeCell(s: string): string {
return s.replace(/\|/g, '\\|').replace(/\n/g, ' ').slice(0, 200)
} catch (err) {
// A non-conflict failure (disk error) won't get better on retry.
if (!(err instanceof WireframeRevConflictError)) {
console.warn(`[Annotask] wireframe ${mode} for ${taskId} failed:`, err)
const EVIDENCE_CLASSIFICATIONS: readonly string[] = ['literal', 'bound', 'loop-literal', 'unknown']
const ANCHOR_POSITIONS: readonly string[] = ['before', 'after', 'append', 'prepend']

export function emptyDesignSessionDocument(sessionId: string): DesignSessionDocument {
function makeSection(heading: string, lines: string[]): GuideSection {
const meta = GUIDE_SECTION_META[heading] ?? FALLBACK_META
// Strip leading/trailing blank lines and HTML comment placeholders.
const raw = lines.join('\n').trim().replace(/^<!--.*?-->$/gm, '').trim()

function ensureSessionIdentity(): void {
if (!sessionId) {
sessionId = `ds-${Date.now()}-${Math.random().toString(36).slice(2, 7)}`