v0.4.0 — release hardening for wireframes + embedded agents by kurtstohrer · Pull Request #50 · kurtstohrer/annotask

kurtstohrer · 2026-06-22T22:48:16Z

Cuts v0.4.0. Hardens the two flagship features for release, scopes the wireframe surface down to what's ready, sweeps the docs for accuracy, and clears the dependency advisories.

Merging triggers release.yml → publish.yml (tag + npm publish) and re-scans Dependabot on main (the dashboard's 42 alerts are from these same deps, fixed on this branch).

Fixes — apply/undo lifecycle + embedded agents

Lifecycle no longer wedges. A crash, abort, restart, orphan reconcile, needs_info, or denied used to strand the snapshot batch as running + entries as applying — disabling undo, discard, AND re-apply at once and locking the wireframe canvas at building forever. Every terminal transition (HTTP + server-side orphan/boot sweep) now routes through one shared closure that seals the batch, releases entries, and unlocks the canvas.
Byte-exact undo across the agent's whole footprint (git projects): git stash create baseline + git show auto-extend of every touched file; agent-deleted files are recreated. Non-git keeps anchor-only coverage.
Anchorless block no longer crashes the apply; retry re-verifies; needs_info answers reach the agent on resume; cross-tab 409 is a benign no-op; wireframe.json migration shim guards against silent wipes; dead BudgetCap removed; loud non-loopback startup warning.
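
The "one shared closure" idea from the first item can be sketched as a single finalizer that every terminal path calls. This is a minimal illustration only: `ApplyState`, `Entry`, `TerminalReason`, and `finalizeApply` are hypothetical names, and releasing entries back to `pending` is an assumption about what "releases entries" means here.

```typescript
// Hypothetical shapes: the real annotask types are not shown in this PR.
type EntryStatus = "pending" | "applying" | "written" | "failed";
type TerminalReason =
  | "crash" | "abort" | "restart" | "orphan" | "needs_info" | "denied";

interface Entry { id: string; status: EntryStatus }

interface ApplyState {
  batch: { status: "running" | "sealed"; endedBy?: TerminalReason };
  entries: Entry[];
  canvas: "idle" | "building";
}

// Every terminal transition funnels through here, so no path can seal the
// batch but forget to release entries or unlock the canvas.
function finalizeApply(state: ApplyState, reason: TerminalReason): ApplyState {
  return {
    batch: { status: "sealed", endedBy: reason },
    entries: state.entries.map((e): Entry =>
      e.status === "applying" ? { ...e, status: "pending" } : e,
    ),
    canvas: "idle",
  };
}
```

Because the three steps live in one function, an HTTP handler and a boot-time sweep cannot drift apart in what they clean up.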

Wireframe scope-down

Data binding + explode gated off (src/shell/wireframeFeatures.ts) while the binding shape-confidence ladder is reworked; existing persisted bindings render read-only. Capture/sketch/directions + the implement loop are unchanged.

Docs + security + CI

SECURITY.md states the out-of-box posture (no permission ceiling by default); README gains a Security section.
Docs-accuracy sweep: removed dangling refs to deleted modules/routes, fixed counts (28 MCP tools, 62 CSS vars), re-synced AGENTS.md.
New secret-gated, schedule/dispatch-only live-cli CI job exercising the real apply→write→verify loop.

Dependencies

47 → 1 advisory (1 critical + all 13 high + all 24 moderate cleared). Bumped the one shipped dep (ws); pnpm.overrides for transitive/dev/playground vulns (in-major caps); astro playground 5→6. The 1 remaining low is esbuild in build tooling only (not shipped; forcing the patch breaks the shell build).

Verification

Build ✓ · typecheck ✓ · 999 unit tests ✓ · CI skill-drift gate ✓. New tests cover the git auto-extend, deleted-file recreation, anchorless apply, cross-tab 409, and wireframe migration.

🤖 Generated with Claude Code

Storage and guardrail layer for the embedded chat (ANN-8 / M4). Owner: AICoder. Designed so the DesignEngineer settings-sheet pass can bind to zod-validated shapes without churn.
New modules:
- src/embedded/provider-config.ts: zod schema and helpers for five providers (Anthropic, OpenAI/Codex, OpenAI-compatible, Copilot, Paperclip), per-conversation USD cap, redaction and event-log toggles, plus structured validation and safe-logging redactor.
- src/embedded/redaction.ts: secret-pattern scrubber for prompts and tool results. Catches provider keys, GitHub PATs, AWS credentials, JWTs, PEM blocks, and env-style assignments. Idempotent; returns match offsets for telemetry without leaking redacted bytes.
- src/embedded/budget-cap.ts: per-conversation BudgetCap with injectable pricer, sticky shouldStop() once breached, and BudgetCapExceeded exception path. pricerFromRateCard helper matches Anthropic's 10% cache-read / 125% cache-write convention.
- src/embedded/event-log.ts: local-only ring buffer for provider turn, tool_call, tool_result, and error events. Bounded capacity, optional persistence sink, subscribe() for UI live updates, totalsByConversation for the composer's token meter. No fetch surface.
Shell surface:
- src/shell/composables/useProviderSettings.ts: reactive singleton backed by localStorage, exposes makeConversationBudget() and usageForConversation() for the composer cap chip + cost meter.
- src/shell/components/ProviderSettingsPanel.vue: functional starter panel. DesignEngineer owns the visual polish; field names map 1:1 to the schema so renames here do not silently break storage.
Tests: 49 new tests across redaction, budget-cap, event-log, provider-config, and useProviderSettings. All 116 vitest tests pass.
Co-Authored-By: Paperclip <noreply@paperclip.ing>
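
The redaction properties named above (idempotent, offsets without leaked bytes) can be illustrated with a small scrubber. This is a sketch under assumptions, not the contents of src/embedded/redaction.ts: `redact`, `PATTERNS`, and the `[REDACTED:kind]` placeholder are hypothetical, and only two token shapes are covered (classic GitHub PATs and AWS access key IDs).

```typescript
interface RedactionMatch { kind: string; start: number; end: number }
interface RedactionResult { text: string; matches: RedactionMatch[] }

// Hypothetical pattern table; a real scrubber would cover many more shapes.
const PATTERNS: { kind: string; re: RegExp }[] = [
  { kind: "github-pat", re: /ghp_[A-Za-z0-9]{36}/g },
  { kind: "aws-access-key-id", re: /AKIA[0-9A-Z]{16}/g },
];

function redact(input: string): RedactionResult {
  // Collect every match against the ORIGINAL input so offsets stay meaningful.
  const found: RedactionMatch[] = [];
  for (const { kind, re } of PATTERNS) {
    for (const m of input.matchAll(re)) {
      const start = m.index ?? 0;
      found.push({ kind, start, end: start + m[0].length });
    }
  }
  found.sort((a, b) => a.start - b.start || b.end - a.end);

  const matches: RedactionMatch[] = [];
  let text = "";
  let cursor = 0;
  for (const m of found) {
    if (m.start < cursor) continue; // overlaps an earlier match, skip it
    text += input.slice(cursor, m.start) + `[REDACTED:${m.kind}]`;
    cursor = m.end;
    matches.push(m); // offsets only, never the secret bytes
  }
  text += input.slice(cursor);
  return { text, matches };
}
```

Idempotence follows from the placeholder itself matching none of the patterns, so running `redact` on already-scrubbed text returns it unchanged with no matches.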

Mount AISettingsPanel and ProviderSettingsPanel inside a sidebar-tabbed Settings overlay, with Providers placed between AI and Appearance per the M4 sheet UX brief. Visual polish on ProviderSettingsPanel: - Flat horizontal provider pill strip, same affordance as the AI tab's model picker — single screen with branch selection, not a vertical list. - Visual parity with AISettingsPanel via a shared _ai-settings.css module (`.ai-section`, `.ai-input-row`, `.ai-validation`, etc.). AISettingsPanel's scoped style block is removed; both panels now read the same tokens. - Inline per-field validation: errors surface beneath the originating field with a danger-toned input border, replacing the bottom-of-panel list. Validation runs reactively on every edit and on provider switch. - Per-provider empty states: Copilot "Sign in to GitHub Copilot" (disabled until ANN-16 ships device-code OAuth) and Paperclip "Connect Paperclip company" CTA. - Readiness chip reflects `isActiveProviderReady` so users see at a glance whether the active branch can run. - Reveal toggles, focus-visible outlines, and `aria-selected` on pills. Layout: - Sidebar tabs on desktop (≥640px); the strip collapses to a horizontal pill row on mobile. - Lead copy on AI tab points users to Providers for non-Anthropic keys so the relationship between the two panels is discoverable. Carries forward AISettingsPanel.vue + useAIConfig.ts + pricing.ts that ANN-4 left untracked in the workspace; useProviderSettings already imports from pricing.ts, so without these the prior M4 commit didn't build. Verified visually with Playwright at 1440x900 and 390x844: - Each provider tab renders the right branch with inline error states. - Settings persist across reload (`localStorage` key `annotask:ai:providerSettings`). - `pnpm exec vitest run` — 367 pass / 2 skipped. - `vue-tsc --noEmit -p src/shell/tsconfig.json` clean. Co-Authored-By: Paperclip <noreply@paperclip.ing>

…wizard Bring the embedded agent surface online end-to-end: Providers - Add CLI-local provider family (claude-local, codex-local, opencode-local, copilot-local) that reuses the user's existing CLI logins via a spawn endpoint — no API keys stored in the browser. - Add OpenRouter provider and a generic OpenAI-compatible transport. - Remove the GitHub Copilot BYOK/HTTP provider; copilot-local (CLI subprocess) is now the only Copilot path. parseProviderSettings migrates any persisted activeProvider='copilot' to 'copilot-local'. - Provider factory dispatches to the active provider with credentials from the persisted settings blob. - Live model-catalog discovery per provider with TTL cache, dedup, and an explicit "Cannot fetch models" blocked-state for providers with no headless model list. Personas + per-agent overrides - Persona resolver maps task types to personas; overrides live in .annotask/agents.json (provider, model, effort, projectDirections). - Settings → Agents shows AgentDirectionsPanel with per-persona runtime config; switching providers auto-clears stale model ids so the picker always shows a valid catalog or defaults to Auto. Init wizard + project state - Multi-step InitWizard (agent/scan/review) drives first-run setup and re-init. Server-side init pipeline writes .annotask/design-spec.json, components, agent directions, and per-persona templates. - WebSocket init:progress events keep the UI in sync with the running pipeline. In-shell chat - Conversation tab + composer with auto-run, working indicator, and a message/stream tokenizer. - useTaskThread / useEmbeddedAgent composables wire the provider stream into the task lifecycle (transitions, agent_feedback, usage ledger). Token usage + guardrails - .annotask/usage.jsonl ledger aggregated per scope (task/init/apply/chat) and per provider, surfaced in Settings → Agents. - Redaction toggle + local-only event log; budget cap enforced per turn. 
UI cleanup - Tabbed SettingsOverlay (Agents / Project / Appearance). - ProviderSettingsPanel: flat pill switcher, live model picker with blocked-state warning, local-CLI detect banner, billing note for claude-local on Pro/Max plans. - Drop AISettingsPanel and useAIConfig (superseded by ProviderSettingsPanel + useProviderSettings).

Marketing landing page as an AI would first-generate it — correct structure and real content, but with slightly off copy, unnecessary sections (pricing, trusted-by, newsletter), weak styling (purple accent, 4px border-radius, no light mode, no responsive breakpoints), a11y gaps (low contrast, empty alt, missing main landmark, heading skip), and no syntax highlighting in code blocks. This is the "before" snapshot for the demo video.

Fixes tagline ("applies the change"), CTA ("View on GitHub"), adds <main> landmark, restores light mode, responsive breakpoints, 12px card radius, blue accent (#3b82f6), syntax highlighting, styled lifecycle badges, dogfood banner. Removes pricing/trusted-by/newsletter sections. This is the "after" snapshot for the demo video.

- scripts/demo-reset.sh: restores marketing page to AI first-draft state from the demo/marketing-before tag, clears tasks and sidecars
- scripts/demo-restore.sh: restores polished state from HEAD
- demo/transcript.md: 12-segment voiceover + action cues for ~4:25 hero video
- package.json: adds demo:reset and demo:restore npm scripts
- .gitignore: excludes demo recording artifacts (segments, voiceover, final)

- e2e/screenshots.test.ts: 12 Playwright tests capturing annotask shell features (pins, arrows, sections, highlights, tokens, inspector, etc.)
- e2e/demo-record.test.ts + config: automated video recording of demo segments (before scroll, setup, style editor, a11y scan, tokens, result)
- docs/media/screenshots/: 11 generated feature screenshots
- playgrounds/simple/marketing/public/screenshots/: copies for feature cards
- demo/segments/: recorded webm+mp4 clips (gitignored)
- demo/final/annotask-demo.mp4: assembled 74s demo video (gitignored)

…rd recording - Screenshots now captured against marketing playground (port 5181) instead of vue-vite playground — feature cards show marketing page content in shell - Hero screenshot (shell-overview.png) shows marketing page in dark mode with correct "Agent applies the change" tagline - Init wizard recording now properly clicks through agent selection → scan, captures framework detection checkmarks and agent output - playwright.config.ts: screenshots project points to port 5181 - e2e/screenshots.test.ts: saves to both docs/media and marketing public dirs

Screenshots now show the annotask shell over the Solar System Explorer (vue-vite playground) instead of the marketing page. The planet orbital canvas and card grid are more compelling showcase content. Shell overview shows the /planets route with the 8-planet card grid.

Prompt file for a new session to record the init wizard deep-dive using testreel (programmatic recordPage API for iframe support) and edge-tts (AndrewNeural voice). Includes 10-segment outline covering all wizard steps: agent selection, scan progress, and 6 review sub-steps (framework, tokens, components, APIs, style guide, agent directions).

- Agent selection: cycle through Codex → OpenCode → Claude to show options
- Scan: record video first, write voiceover after to match what's on screen
- Style guide: scroll through content, show edit mode, mention loading existing docs from repo
- Agent directions: go deeper (show per-persona cards, change one agent's provider to a different LLM, explain task-type routing)
- Save: don't say "shell", say "your project is ready for Annotask"
- All voiceover drafts marked as starting points to rewrite after recording
- Workflow section: video first → watch → rewrite scripts → generate audio

…ness, and live test coverage Hardening + verification on top of the embedded-agents feature (run local AI CLIs — claude, codex, opencode, copilot — inside Annotask to apply design tasks). Security defaults & enforcement: - Default permission mode is now Auto (per-CLI least-permissive headless mode via normalizeHeadlessMode) instead of blanket bypass: codex stays --full-auto sandboxed, copilot minimal, claude/opencode escalate to bypass. bypass is now an explicit opt-in; removed the default->bypass migration. - Server-side ANNOTASK_MAX_PERMISSION clamp re-derives the level from argv and refuses over-cap spawns (per-task and init paths). - Wire redactValue into tool_use inputs; add Google/Stripe/connection-string redaction patterns. Honest per-CLI plan-mode labels. Robustness: - agent-spawn: stdin EPIPE crash guard, concurrency cap, and PWD=cwd so CLIs that read $PWD (opencode) resolve the spawn workspace, not the launch dir. - init: codex --skip-git-repo-check so it runs in non-git projects; extracted buildInitCliInvocation for testability. Surface usage-ledger write failures. SSE reconnect give-up + retry. Quote-aware extra-args tokenizer. Copilot input-token forward-compat. De-dup NATIVE_PLAN_PROVIDERS. Tests: - Hermetic: permission-flag matrix, provider apply-arg + clamp matrix, redaction wiring, abort propagation, SSE reconnect, init arg builder. - Live (ANNOTASK_LIVE_CLI=1): new apply-cli-matrix e2e — all four CLIs apply a real style_update to the marketing playground and advance the task to review.
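
For readers unfamiliar with what a quote-aware extra-args tokenizer has to handle, here is a minimal sketch. It is not the project's implementation: `tokenizeArgs` is a hypothetical name, and the escaping rule (backslash only inside double quotes) is an assumption chosen to keep the example small.

```typescript
// Splits a user-supplied extra-args string into argv entries, keeping quoted
// runs (single or double quotes) together as one argument.
function tokenizeArgs(input: string): string[] {
  const out: string[] = [];
  let cur = "";
  let quote: string | null = null; // the quote char we are inside, if any
  let inToken = false;             // distinguishes "" (empty arg) from no arg

  for (let i = 0; i < input.length; i++) {
    const ch = input.charAt(i);
    if (quote !== null) {
      if (ch === quote) {
        quote = null; // closing quote ends the quoted run, not the token
      } else if (ch === "\\" && quote === '"' && i + 1 < input.length) {
        i += 1;
        cur += input.charAt(i); // backslash escape inside double quotes
      } else {
        cur += ch;
      }
    } else if (ch === '"' || ch === "'") {
      quote = ch;
      inToken = true;
    } else if (/\s/.test(ch)) {
      if (inToken) {
        out.push(cur);
        cur = "";
        inToken = false;
      }
    } else {
      cur += ch;
      inToken = true;
    }
  }
  if (quote !== null) throw new Error("unterminated quote in extra args");
  if (inToken) out.push(cur);
  return out;
}
```

A naive `split(" ")` would break `--model "gpt 5"` into three arguments; tracking the open quote keeps it at two.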

Phase 3 architecture work: move run coordination + task finalization to the server, and fix the conversation-store flush cost. task-thread O(n^2) -> O(1) flush: - The store now mirrors each task's thread in memory (it is the sole writer; external agents only read/tail). Streaming a partial turn no longer re-reads, re-parses and rewrites the whole JSONL on every throttled flush — the common 'update the last (streaming) line' case does a tail-rewrite; earlier-message edits fall back to an atomic full rewrite. Server-side run registry keyed by task id (cross-tab dedup): - Thread taskId from useEmbeddedAgent -> StreamOptions -> the spawn POST body. - agent-spawn keeps a byTask map and refuses a second spawn for a task already running (409), closing the cross-tab double-spawn hole the per-tab client guard can't (two tabs auto-running one task would fork two CLIs on the same files). Added a spawnImpl test seam. Orphaned-task finalization: - The registry reports every run end via onRunEnd; index.ts grace-checks (~12s) and, if the task is still in_progress (client never transitioned it, e.g. the tab closed mid-run), marks it blocked instead of leaving it stuck. Race-safe: a normal completion's client review PATCH lands within the grace. Tests: task-thread streaming durability + non-last fallback; agent-spawn dedup/orphan-hook/taskId validation via a fake child; live apply-matrix now threads taskId so the registry path is exercised end to end.
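
The cross-tab dedup described above amounts to a registry keyed by task id that refuses a second run and reports every run end. The sketch below is an illustration under assumptions: `RunRegistry`, `SpawnResult`, and the `end` handle are hypothetical, and no real child process is spawned.

```typescript
type SpawnResult =
  | { status: 200; end: () => void }
  | { status: 409; error: string };

class RunRegistry {
  private byTask = new Set<string>();
  private onRunEnd: (taskId: string) => void;

  constructor(onRunEnd: (taskId: string) => void) {
    this.onRunEnd = onRunEnd;
  }

  // Refuse a second run for a task that is already running, regardless of
  // which browser tab asked for it.
  spawn(taskId: string): SpawnResult {
    if (this.byTask.has(taskId)) {
      return { status: 409, error: `task ${taskId} already has a running agent` };
    }
    this.byTask.add(taskId);
    let ended = false;
    return {
      status: 200,
      end: () => {
        if (ended) return; // a run ends exactly once
        ended = true;
        this.byTask.delete(taskId);
        this.onRunEnd(taskId); // hook for orphan finalization
      },
    };
  }
}
```

The `onRunEnd` callback is where a server could start the grace timer mentioned above and mark the task blocked if it is still in_progress afterwards.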

…rounding, wireframe lifecycle, docs sweep Acts on a full architecture/product/security/agent review. Highlights: Reproduced breakers - transform: Svelte attribute-expression corruption (brace tracking) and JSX/TSX statement-scope string/comment/regex corruption (scope stack). - webpack: devServer proxy was dead under webpack-dev-server v5; inject synchronously in apply() with v4/v5 forms + loud manual-config fallback. Embedded agents — all four CLIs first-class - codex/opencode/copilot now carry multi-turn history (rollupHistoryAsPrompt, oldest-first truncation under argv budget); claude keeps its stdin path. - copilot included in every seed-gate list/message. - seed-run error exits revert the task to pending + clear the run indicator (no more stuck in_progress); process-group kill so grandchildren can't orphan. Security - Host gating on all /__annotask routes (DNS-rebinding fix) with allowedHosts / bind-host / ANNOTASK_ALLOWED_HOSTS escape hatches. - permission-cap synonym holes closed (bypassPermissions / danger-full-access / -a never); bridge postMessage origin validation (no app data posts to '*'); draft-edits path containment; .annotask/ auto-gitignored; tasks.json quarantine on corrupt boot; transcript cleanup on accept/delete; transition TOCTOU fix. Shell silent-failure pattern - auto-run busy-loop hang fixed; res.ok checks + lastError; error-stack URL→repo path normalization (error_fix creation works again); pending draft survives failures; WS reconnect resyncs tasks; tab-order overlay rate-capped. Agent grounding - new MCP tools: get_source_excerpt, get_playbook, get_agent_directions (+CLI); get_tasks detail strips 200KB sidecars + paginates; structured --mcp errors; CLI runCli() test seam. 
Wireframe Milestone 1 — the loop closes - instance status/taskId/previewProps; idempotent Build; accepting a wireframe_apply task removes its placements server-side (delete reverts) — no duplicate renders during/after review; reapply mounts previewProps, isolates per-instance failures, surfaces stale anchors; placements panel with delete; rev-based 409 conflict on PUT; drop pipeline extracted to useWireframeCanvas.ts. Docs/contract - api_update purged (13 locations); wireframe_apply documented everywhere; missing routes/WS events/MCP tools added; PERF_FIX enums corrected; embedded layer added to architecture; SECURITY threat model; TODO.md removed (done); scripts/sync-skills.mjs + CI drift check keep skill mirrors in sync. Verification: typecheck clean; 775 unit tests pass (+184); build clean; live streaming + apply matrix pass 4/4 (claude, codex, opencode, copilot).
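
The oldest-first truncation mentioned for rollupHistoryAsPrompt can be pictured as dropping turns from the front until the rolled-up prompt fits. This is a sketch only: `rollupHistory` and `Turn` are hypothetical, the line format is invented, and the budget is measured in string length rather than real argv bytes.

```typescript
interface Turn { role: "user" | "assistant"; text: string }

// Rolls prior turns into one prompt string, dropping the OLDEST turns first
// until the result fits the budget. The newest turn is always kept.
function rollupHistory(turns: Turn[], maxChars: number): string {
  const lines = turns.map((t) => `${t.role}: ${t.text}`);
  const sizeFrom = (from: number): number => lines.slice(from).join("\n").length;

  let start = 0;
  while (start < lines.length - 1 && sizeFrom(start) > maxChars) {
    start += 1;
  }
  return lines.slice(start).join("\n");
}
```

Keeping the newest turn unconditionally means an oversized final message still reaches the CLI, which then fails loudly instead of silently receiving an empty prompt.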

…n tool Self-contained kickoff for the next phase: end-user live edits (props/text/ styles/move/resize/structure) + instant faithful previews that persist to real source, with undo/discard. Captures M1-done state, the live-commit vs preview-only architecture decision (recommends live-commit via the draft engine), and milestones M2-M7 with acceptance criteria.

…-apply loop The server half of the snapshot-wireframe apply loop. A design session is an ordered journal of user edits (.annotask/design-session.json, CAS on rev); 'Apply now' snapshots every touched file (byte-exact undo/discard net), mints ONE wireframe_apply task carrying context.wireframe + context.session, and stamps statuses. When the agent flips the task to review, a verification pass re-reads source per entry and marks each written|failed. Accept commits the snapshots; delete releases entries back to pending. The file-snapshot engine replaces draft-edits (one-time journal migration included), is hash-guarded against user edits, and rehydrates across restarts — files revert only via explicit undo/discard. binding-classify stays as agent tooling (MCP + CLI): round-trip honesty classification of props/text before an agent rewrites them.

…placement identity The shell half of the apply loop. useStyleEditor becomes a facade over the useDesignSession journal (changes is a collapsed projection; the frozen report/watch contract is unchanged); the DesignSessionPanel drives Apply now / Undo last apply / Discard with per-entry status chips. Placements gain durable identity end-to-end: clicking a mounted placement selects the placement (instance_id), not its mounted internals, and the inspector declines style edits on placements (no source anchor to snapshot or verify). The project palette pins the Project library first, with sample-prop preview widgets extracted to propWidgets. The design-apply e2e seeds the journal via the API and proves the panel -> task -> snapshot -> release loop; the react-vite /planets playground page backs capture/canvas e2e to come.

CLAUDE.md and the apply skill describe the journal-backed apply loop; component_prop_update/text_update become documented legacy types (agents still apply them from older tasks, the UI no longer emits them). The todo handoff records the pivot: strip the live-edit surface, build snapshot wireframing on the kept apply loop. Skill mirrors synced.

Wireframe mode freezes the current route into a manipulable image canvas. A new wireframe:capture bridge message walks the rendered DOM (semantic main + single-child unwrap; header/nav/footer/aside chrome pass; 24-block cap), rasterizes each block sequentially with html2canvas (progress pushes, per-block failure stays an honest hatched box), and ends with a scale-1 full-document pass — the 'before' truth for apply-time composites. Block PNGs land in .annotask/wireframe-snapshots/ (id-addressed upload, contained serve/delete); wireframe.json gains a per-route canvas node (blocks carry source anchors + original rects; validated to block depth at the PUT boundary). The canvas renders as an opaque overlay INSIDE #canvas-area — the live iframe stays mounted underneath, so exit is lossless by construction (e2e proves zero reloads and a PlanetsPage.vue:line anchor chip on /planets).

W2 — the sketch becomes freely manipulable: drag (3px threshold, z-bump on grab), 8-handle resize, soft-delete for captured blocks (the apply diff needs deletion as a fact; an undelete popover restores them), hard delete for sketch material with refcount-guarded snapshot cleanup, Ctrl+D duplicate (shares the image file; duplicateOf roots at the original), per-block notes (the user-said channel), palette drops as honest preview:component snapshots (canvas-only — no live mount; a data-bound component degrades visibly to a placeholder render), and a drawn labeled placeholder tool. Mutations persist with a trailing debounce; F5 restores the canvas as left. Two real bugs fixed en route: palette-drop props were Vue reactive proxies postMessage refused (DataCloneError — every drop silently degraded to placeholder), and block discovery unwraps single-child wrappers by HEIGHT, not area, so a centered max-width page column still yields per-section blocks at any viewport. Captured blocks carry their first class name — the only human-distinct label when every block shares one owning component. W3 — 'Implement this wireframe' closes the loop: a pure diff engine turns the sketch-vs-capture delta into ONE anchored direction per changed block (move/resize/delete/add/note) where relational facts computed from box geometry ('now above the filters (was below)') are the contract and pixels are explicitly hints; tool-measured geometry and the user's verbatim notes ride separate schema channels. Adds anchor at the nearest surviving captured neighbor. A labeled before/after composite (numbered badges matching the direction list) rides the existing screenshot pipeline. Directions are journal entries, so the apply loop is unchanged: route-scoped collection, file snapshots before the agent runs, ONE wireframe_apply task, byte-exact undo. The verifier trusts directions with no source read — spatial outcomes verify visually, so resolutions must be specific (playbook rewritten). 
The canvas locks to the task while building; delete unlocks the sketch for tweak-and-re-implement; accept removes it and shows the real implementation. Proven live: the apply-session matrix seeds the /planets rearrangement (grid above filters + pagination placeholder + note) and a real claude CLI implemented it in source, trust-verify flipped all directions written, and revertApplyBatch restored the bytes exactly.
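
The relational facts computed from box geometry can be sketched with a vertical comparison between a block and a neighbor. The names `Rect`, `verticalRelation`, and `relationalFact` are hypothetical, and the real diff engine presumably handles more relations than this one axis.

```typescript
interface Rect { x: number; y: number; w: number; h: number }
type Relation = "above" | "below" | "beside";

// Vertical relation of `block` to `neighbor`, from box geometry alone.
function verticalRelation(block: Rect, neighbor: Rect): Relation {
  if (block.y + block.h <= neighbor.y) return "above";
  if (neighbor.y + neighbor.h <= block.y) return "below";
  return "beside"; // vertical ranges overlap
}

// Emits a fact only when the relation actually changed between the capture
// and the sketch; unchanged relations produce no direction text.
function relationalFact(
  before: Rect,
  after: Rect,
  neighbor: Rect,
  neighborLabel: string,
): string | null {
  const was = verticalRelation(before, neighbor);
  const now = verticalRelation(after, neighbor);
  return was === now ? null : `now ${now} ${neighborLabel} (was ${was})`;
}
```

Stating the outcome as a relation rather than as coordinates is what lets pixels stay hints: an agent can satisfy "above the filters" at any viewport width.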

…created-file netting Canvas: double-click (or the header button) EXPLODES a captured block one level deeper — the durable anchor re-resolves the live element, a rootEid capture rasterizes its children (no redundant full-page pass), and the children land translated by the block's canvas delta with their own file:line anchors and live-position diff baselines. Dragging snaps edges/centers to neighbors with visible alignment guides (Alt disables; pure computeSnap helper). Marquee + shift-click multi-select with group drag, group delete/duplicate, and arrow-key nudging (Shift = 10px). The chrome bar shows the capture viewport — pick a device preset before capturing to wireframe mobile. Captured-block labels stay distinct via cssClass, and the two vue wireframe e2e suites now live in ONE file: Playwright runs separate spec files in parallel workers, and they share one dev server's wireframe document. Server: boot-time GC sweeps .annotask/wireframe-snapshots/ for PNGs no doc references (1h mtime guard; a malformed wireframe.json aborts the sweep — an unreadable doc must never read as 'no references'). In git projects, apply batches record the untracked set at snapshot time and the seal nets agent-CREATED files into the batch (sha256 per file) — undo/discard delete pristine copies and keep+report user-edited ones; non-git projects degrade silently.
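
A pure snapping helper of the kind described (edges and centers against neighbors, with a guide position) might look like the one-axis sketch below. It is not the project's computeSnap: `snapAxis`, `Span`, and the threshold parameter are hypothetical, and the `disabled` flag stands in for the Alt key.

```typescript
interface Span { start: number; size: number }
interface SnapResult { start: number; guide: number | null }

// Start edge, center, and end edge of a span along one axis.
function snapPoints(s: Span): number[] {
  return [s.start, s.start + s.size / 2, s.start + s.size];
}

// Snaps the moving span to the nearest neighbor edge or center within
// `threshold`, returning the adjusted start and the guide line to draw.
function snapAxis(
  moving: Span,
  neighbors: Span[],
  threshold: number,
  disabled = false,
): SnapResult {
  if (disabled) return { start: moving.start, guide: null };
  let best: { delta: number; guide: number } | null = null;
  for (const n of neighbors) {
    for (const target of snapPoints(n)) {
      for (const point of snapPoints(moving)) {
        const delta = target - point;
        if (Math.abs(delta) > threshold) continue;
        if (best === null || Math.abs(delta) < Math.abs(best.delta)) {
          best = { delta, guide: target };
        }
      }
    }
  }
  return best === null
    ? { start: moving.start, guide: null }
    : { start: moving.start + best.delta, guide: best.guide };
}
```

Running the helper once per axis gives independent horizontal and vertical snaps, and keeping it free of DOM access makes it straightforward to unit test.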

…drop Double-clicking a block visually 'removed the styling': explode deleted the parent block, and the children only carry their own pixels — the container's background, padding, and the surface BETWEEN children (all painted by the parent element) vanished, leaving naked fields floating on the canvas grid. The parent now survives as a SHELL backdrop: the rootEid capture takes one extra pass of the root's rect with the captured child blocks visibility-hidden in the html2canvas clone — the container's own pixels with no ghost children, so dragging a child out reveals clean surface, never a burned-in copy of itself. The shell uploads under a new file id (duplicates keep referencing the original pixels), is marked shell:true (not explodable again; no explode affordance), and stacks below the children. When no shell can be captured the parent is removed as before — a stale full image would ghost. Playbook notes that container vs child directions mean whole-container moves vs restructuring inside it.

Wireframe mode is the only move-things surface now. Strips the tool across shell (useRepositionMode deleted; useWireframeCanvas pointer handlers, App.vue shield, toolbar buttons, 'reposition' interaction mode + 'r' key), bridge (resolve:move-source + move:element handlers, ResolveMoveSourceResult/ MoveElementPayload types), journal (recordMove/MoveChangeRecord; the component_move report arm), and theme (--mode-reposition: 64 -> 63 vars, making CLAUDE.md's documented 63-var contract true; saved custom themes tolerate the stale key by construction). Kept shared dependencies the tool only rode on: dropPositionFor + the drop indicator (live palette drop), data-annotask-instance stamping (selection, drop-target refusal, capture exclusion, reapply idempotency), and the containerEids map — hoisted into useWireframeCanvas because deletePlacement unmounts live nodes through it and reapply repopulates it after F5/route change. ComponentMoveChange/component_move stays in the schema + SKILL.md as a documented legacy type (now with its previously-missing apply rule), no longer emitted.

The offscreen preview container and the html2canvas backgroundColor were hardcoded to a white card (#ffffff/#111111), which stuck out badly on dark apps. resolveAppSurface() reads the computed background the app actually paints (body first — content sits on it — then html; the opposite order of detectColorScheme's viewport probe, which is unchanged) and derives a contrast-safe text color from its BT.709 luminance, falling back to a scheme-derived pair when both are transparent. readBg() is hoisted out of detectColorScheme so both share it. Fixes every existing consumer for free: wireframe palette drops and the Components page thumbnails. Verified on the dark vue-vite playground — a dropped Header snapshot now reads rgb(18,18,18) at every padding corner (the app's #121212), confirmed by eye on the PNG.
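
Deriving a contrast-safe text color from luminance can be sketched as follows. This is an assumption-laden illustration rather than resolveAppSurface() itself: it linearizes sRGB channels and uses the WCAG contrast-ratio formula with the BT.709 weights, and the returned color pair and function names are hypothetical.

```typescript
// sRGB 8-bit channel -> linear light (WCAG relative-luminance definition).
function channelToLinear(c8: number): number {
  const c = c8 / 255;
  return c <= 0.04045 ? c / 12.92 : Math.pow((c + 0.055) / 1.055, 2.4);
}

// Relative luminance using the BT.709 coefficients.
function luminance(r: number, g: number, b: number): number {
  return (
    0.2126 * channelToLinear(r) +
    0.7152 * channelToLinear(g) +
    0.0722 * channelToLinear(b)
  );
}

function contrastRatio(l1: number, l2: number): number {
  return (Math.max(l1, l2) + 0.05) / (Math.min(l1, l2) + 0.05);
}

// Picks whichever of white or black text contrasts more with the background.
function textColorFor(r: number, g: number, b: number): "#ffffff" | "#000000" {
  const bg = luminance(r, g, b);
  return contrastRatio(bg, 1) >= contrastRatio(bg, 0) ? "#ffffff" : "#000000";
}
```

For the dark playground surface quoted above, `textColorFor(18, 18, 18)` selects white text, while a white surface selects black.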

…icker WireframeDataBinding (shared/wireframe-types.ts): REAL scanner-catalog identity (kind/name/module/endpoint) + user drill-down (path like "planets[]", fields) + the shape_source honesty tag ('api-schema' | 'source-details' | 'none'). WireframeBlock gains an optional kind-agnostic data field; isWireframeBlock rejects malformed bindings at the PUT boundary. GET /__annotask/api/data-source-shape?name&kind&file resolves a catalog entry down the honesty ladder (data-source-shape.ts): (1) the entry's endpoint matched against discovered API schemas via resolveEndpoint — the already-deref'd response_schema walks into a DataShapeNode tree (schemaToShape: cycle and GraphQL $type degrade to named ref leaves, allOf merges, oneOf/anyOf first object-ish variant, depth-capped); (2) regex-inferred return-type hints verbatim with their confidence — never expanded into a tree; (3) honestly nothing. Runtime-promoted entries are excluded (no code identity, no shapes this round). Shell-only — agents re-ground via the existing data-source-details / api-operation tools. DataBindingPicker.vue (ships unmounted until D4/D5): catalog search → source select → expandable shape tree with path pick + field checkboxes under 'api-schema'; verbatim hints + free-text under 'source-details'; visibly-blind free-text under 'none'. Pure logic in utils/bindingShape.ts (flattenShape/fieldCandidates/buildBinding, validated before emit). Tests: shared validator depth (wireframe-types.test.ts), walker + ladder against an OpenAPI fixture incl. cycle refs and ambiguity (offline via explicit apiSchemaFiles), flatten/candidates/build units.
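
To make the path notation concrete (the root at '' and array items at "planets[]"), here is a sketch of flattening a shape tree into pickable rows. The `DataShapeNode` union below is a simplified, hypothetical stand-in for the real type, and this `flattenShape` only mirrors the name of the helper in utils/bindingShape.ts, not its behavior.

```typescript
// Simplified stand-in for the shape tree; the real DataShapeNode may differ.
type DataShapeNode =
  | { kind: "object"; fields: Record<string, DataShapeNode> }
  | { kind: "array"; items: DataShapeNode }
  | { kind: "leaf"; type: string };

interface FlatRow { path: string; kind: DataShapeNode["kind"] }

// Depth-first walk that emits one row per node. The root's path is "",
// object fields are joined with ".", and array items get a "[]" suffix.
function flattenShape(node: DataShapeNode, path = ""): FlatRow[] {
  const rows: FlatRow[] = [{ path, kind: node.kind }];
  if (node.kind === "object") {
    for (const [key, child] of Object.entries(node.fields)) {
      rows.push(...flattenShape(child, path ? `${path}.${key}` : key));
    }
  } else if (node.kind === "array") {
    rows.push(...flattenShape(node.items, `${path}[]`));
  }
  return rows;
}
```

With this convention, a binding that drills to `planets[]` and selects `name` and `type` refers to the item object of the array, not to the array itself.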

…tte drop Dropping or clicking a component in wireframe mode now opens a right-docked Generate panel instead of instantly minting a block: settings (real props vs preview samples via the propWidgets widget table), an optional data binding through the D3 picker (offered automatically when the component looks data-driven, never required), generate (honest preview:component snapshot on the app-true surface, fidelity pill, one-click regenerate), then place — at the remembered drop point (the drag fast path) or by ghost-placing on the canvas (image rides the cursor; Escape backs out; clicks over existing blocks place instead of dragging). A gear button on placed palette blocks reopens the panel seeded from the block for in-place reconfiguration (props/binding/snapshot update under the same block id); bound blocks show a data chip. useWireframeMode grows addPaletteBlock/updatePaletteBlock (the old drop path now routes through them, behavior-identical); useComponentGenerator owns the session state machine with JSON-round-trip clone discipline at every postMessage/persist boundary. html/layout-preset drops keep the instant placeholder path; the live-iframe drop path is untouched. e2e: W2's drop section drives the panel flow; a new spec binds usePlanets through the real api-schema shape tree (planets[] -> name, type), survives F5, and regenerates in place. Unit: generator state machine incl. clone-at-boundary and edit-mode routing.

…tate draw tool removed The wireframe placeholder grows into the section concept (one concept, not two): a drawn box keeps its cheap label-only flow, and gains an on-demand markdown spec (Write/Preview popover via safeMd; Ctrl+Enter or Save commits, plain Enter stays a newline) and a data binding through the D3 picker. "Section" is a DERIVED affordance — a placeholder with md or data renders the section tag, the clipped md hint, and the binding chip; the validator gains the additive md check. setBlockMd/setBlockData persist per-route in wireframe.json, so sections survive F5 (proven: 12/12 browser smoke incl. the api-schema drill to planets[] name+type). The Annotate tab's drawn-section tool is stripped per the live-edit/D1 pattern: DrawnSectionOverlay deleted; DrawnSection state, the draw-rect plumbing, onSectionSubmit (the ONLY section_request emitter), the restore/eid-resolution/rect-tracking section branches, the 'draw' interaction mode + 'd' key (stored value degrades to 'interact'), the toolbar button, and the --mode-draw theme variable (62-var contract; saved custom themes tolerate the stale key by construction). Legacy pending section_request tasks stay visible and agent-applicable in the Tasks panel — with the tool gone there is no overlay claiming editability. section_request stays in TASK_TYPES + SKILL.md as a documented legacy type, now explicitly marked.

…live proof WireframeDirectionChange.added gains md (the user's VERBATIM markdown spec for a drawn section — never blended with measured geometry; the description quotes only its first line) and data (the WireframeDataBinding, cloned plain). computeWireframeDirections emits both for placeholder AND palette adds with an honest description summary ('bind to the composable usePlanets → planets[] (show name, type) [shape: api-schema]'); bare-placeholder output is regression-locked byte-stable. The before/after composite renders the section tag, md first line, and binding identity as text in the dashed box — never fabricated pixels (verified by eye on the minted task's PNG). WIREFRAME_APPLY.md: the add/component and add/placeholder rules split bare vs with-md sections, and a new 'Data bindings on adds' section spells out the shape_source honesty ladder and the re-grounding protocol (annotask_get_data_source_details/_examples always; _api_operation when a schema ref exists; needs_info on contradiction; never invent fields). entryForTask passes the change whole — verified by a new apply-session assertion; directions stay trust-verified. Live proof (ANNOTASK_LIVE_CLI=1, claude): a third matrix scenario seeds a fixture project with a REAL usePlanets composable + PlanetCard and an empty content section (zero planet names anywhere), hands the agent the bound add direction, and asserts the real import + call + v-for + :name/:type landed, NO sample data was fabricated, the composable and card stayed byte-identical, trust-verify flipped written, and undo restored byte-exact source — all 3 live scenarios pass. e2e: a section with md + binding implements into ONE task carrying both verbatim (6/6 wireframe specs green).

…dversarial pass

Behavior:
- Canvas keydown gains an input-target guard: typing in the generate panel (or its embedded picker) no longer triggers block hotkeys (Backspace in a prop field was deleting the selected block). mdEditing resets on every selection change (select() + the marquee path); it previously stuck after the editor unmounted, swallowing all canvas keys or committing the draft to the wrong block.
- DataBindingPicker: monotonic pick token, so a slow cold-scan response can no longer land on a newer selection. Root-object shapes: pickedPath starts null (the root's path IS ''), so nothing renders falsely picked and the whole-response pick now reaches the fields multi-select.
- Generator: placeAt/apply close the session synchronously before the upload await (double-click placed duplicate blocks); generate/place/apply respect the building lock and Implement cancels any open session; an edit-mode Apply without regenerating keeps the block's existing previewProps (they are 'what the user saw rendered', and nothing was rendered).
- Regenerate-in-place uploads under a FRESH filename and refcount-deletes the old one: duplicates share the old PNG (overwriting silently changed them) and the 1h snapshot cache masked same-name re-uploads after reload. imageSrc is strictly own-id keyed; duplicates seed their own liveImages entry at duplication time.
- schemaToShape composes allOf with an explicit object type / sibling properties instead of swallowing the inherited keys (unit-pinned).
- Directions: a drawn section's description no longer says 'keep it visibly a placeholder' (added.md is the contract); a duplicate whose original left the sketch says so honestly instead of claiming a user-sketched box.
- applyDesignSession numbers ONLY direction entries (badge order can't drift when placements/legacy edits ride the same task) and claims the composite screenshot only when one actually rode along.

Strip leftovers and docs:
- stress-test annotate matrix no longer drives the removed tool-draw (it failed serially for all 7 apps); marketing page drops the Drawn-sections card + orphan screenshot; HelpOverlay/README say 62 CSS variables; SKILL.md intro and docs/api.md mark sections/component_move legacy; vue-webpack's installed skill copies refreshed; dead 'move' icon removed.

Not fixed (noted): scanApiSchemas' 60s cache ignores its options (pre-existing, affects rung-1 freshness after an optionless scan warms it); the design-session PUT validates only the change envelope, so a direct PUT could inject an unvalidated added.data (every shell writer validates; agents re-ground anyway).
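The monotonic pick token mentioned for DataBindingPicker can be illustrated with a small latest-request-wins guard. This is only a sketch of the general technique; `createLatestOnly`, `run`, and the loader signature are hypothetical names, not the picker's actual code.

```typescript
// Hypothetical sketch of a monotonic token guard; not Annotask's implementation.
type Loader<T> = (key: string) => Promise<T>

function createLatestOnly<T>(load: Loader<T>) {
  let token = 0 // bumped on every pick, never reused

  return async function run(key: string): Promise<T | undefined> {
    const mine = ++token // remember which pick this request belongs to
    const result = await load(key)
    // If a newer pick started while we were waiting, drop this response
    // instead of letting it overwrite the newer selection.
    return mine === token ? result : undefined
  }
}
```

A caller would apply the result only when it is not `undefined`, so a slow response for an older selection is silently discarded.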

Block discovery had exactly two passes — semantic chrome (header/nav/ footer/aside) and children of the content root (which starts at <main>) — so content mounted as a SIBLING of main that isn't semantic chrome (a floating button, banner, or toast in layout flow) appeared in the honest full-page 'before' image but got no block on the canvas: it couldn't be moved, deleted, or annotated in the sketch. A third straggler pass walks the capture-root → content-root path and blocks every qualifying sibling with a real footprint at each level, unless the chrome pass already covers it (containing a chrome block counts as covered — blocking the wrapper would double-capture the header inside). The playground's bare <Button> after <main> (App.vue) is the canonical case, asserted in the W1 e2e (by anchor tag — anchor.file alone would false-pass, the header chrome block also anchors to App.vue via attribute fallthrough).
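The straggler pass described above can be sketched over a simplified element tree. The `El` shape, the footprint test (positive width and height), and the function names are assumptions made for illustration; the real pass works on the captured DOM.

```typescript
// Hypothetical, simplified model of the straggler pass; not the real capture code.
interface El { tag: string; width: number; height: number; children: El[] }

const CHROME = new Set(["header", "nav", "footer", "aside"])

// A node counts as covered if it is chrome or wraps a chrome block.
function containsChrome(el: El): boolean {
  return CHROME.has(el.tag) || el.children.some(containsChrome)
}

// Chain of elements from root down to target, or null if target is not under root.
function pathTo(root: El, target: El): El[] | null {
  if (root === target) return [root]
  for (const child of root.children) {
    const rest = pathTo(child, target)
    if (rest) return [root, ...rest]
  }
  return null
}

// Walk capture-root -> content-root and collect qualifying siblings at each level.
function stragglers(captureRoot: El, contentRoot: El): El[] {
  const path = pathTo(captureRoot, contentRoot) ?? []
  const found: El[] = []
  for (let i = 0; i + 1 < path.length; i++) {
    for (const sibling of path[i].children) {
      if (sibling === path[i + 1]) continue // the path itself is not a straggler
      if (sibling.width <= 0 || sibling.height <= 0) continue // no real footprint
      if (containsChrome(sibling)) continue // chrome pass already covers it
      found.push(sibling)
    }
  }
  return found
}
```

With a `body` holding a header wrapper, `main`, and a bare button after `main`, only the button would be returned, which mirrors the playground case in the commit message.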

The D1-D6 plan this branch implemented — kept as the round's record, like the snapshot-wireframe handoff before it.

…on removed Applied end-to-end through the wireframe loop (sketch -> directions -> embedded agent -> review -> accept): filters, sort, and the search field regroup into an <aside class="controls"> beside the planet layout (column-reverse under 900px), and the bare sibling-of-main <Button> stub in App.vue is gone — the element the straggler-capture fix was proven on.

…tore the straggler fixture The accepted /planets redesign collapsed the page to two content blocks (page-header + planets-content wrapper), which broke the suite's old header/toolbar/grid anatomy assumptions — W4's marquee aliased toolbarBlk to headerBlk and could no longer select two blocks. The block picks and expectations now target the two-block anatomy (explode covers the wrapper's layout/controls children). The deleted Button stub was also the suite's only sibling-of-main element; a deliberate, styled 'Back to top' button after <main> replaces it as the canonical straggler-pass fixture (commented as such in App.vue), keeping W1's capture-coverage assertion honest. 6/6 wireframe specs green against the new page.

…frame hero demo

- src/api.js: fetchStats() + fetchChangelog() against the shared FastAPI (/api/marketing/*); fetchChangelog is deliberately unused because it is the binding target for the hero's drawn 'What's new' section
- hero stats strip renders live installs/stars/contributors/frameworks
- vite proxy for /api and /openapi.json so the schema scanner resolves shapes (shape_source: 'api-schema', ChangelogEntry[] at confidence 1)
- justfile: 'just marketing' target with ensure-api

…ession state, keep init demo:reset:hero restores the marketing playground to the wireframe-hero 'before' state: tracked sources back to HEAD, agent-created strays cleaned, wireframe/design-session/file-snapshots/conversations/usage cleared, tasks emptied — while server.json/design-spec.json/agents.json survive so the InitWizard never hijacks a take. Verified idempotent.

Supersedes the never-recorded 'AI draft → polished page' hero with 'Freeze. Sketch. Real.': freeze the marketing page into the snapshot-wireframe canvas, rearrange it, bind the drawn 'What's new' section to the real /api/marketing/changelog schema, and let the embedded agent rewrite the source, with byte-exact undo as the safety net.

- demo_plan.md: suite design (10-segment hero, four directions, clips C1–C8, salvage map, VO tone)
- demo/transcript-hero.md: per-segment voiceover + on-screen directions
- demo/HERO_RECORDING_PROMPT.md: recording-session runbook; all selectors verified against src/shell + the e2e suite; provider seeding, wait chains, speed-ramp, pre-flight, risk register
- MARKETING_DEMO_PROMPT.md: superseded banner (its stage survives in C3/C8)

…dev server A bare-array tasks.json (hand-written resets, older tooling) parsed fine but crashed the first addTask() with 'Cannot read properties of undefined' — taking the whole Vite dev server down. Normalize arrays and missing .tasks keys at load. demo-reset-hero.sh now writes the canonical shape.
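The load-time normalization could look roughly like the following. The `{ tasks: [...] }` canonical shape is inferred from the mention of a missing `.tasks` key; `normalizeTasksFile` and `TasksFile` are hypothetical names, not the project's actual loader.

```typescript
// Hypothetical sketch of normalizing tasks.json at load; names are illustrative.
interface TasksFile { tasks: unknown[] }

function normalizeTasksFile(parsed: unknown): TasksFile {
  // Bare array (hand-written resets, older tooling): wrap it.
  if (Array.isArray(parsed)) return { tasks: parsed }

  // Object form: keep other keys, but guarantee `.tasks` is an array.
  if (parsed !== null && typeof parsed === "object") {
    const tasks = (parsed as { tasks?: unknown }).tasks
    return { ...(parsed as object), tasks: Array.isArray(tasks) ? tasks : [] }
  }

  // Anything else (null, string, number): start from an empty document.
  return { tasks: [] }
}
```

After this step, code that pushes onto `.tasks` can rely on the array existing, whichever shape was on disk.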

…ripted and proven

One continuous scripted take on the marketing playground: wireframe freeze, drag/resize/note, drawn 'What's new' section bound to the real changelog schema, Implement → live claude-local run (ramped 8× with caption) → safety affordance → accept → reveal. Produces demo/final/annotask-hero.mp4 (140s; media dirs are gitignored).

- demo/lib/record-helpers.mjs: live DOM cursor overlay (testreel's post cursor can't track real-mouse drags; its standalone moveCursorToPoint feeds a global tracker recordPage never serializes), boot/enter helpers ported from the e2e suite, API poll gates
- demo/record-hero.mjs: the take script; DEMO_SMOKE=1 validates every selector/gate headless in ~13s without an agent run
- demo/generate-voiceover-hero.sh + demo/assemble-hero.sh: edge-tts VO, marker-driven assembly with input-side-seek speed ramp and collision-resolved narration placement
- transcript/runbook updated: safety beat moved BEFORE accept (accepting clears the session and its snapshot batches, so undo only exists in review)

…ning

Land the place-first on-canvas component configuration and harden the wireframe apply lifecycle ahead of v0.3.0.

Place-first config (replaces the generate-then-place panel):
- Dropping a palette component mints the block instantly; props, data binding, and loop repetition are configured in an inline popover with debounced live regeneration on the app-true surface.
- New: WireframeBlockPopover, PropWidgetRows, useCanvasHistory, useCanvasSelection (+ tests); remove GenerateComponentPanel.
- Rewrite the vue-vite wireframe e2e to drive the new wf-pop-* flow.

Apply-lifecycle data safety (review blockers + crash recovery):
- Gate Undo/Discard while an apply run is in flight; the snapshot engine refuses a still-'running' (unsealed) batch and the discard endpoint 409s on one, so a one-click revert can no longer clobber the agent's in-progress bytes.
- (Re-)seal the batch by task on every review (sealBatchByTask), so re-apply-after-deny and pure-placement applies get a correct undo baseline instead of falsely reporting "edited outside Annotask".
- A crashed/aborted seed run seals its batch 'failed' (keeping it revertible) and returns its entries to 'pending' instead of stranding them in 'applying'.

Embedded-agent + shell fixes:
- Render model/user markdown through safeMd (DOMPurify) instead of raw marked.parse + v-html (XSS hardening).
- Stop the cumulative token counter double-counting across turns.
- plain()-clone instanceProps before postMessage so a nested-object binding no longer degrades the live preview to a placeholder (DataCloneError).

Re-sync the .claude / .agents / vue-webpack skill mirrors from canonical.
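For the plain()-clone fix, a JSON round trip is one common way to produce a plain, structured-clone-safe copy before postMessage. The body below is an assumed implementation for illustration; only the name `plain` comes from the commit message.

```typescript
// Assumed implementation of a JSON-round-trip clone; the real plain() may differ.
function plain<T>(value: T): T {
  // Serializing and re-parsing yields plain objects and arrays only, with no
  // proxies or class instances that structured clone could reject.
  // Caveat: functions and undefined properties are dropped, Dates become strings.
  return JSON.parse(JSON.stringify(value)) as T
}

// Example: a nested-object prop value, cloned before crossing a postMessage boundary.
const instanceProps = { card: { name: "Mars", tags: ["rocky"] } }
const safeProps = plain(instanceProps)
```

The copy shares no references with the original, so later mutation on either side of the boundary cannot leak across.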

Bump to 0.3.0: embedded-agent mode and wireframing are additive, non-breaking features, so a semver minor.

- Restructure CHANGELOG with the real 0.3.0 surface (snapshot wireframe, data-binding picker, design-session apply loop, embedded agents, the tool-strip pivot) and the data-safety fixes from the release review.
- Document embedded agents + wireframing in the README (it ships in the npm tarball; CLAUDE.md alone did not reach users).

These are internal handoff/recording material, not product, and they would otherwise ride into the public repo and the PR to main. The npm tarball (files: ["dist","skills"]) never shipped them; this removes them from git too.

- todo/: the 5 wireframe/embedded-agent handoff + kickoff planning docs.
- demo/: recording prompts, transcripts, and the record/assemble/voiceover scripts. The gitignored demo/{segments,voiceover,final} binaries are local-only and untouched.
- demo_plan.md: the root demo plan doc.

Kept: scripts/demo-*.sh + the demo:* package scripts (functional tooling that resets the marketing playground off the demo/marketing-before git tag, which the live apply-cli-matrix test also uses). Dangling doc-comment pointers in e2e/helpers/design-tool.ts and scripts/demo-reset-hero.sh updated.

…bedded-agent loop

Apply/undo lifecycle no longer wedges. A crash, abort, restart, orphan reconcile, or an agent that pauses at needs_info / is denied used to strand the snapshot batch as 'running' and entries as 'applying', disabling undo, discard AND re-apply at once and locking the wireframe canvas at 'building' forever. Every terminal transition (HTTP + the server-side orphan/boot sweep) now routes through one shared closure (releaseApplyTask: seal batch + release entries + unlock canvas); applyInFlight keys off the always-sealed batch status; retries re-stamp entries so review re-verifies.

Undo is byte-exact across the agent's whole footprint. In a git project the engine captures a pre-apply baseline (git stash create) and, at seal, folds every tracked file the agent actually touched into the batch (pre-apply bytes from git show). Agent-deleted files are recreated on undo (they used to vanish despite a held copy). Non-git projects keep anchor-only coverage.

Other fixes:
- Anchorless wireframe block (file '') no longer crashes the apply after minting the task; empty anchors are filtered from the snapshot set.
- needs_info answers are injected into the agent's resume prompt (no-MCP loop).
- Cross-tab 409 (task_already_running) is a benign no-op, not a stream error that reverts the winning tab's live run.
- wireframe.json runs through a migration shim before validation, so a future schema bump (or a legacy version-less doc) upgrades instead of silently wiping.
- Removed the dead BudgetCap module (USD caps aren't portable across providers; the idle/duration watchdogs are the runaway guard).
- The dev server prints a loud warning when bound beyond loopback (vite --host).

Tests added for the git auto-extend, deleted-file recreation, anchorless apply, and the cross-tab 409 path.
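The single-closure idea can be sketched with an in-memory state object. Only the name releaseApplyTask and the three steps (seal batch, release entries, unlock canvas) come from the commit message; the `ApplyState` shape, the status values chosen for the success path, and the boolean `ok` parameter are assumptions for illustration.

```typescript
// Hypothetical in-memory model; the real engine persists batches and entries.
type BatchStatus = "running" | "sealed" | "failed"
type EntryStatus = "pending" | "applying" | "written"

interface ApplyState {
  batch: { taskId: string; status: BatchStatus }
  entries: { taskId: string; status: EntryStatus }[]
  canvas: "idle" | "building"
}

// Every terminal transition (done, crash, abort, needs_info, denied, boot sweep)
// would call this one function, so no path can leave the state half-released.
function releaseApplyTask(state: ApplyState, taskId: string, ok: boolean): void {
  // 1. Seal the batch so undo/discard/re-apply are no longer blocked.
  if (state.batch.taskId === taskId && state.batch.status === "running") {
    state.batch.status = ok ? "sealed" : "failed"
  }
  // 2. Release entries that were stranded mid-apply.
  for (const entry of state.entries) {
    if (entry.taskId === taskId && entry.status === "applying") {
      entry.status = ok ? "written" : "pending"
    }
  }
  // 3. Unlock the canvas.
  state.canvas = "idle"
}
```

Because each step checks the current status before changing it, calling the function twice for the same task leaves the state unchanged, which is what a boot-time sweep needs.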

…surface

This release ships the reduced wireframe surface (capture + sketch + the implement-this-wireframe directions loop) and defers two built-but-not-ready areas behind flags in src/shell/wireframeFeatures.ts:

- Data binding (palette/section binding picker, shape drill-down, prop→field map, loop repeat), deferred while the shape-confidence honesty ladder is reworked.
- Explode-to-children.

The code paths stay in the tree and existing persisted bindings still render read-only; only the UI entry points are gated (WireframeCanvas, BlockPopover). The e2e specs that exercise binding/explode are skipped/trimmed to match.

Also: wireframe capture/enter errors now surface in an app-level banner instead of flashing and vanishing when the canvas fails to mount.
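A flag module of this kind might be shaped as follows. The flag names, the `bindingUi` helper, and its return values are hypothetical; the commit only states that the flags live in src/shell/wireframeFeatures.ts, that UI entry points are gated, and that persisted bindings render read-only.

```typescript
// Hypothetical shape of a feature-flag module; not the actual file contents.
interface WireframeFeatures { dataBinding: boolean; explode: boolean }

// Both areas are gated off for this release.
const WIREFRAME_FEATURES: WireframeFeatures = { dataBinding: false, explode: false }

type BindingUi = "editable" | "read-only" | "hidden"

// Gate only the UI entry point: an existing binding still renders, but read-only.
function bindingUi(flags: WireframeFeatures, hasPersistedBinding: boolean): BindingUi {
  if (flags.dataBinding) return "editable"
  return hasPersistedBinding ? "read-only" : "hidden"
}
```

Keeping the decision in one helper means re-enabling the feature later is a one-line flag flip rather than a hunt through components.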

…: gated live-CLI job

- SECURITY.md: state the out-of-box posture explicitly (no permission ceiling by default; same-origin app code can drive agents).
- README.md: new Security section; drop the non-existent "token budget cap" claim; trim the gated explode/data-binding wording from Wireframing.
- CHANGELOG.md: [Unreleased] section for this round; correct the 0.3.0 "budget cap" / "confidence attached" claims that never shipped.
- CLAUDE.md + WIREFRAME_APPLY.md: note data-binding/explode are gated this release; fix the "confidence attached" claim (WireframeDataBinding carries no confidence field). Skill mirrors regenerated via sync:skills.
- ci.yml: secret-gated, schedule/dispatch-only live-cli job that runs the apply→write→verify loop against a real CLI.

…counts)

Audited every doc against the codebase and corrected stale/wrong claims:

- docs/api.md: drop the removed POST /wireframe/draft + /draft/revert routes and the ANNOTASK_RENDER_IN_PLACE flag; add the real design-session/* + wireframe-snapshots/* apply routes; add the session:updated WS event and the annotask_get_binding_classification MCP tool; clarify the default permission is no ceiling ('bypass'); broaden the wireframe_apply source description.
- docs/architecture.md: remove the deleted budget-cap.ts and draft-edits.ts bullets (replace with the real apply/snapshot modules + wall-clock watchdogs); drop "Libraries" as a peer Audit sub-section.
- AGENTS.md (a stale copy of CLAUDE.md): remove the wireframe/draft endpoints, fix "Modes (4)/--mode-draw" → 3, "63 CSS variables" → 62, add the binding-classification MCP row, mark section_request legacy, refresh the wireframe_apply row, and add the data-binding/explode gating note.
- README.md: 27 → 28 MCP tools; qualify byte-exact undo as git-project (non-git = anchor files only).
- CONTRIBUTING.md: prop_update → component_prop_update (+ text_update, wireframe_direction).
- docs/cli.md: document the binding-classify command; docs/distribution.md: list the ERROR_FIX/PERF_FIX/WIREFRAME_APPLY playbooks; docs/setup.md: drop stale "libraries" audit section.
- SECURITY.md / CHANGELOG.md: the Host gate covers ALL methods (present tense), not just GET.
- WIREFRAME_APPLY.md: note data-binding + explode are gated off this release.

Skill mirrors regenerated via sync:skills.

Audited the workspace and patched the vulnerable dependencies: 47 advisories (1 critical, 13 high, 24 moderate, 9 low) down to 1 low.

- Bump the one SHIPPED runtime dep: ws ^8.20.0 → ^8.21.0 (the rest of the published package's deps were already clean).
- pnpm.overrides force patched versions for the transitive/dev/playground vulns, capped to stay in-major so nothing breaks: shell-quote (critical, webpack tooling), fast-uri, devalue, undici, react-router(+dom), vite, postcss, svelte, js-yaml, qs, uuid, launch-editor, webpack-dev-server, http-proxy-middleware, @babel/core; dompurify override raised to >=3.4.9.
- Bump the astro playground 5 → 6 (the only fix for its advisories was a major).

Remaining: 1 low, esbuild 0.27.x in BUILD tooling only (tsup / webpack-terser), not a runtime dep of the published package. Forcing esbuild >=0.28.1 breaks the shell build (0.28 dropped transforming destructuring to the old browser targets), so it's deliberately left until tsup/vite move their pinned esbuild forward.

Build + typecheck + 999 tests green; published-package prod deps are clean.

+          clearTimeout(t)
+          resolve(m)
+        })
+        const t = setTimeout(() => { unsubscribe(); resolve(null) }, timeoutMs)


+}
+
+function escapeCell(s: string): string {
+  return s.replace(/\|/g, '\\|').replace(/\n/g, ' ').slice(0, 200)


+    } catch (err) {
+      // A non-conflict failure (disk error) won't get better on retry.
+      if (!(err instanceof WireframeRevConflictError)) {
+        console.warn(`[Annotask] wireframe ${mode} for ${taskId} failed:`, err)


+const EVIDENCE_CLASSIFICATIONS: readonly string[] = ['literal', 'bound', 'loop-literal', 'unknown']
+const ANCHOR_POSITIONS: readonly string[] = ['before', 'after', 'append', 'prepend']
+
+export function emptyDesignSessionDocument(sessionId: string): DesignSessionDocument {


+function makeSection(heading: string, lines: string[]): GuideSection {
+  const meta = GUIDE_SECTION_META[heading] ?? FALLBACK_META
+  // Strip leading/trailing blank lines and HTML comment placeholders.
+  const raw = lines.join('\n').trim().replace(/^<!--.*?-->$/gm, '').trim()


+
+function ensureSessionIdentity(): void {
+  if (!sessionId) {
+    sessionId = `ds-${Date.now()}-${Math.random().toString(36).slice(2, 7)}`


kurtstohrer and others added 30 commits May 12, 2026 17:00

kurtstohrer added 17 commits June 11, 2026 18:17

docs: wireframe data-binding + tool-consolidation kickoff handoff

c8cebbf


chore(release): v0.4.0

edc06a8

kurtstohrer merged commit febd1fe into main Jun 22, 2026
3 of 5 checks passed

kurtstohrer deleted the feat/embedded-agents branch June 22, 2026 22:49

github-advanced-security AI found potential problems Jun 22, 2026


kurtstohrer mentioned this pull request Jun 22, 2026

fix(test): Node-20-compatible readdir instead of fs.globSync (unblocks CI + publish) #51

Merged

kurtstohrer merged 47 commits into main from feat/embedded-agents
