Merged
12 changes: 12 additions & 0 deletions CHANGELOG.md
@@ -2,6 +2,18 @@

All notable changes to this project are documented here. Versions follow [Semantic Versioning](https://semver.org/). Dates are ISO 8601.

## [0.4.1] - 2026-06-23

### Fixed
- **The selected provider now actually applies the task.** Built-in agent personas hard-coded `claude-local`, so every apply/seed run spawned `claude` regardless of the provider you picked in Settings → Providers — surfacing as `spawn claude ENOENT` when claude wasn't installed (e.g. you'd selected Copilot). Built-in personas now inherit the global active provider; an explicit per-persona pin in `.annotask/agents.json` still wins. Verified live across all four CLIs (claude/codex/opencode/copilot).
- **Per-agent provider/model pins persist.** Setting a persona's model in Settings → Agents was silently wiped whenever the provider's live model catalog couldn't be enumerated (Copilot's interactive-only picker is the canonical case) — and a saved value displayed as empty. The model field now keeps and shows the saved value; a stale id from a *different* provider is cleared on provider change instead, never on catalog-fetch failure.
- **Reloading the page no longer destroys an in-flight apply.** An applying CLI was bound to the browser tab: a reload tore down the SSE, killed the child mid-edit, and reverted the task to `pending`. The spawn server now keeps a task-bearing run alive across a client disconnect (detach grace) and finalizes it on the child's own exit — a clean exit lands the task in `review`, an interrupted/failed run reverts to `pending`. The client no longer reverts the task on `pagehide` and warns before an accidental reload. A stale orphan-finalize can no longer clobber a newer run for the same task.
- **Auto-run reliability.** The headless auto-run driver logs when its single-run guard blocks a drain (previously silent), and bounds each run so a hung provider can't pin the queue and stall every later task.
- **Conversation rendering.** Agent messages now style the full Markdown surface (lists, headings, blockquotes, links, tables) instead of only paragraphs/code; wide tool output and long tokens no longer scroll the whole panel (`min-width: 0` + code wrapping); the "agent paused for input" banner label meets contrast; and rendered links open in a new tab with `rel="noopener"`.
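
The reload-survival fix above comes down to one decision when a run ends without the client finalizing it. The sketch below restates that decision as a pure function, assuming the `{ killed, exitCode }` shape reported by `onRunEnd`; `decideFinalStatus`, `RunEndInfo`, and `FinalStatus` are hypothetical names used only for illustration and are not part of the codebase.

```typescript
// Hypothetical helper, for illustration only: mirrors the rule described in
// the changelog entry (clean self-exit lands in review, anything else reverts
// to pending). The real server also checks that the task is still in_progress
// and that no newer run owns it before acting.
type RunEndInfo = { killed: boolean; exitCode: number | null }
type FinalStatus = 'review' | 'pending'

function decideFinalStatus(info: RunEndInfo, taskType?: string): FinalStatus {
  // A child that exited on its own with code 0 finished the apply.
  // wireframe_apply tasks always take the pending path.
  const cleanCompletion =
    !info.killed && info.exitCode === 0 && taskType !== 'wireframe_apply'
  return cleanCompletion ? 'review' : 'pending'
}
```

Keeping the rule free of side effects makes the three outcomes (clean exit, failed exit, killed child) easy to check in isolation.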

### Changed
- **Seed prompt carries the task inline.** Apply runs now embed a compact `Task grounding` block (file/line/component/route + per-type context, via the shared task-summary) in the seed prompt, so the agent applies directly instead of reflexively calling `annotask_get_task`. Heavy fields (screenshot, rendered HTML, interaction history) stay behind their MCP tools.

## [0.4.0] - 2026-06-22

### Fixed
2 changes: 1 addition & 1 deletion package.json
@@ -1,6 +1,6 @@
{
"name": "annotask",
-"version": "0.4.0",
+"version": "0.4.1",
"description": "Visual UI design tool for web apps. Make changes in the browser and generate structured reports that AI agents can apply to source code. Works with Vue, React, Svelte, SolidJS, and plain HTML via Vite or Webpack.",
"license": "MIT",
"type": "module",
33 changes: 33 additions & 0 deletions src/server/__tests__/agent-spawn-http.test.ts
@@ -300,6 +300,39 @@ describe('agent-spawn over HTTP (real middleware + real spawn handler)', () => {
child.finish(143) // simulate the SIGTERM landing so the run drains
})

it('keeps the child alive on client disconnect when the run carries a taskId (reload survival)', async () => {
const before = spawned.length
const clientReq = http.request({
method: 'POST',
hostname: 'localhost',
port: serverPort(),
path: SPAWN_PATH,
headers: { 'Content-Type': 'application/json' },
})
clientReq.on('error', () => { /* expected after destroy */ })
clientReq.on('response', (res) => {
res.on('data', () => { /* keep the stream flowing */ })
res.on('error', () => { /* expected after destroy */ })
})
clientReq.end(JSON.stringify({ cli: 'claude', args: [], taskId: 'task-reload' }))

await waitFor(() => spawned.length > before, 'child spawn')
const child = spawned[spawned.length - 1]
expect(child.killed).toBe(false)

clientReq.destroy()
// Apply runs get a detach grace so a page reload doesn't destroy in-flight
// work — the child must still be alive shortly after the disconnect (the
// no-taskId case above is killed immediately).
await new Promise((r) => setTimeout(r, 80))
expect(child.killed).toBe(false)
expect(agentSpawn.registry.taskRunning('task-reload')).toBe(true)

// The child exiting on its own drains the run (afterEach also guards this).
child.finish(0)
await waitFor(() => agentSpawn.registry.size() === 0, 'run drains after child exit')
})

describe('ANNOTASK_MAX_PERMISSION cap at the route level', () => {
it('403s a bypass spawn under cap=default — canonical flag AND synonym spellings', async () => {
process.env.ANNOTASK_MAX_PERMISSION = 'default'
36 changes: 36 additions & 0 deletions src/server/__tests__/agent-spawn.test.ts
@@ -278,6 +278,42 @@ describe('run registry — per-task dedup + orphan hook', () => {
await p
expect(ended).toEqual([]) // no taskId → no orphan reconciliation
})

it('reports a clean self-exit as { killed:false, exitCode:0 } so the finalizer can land review', async () => {
const ends: Array<{ killed: boolean; exitCode: number | null }> = []
const child = new FakeChild()
const handler = createAgentSpawnHandler({ spawnImpl: fakeSpawn(child), onRunEnd: (_id, info) => ends.push(info) })
const p = handler.handleSpawn(fakeReq(), fakeRes(), { cli: 'claude', args: [], taskId: 'task-clean' }, '/tmp')
child.finish(0)
await p
expect(ends).toEqual([{ killed: false, exitCode: 0 }])
})

it('does NOT kill an apply run (taskId) on client disconnect — keeps it alive across a reload', async () => {
const child = new FakeChild()
const handler = createAgentSpawnHandler({ spawnImpl: fakeSpawn(child) })
const req = fakeReq()
const p = handler.handleSpawn(req, fakeRes(), { cli: 'claude', args: [], taskId: 'task-detach' }, '/tmp')
;(req as unknown as EventEmitter).emit('close')
// The detach grace keeps the child alive (a chat run would be killed — next test).
expect(child.killed).toBe(false)
expect(handler.registry.taskRunning('task-detach')).toBe(true)
// Its own clean exit drains the run and clears the grace timer.
child.finish(0)
await p
expect(handler.registry.taskRunning('task-detach')).toBe(false)
})

it('kills a chat run (no taskId) immediately on client disconnect', async () => {
const child = new FakeChild()
const handler = createAgentSpawnHandler({ spawnImpl: fakeSpawn(child) })
const req = fakeReq()
const p = handler.handleSpawn(req, fakeRes(), { cli: 'claude', args: [] }, '/tmp')
;(req as unknown as EventEmitter).emit('close')
expect(child.killed).toBe(true)
child.finish(143)
await p
})
})

describe('origin policy', () => {
78 changes: 65 additions & 13 deletions src/server/agent-spawn.ts
@@ -63,6 +63,18 @@ const MAX_SPAWN_DURATION_MS = 15 * 60_000
* at once. Excess spawns are refused with a `too_many_agents` error.
*/
const MAX_CONCURRENT_SPAWNS = 4
/**
* Grace window to keep a CLI child alive after its client SSE disconnects,
* WHEN the run is applying a task (carries a taskId). A page reload tears down
* the SSE; without this the child is SIGTERM'd mid-edit and the in-flight work
* is lost. With it, the child keeps running and the server finalizes the task
* on the child's own exit (see `onRunEnd` → index.ts: clean exit → review,
* otherwise → pending). Chat runs (no taskId) are bound to their tab and are
* still killed the instant the client disconnects. Kept beneath
* MAX_SPAWN_DURATION_MS and below the boot-reclaim age so a client that never
* returns is still cleaned up promptly.
*/
const DETACH_GRACE_MS = 90_000

/**
* PATH passed to spawned CLIs. Appends `ANNOTASK_HOST_PATH` (if set) so
@@ -143,12 +155,16 @@ export interface AgentSpawnOptions {
* the orchestrating tab closed mid-run) — it grace-checks "still in_progress?"
* so a normal completion (client about to set `review`) is a no-op.
*
- * `info.killed` is true when WE terminated the child (client disconnect,
- * explicit abort, or the duration backstop) and false when it exited on its
- * own — lets the finalizer word its reason honestly instead of always
- * blaming a closed tab.
+ * `info.killed` is true when WE terminated the child (explicit abort, the
+ * detach-grace expiry, or the duration backstop) and false when it exited on
+ * its own — lets the finalizer word its reason honestly instead of always
+ * blaming a closed tab. `info.exitCode` is the child's exit status (null when
+ * killed/signalled): a clean `0` from a child that exited on its own means
+ * the apply finished even though the client never finalized it (e.g. the tab
+ * reloaded mid-run), so the finalizer can land the task in `review` instead
+ * of wrongly reverting completed work to `pending`.
*/
- onRunEnd?: (taskId: string, info: { killed: boolean }) => void
+ onRunEnd?: (taskId: string, info: { killed: boolean; exitCode: number | null }) => void
/** Test seam: inject a child-process factory (defaults to node:child_process spawn). */
spawnImpl?: typeof spawn
}
@@ -423,6 +439,13 @@ export function createAgentSpawnHandler(opts: AgentSpawnOptions = {}): AgentSpaw
}

let killed = false
// Set once the run has fully ended (child closed + cleanup running). Guards
// onClientGone so the `res.end()` in cleanup — which itself fires res
// 'close' — can't re-enter and arm a pointless detach timer on a dead run.
let runEnded = false
// Detach-grace timer: armed when an apply run's client disconnects, kills
// the child if no one returns within DETACH_GRACE_MS.
let detachTimer: ReturnType<typeof setTimeout> | null = null
// Signal the whole process group on POSIX (the child was spawned detached,
// i.e. as a group leader) so grandchildren the agent forked die with it.
// Falls back to the direct child.kill when there's no pid (spawn-time
@@ -436,9 +459,28 @@ export function createAgentSpawnHandler(opts: AgentSpawnOptions = {}): AgentSpaw
function killChild() {
if (killed) return
killed = true
if (detachTimer) { clearTimeout(detachTimer); detachTimer = null }
killGroup('SIGTERM')
setTimeout(() => { killGroup('SIGKILL') }, KILL_GRACE_MS).unref()
}
// Client SSE went away (tab closed, navigated, or RELOADED). For a chat run
// the child is bound to its tab — kill it. For an APPLY run (has a taskId)
// a reload would otherwise destroy in-flight edits, so keep the child alive
// for a grace window: it finishes and the server finalizes the task on its
// exit. The child keeps writing to a dead socket during the grace, which is
// harmless (write() swallows EPIPE). Kill on grace expiry.
// Capture the (narrowed) taskId in a local — TS doesn't carry the
// `typeof parsed === 'string'` narrowing into the nested closure below.
const runTaskId = parsed.taskId
let clientGone = false
function onClientGone() {
if (clientGone || runEnded) return
clientGone = true
if (!runTaskId) { killChild(); return }
if (detachTimer || killed) return
detachTimer = setTimeout(() => { killChild() }, DETACH_GRACE_MS)
detachTimer.unref?.()
}
active.set(runId, { child, kill: killChild, taskId: parsed.taskId })

// Pipe stdin if provided. Attach the async error listener BEFORE writing:
@@ -463,10 +505,11 @@ export function createAgentSpawnHandler(opts: AgentSpawnOptions = {}): AgentSpaw
}, KEEPALIVE_INTERVAL_MS)
keepalive.unref()

- // Abort on client disconnect. Belt-and-suspenders: some proxies close
- // the response socket rather than the request, so we listen on both.
- req.on('close', () => { killChild() })
- res.on('close', () => { killChild() })
+ // Client disconnect (incl. page reload). Belt-and-suspenders: some proxies
+ // close the response socket rather than the request, so we listen on both.
+ // For apply runs this starts the detach grace rather than an instant kill.
+ req.on('close', () => { onClientGone() })
+ res.on('close', () => { onClientGone() })

// Absolute duration backstop — kill a child that outlives the ceiling
// (e.g. its client died without closing the socket). Cleared on close.
@@ -523,6 +566,7 @@ export function createAgentSpawnHandler(opts: AgentSpawnOptions = {}): AgentSpaw
})
})

let exitCode: number | null = null
await new Promise<void>((resolve) => {
let resolved = false
const finish = () => { if (!resolved) { resolved = true; resolve() } }
@@ -532,6 +576,7 @@ export function createAgentSpawnHandler(opts: AgentSpawnOptions = {}): AgentSpaw
finish()
})
child.on('close', (code, signal) => {
exitCode = typeof code === 'number' ? code : null
// Flush trailing stdout/stderr (no terminating newline).
if (stdoutBuf.length > 0) write(res, 'stdout', stdoutBuf)
if (stderrBuf.length > 0) write(res, 'stderr', stderrBuf)
@@ -540,16 +585,23 @@ export function createAgentSpawnHandler(opts: AgentSpawnOptions = {}): AgentSpaw
})
})

// Mark ended BEFORE res.end() (which fires res 'close') so the disconnect
// handler can't arm a detach timer on this already-finished run.
runEnded = true
clearInterval(keepalive)
clearTimeout(maxDuration)
if (detachTimer) { clearTimeout(detachTimer); detachTimer = null }
active.delete(runId)
if (parsed.taskId && byTask.get(parsed.taskId) === runId) byTask.delete(parsed.taskId)
try { res.end() } catch { /* already ended */ }
- // Report the run end so a task the client never finalized (orphaned by a
- // closed tab) can be reconciled server-side. Fired for normal completions
- // too — the handler grace-checks status, so those no-op.
+ // Report the run end so a task the client never finalized can be reconciled
+ // server-side. `killed` distinguishes a terminated child from one that
+ // exited on its own; `exitCode` lets the finalizer land a detached-but-
+ // completed apply (clean exit) in `review` instead of reverting it to
+ // `pending`. Fired for normal completions too — the handler grace-checks
+ // status, so a client that already finalized makes this a no-op.
if (parsed.taskId) {
- try { opts.onRunEnd?.(parsed.taskId, { killed }) } catch { /* never let a sink break cleanup */ }
+ try { opts.onRunEnd?.(parsed.taskId, { killed, exitCode }) } catch { /* never let a sink break cleanup */ }
}
}

73 changes: 66 additions & 7 deletions src/server/index.ts
@@ -122,6 +122,7 @@ export function createAnnotaskServer(options: AnnotaskServerOptions): AnnotaskSe
if (res && typeof res === 'object' && 'error' in res) return
// eslint-disable-next-line no-console
console.warn(`[annotask] reconciled orphaned task ${taskId} → pending (${why})`)
void finalizeFrozenPartial(taskId)
if (before?.type === 'wireframe_apply') {
try { await releaseApplyTask(applyOptions, taskId) }
catch (err) {
@@ -134,18 +135,76 @@ export function createAnnotaskServer(options: AnnotaskServerOptions): AnnotaskSe
console.warn(`[annotask] orphan reconcile failed for task ${taskId}:`, err)
})
}
/**
* A detached apply run exited cleanly (code 0) but its client never flipped
* the status — the orchestrating tab reloaded/closed mid-run while the CLI
* (kept alive by the spawn detach-grace) finished and wrote the files. Land
* the task in `review` (the state the client would have set) so completed
* work isn't reverted to `pending` and re-applied. Guarded to in_progress so
* a returning client that finalizes first still wins.
*/
function finalizeDetachedReview(taskId: string): void {
void state.updateTask(
taskId,
{ status: 'review', resolution: 'Applied by the embedded agent; finalized server-side after the tab was closed or reloaded mid-run. Review the changes.' },
{ guard: (t) => (t.status === 'in_progress' ? null : 'task already finalized') },
).then((res) => {
if (res && typeof res === 'object' && 'error' in res) return
// eslint-disable-next-line no-console
console.warn(`[annotask] finalized detached run ${taskId} → review (clean exit)`)
void finalizeFrozenPartial(taskId)
}).catch((err) => {
// eslint-disable-next-line no-console
console.warn(`[annotask] detached review finalize failed for task ${taskId}:`, err)
})
}
/**
* Clear a stuck `isPartial:true` assistant turn left behind when the
* streaming client vanished mid-run (only the client run loop ever clears it,
* so without this the Conversation tab shows a turn frozen mid-stream
* forever). Best-effort.
*/
async function finalizeFrozenPartial(taskId: string): Promise<void> {
try {
const msgs = await taskThread.read(taskId)
for (let i = msgs.length - 1; i >= 0; i--) {
if (msgs[i].isPartial) {
await taskThread.update(taskId, msgs[i].id, { isPartial: false })
break
}
}
} catch (err) {
// eslint-disable-next-line no-console
console.warn(`[annotask] clearing frozen partial failed for task ${taskId}:`, err)
}
}
// The registry reports every run end; we grace-check a moment later (a normal
// completion's client review/pending PATCH lands well within this window, so
- // it no-ops) and, if the task is still `in_progress`, reconcile it. `killed`
- // distinguishes an interrupted run (we terminated the child) from a child
- // that exited on its own while the client failed to finalize — for honest
- // logging; both reconcile to `pending`.
+ // it no-ops) and, if the task is still `in_progress`, finalize it. A child
+ // that exited on its own with code 0 finished the apply (the client just
+ // never flipped status — e.g. its tab reloaded) → land in `review`. Anything
+ // else — a non-zero exit, or a child WE killed (detach-grace expiry, abort,
+ // duration backstop) — reverts to `pending` so it stays retryable.
+ // wireframe_apply always takes the pending path: its apply-batch lifecycle is
+ // reconciled there and a server-side review flip would bypass it.
const ORPHAN_FINALIZE_GRACE_MS = 12_000
- function scheduleOrphanFinalize(taskId: string, info: { killed: boolean }): void {
+ function scheduleOrphanFinalize(taskId: string, info: { killed: boolean; exitCode: number | null }): void {
setTimeout(() => {
- const task = state.getTasks().tasks.find((t: { id?: string; status?: string }) => t?.id === taskId)
+ const task = state.getTasks().tasks.find((t: { id?: string; status?: string; type?: string }) => t?.id === taskId)
if (!task || task.status !== 'in_progress') return
- reconcileOrphanedTask(taskId, info.killed ? 'run interrupted' : 'client never finalized')
+ // A NEWER run already owns this task (a re-spawn landed in the window
+ // between this run's child-exit — which cleared the byTask reservation —
+ // and this delayed callback). Our finalize is stale: flipping the task
+ // now would clobber the live run mid-edit. Bail and let that run finalize
+ // itself. The finalizer is otherwise run-identity-unaware (it keys only
+ // on taskId + status), so this guard is what prevents the clobber.
+ if (agentSpawn.registry.taskRunning(taskId)) return
+ const cleanCompletion = !info.killed && info.exitCode === 0 && task.type !== 'wireframe_apply'
+ if (cleanCompletion) {
+ finalizeDetachedReview(taskId)
+ } else {
+ reconcileOrphanedTask(taskId, info.killed ? 'run interrupted' : 'client never finalized')
+ }
}, ORPHAN_FINALIZE_GRACE_MS).unref()
}
const agentSpawn = createAgentSpawnHandler({ onRunEnd: scheduleOrphanFinalize })