Monitor and wake agent sessions from the CLI (sessions watch / wake / list)

## Problem

Today `controller sessions start <project> --worktree <id> --message <text>` (#190) is a **fire-and-forget** primitive. The CLI kicks off the agent run, prints `{ sessionId, url }`, and exits. There is no symmetric way for the same agent (or any other agent) to:

1. **Observe** what another session is doing — is it running? which provider? what events has it emitted?
2. **Wait** for a long-running run to finish without holding a turn open for the entire duration.
3. **Wake** a session with a follow-up message at a later point, including arbitrarily in the future.

The pieces are already wired on the server:

- The runtime map in `server/lib/session-runtime.ts` already tracks `active`, `provider`, `projectId`, `worktreeId`, plus the child process and pending approvals. It is exposed via `GET /api/runtimes` (bulk) and `GET /api/projects/:id/sessions/:sessionId/runtime` (per-session). The React UI already consumes both.
- Events persist to `<orchestratorHome>/projects/<name>-<hash>/events/<sessionId>.jsonl` (`server/lib/sessions.ts`) and are read back via `GET /api/projects/:id/sessions/:sessionId/events`. The headless `advanceSessionQueue` flow (#113) already replays queued messages on a clean run completion without any client attached.
- The per-session message queue (`session-queue.ts`) is already CRUD-exposed at `/api/projects/:id/sessions/:sessionId/queue[/messageId]`.

What is missing is the **CLI surface** that lets an agent (or another script) reach all of this. Today the only session-aware CLI surface is `sessions start`, so an agent can spawn work but cannot supervise it.

> **Note on `--delay`**: a generic wakeup primitive (cron math + a shared tick loop + consumer registration) is being built separately in #243 for user-facing schedules. The `--delay` flag in item 5 below should plug into that shared loop instead of inventing its own wakeup mechanism. Only the `--delay` part of #219 therefore waits on #243: PR 1 of #243 (the wakeup loop itself) must land first, and `--delay` is then implemented on top of it. Items 1–4 are unaffected. See the "Related" section.

## Concrete motivation — "run a half-hour script, then keep going"

A common pattern we would like to support:

```sh
# Kick off a long-running build.
controller sessions start coding-orchestrator \
  --worktree <w> --message "Run ./big-build.sh and summarize the failures"

# Time passes. The agent's own turn has long since ended, or it is now
# doing other work in a different session.
controller sessions watch coding-orchestrator <sessionId> --until terminal
#   blocks until run.completed / run.failed / run.cancelled, then prints
#   a one-line summary + exit code.

# Or wake it later (from the same or a different agent) with the next step.
controller sessions wake coding-orchestrator <sessionId> --message "Build is done; now deploy."
```

The wake is the same primitive as the existing in-UI "send while running" path — `POST /api/projects/:id/sessions/:sessionId/queue` + `advanceSessionQueue` — so this does not introduce a new execution model. It just makes the existing one reachable from the CLI.
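
As a rough illustration of how thin the CLI side of `wake` could be, the sketch below builds the HTTP request for the queue route named above. `buildWakeRequest` and `WakeRequest` are hypothetical names, not existing code. Only the route and the `text` field are taken from this issue; the real `QueuedMessage` body carries more fields (`provider`, `model`, `mode`, ...) that are left out here.

```typescript
// Hypothetical helper: describes the request `sessions wake` would send.
// Only the route and the `text` field are taken from this issue.
interface WakeRequest {
  method: "POST";
  url: string;
  headers: Record<string, string>;
  body: string;
}

function buildWakeRequest(
  serverUrl: string,
  projectId: string,
  sessionId: string,
  text: string,
): WakeRequest {
  // Tolerate a trailing slash on the configured server URL.
  const base = serverUrl.replace(/\/+$/, "");
  const path =
    "/api/projects/" + encodeURIComponent(projectId) +
    "/sessions/" + encodeURIComponent(sessionId) + "/queue";
  return {
    method: "POST",
    url: base + path,
    headers: { "Content-Type": "application/json" },
    // Assumption: the remaining QueuedMessage fields are filled in elsewhere.
    body: JSON.stringify({ text }),
  };
}
```

The returned object maps directly onto a `fetch(url, { method, headers, body })` call, which keeps the request shape testable without a running server.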

## Proposed surfaces

All under the existing `controller` CLI (`cli/controller`) so they live next to `sessions start`, mirror its argument style, and inherit the existing `CONTROLLER_SERVER_URL` resolution + project/worktree resolvers from #190.

1. **`controller sessions list <project> [--worktree <id>] [--include-archived]`**
   Wraps `GET /api/projects/:id/sessions`. The server already returns `SessionSummary[]` (metadata without message history, see `getSessionSummaries`). Print `id`, `title`, `status`, `provider`, `lastActiveAt`. Lets an agent enumerate its own past work or check what is running.

2. **`controller sessions status <project> <sessionId>`** *(optional, can fold into `runtime`)*
   Wraps `GET /api/projects/:id/sessions/:sessionId/runtime`. Prints the runtime snapshot: `active`, `provider`, `projectId`, `worktreeId`. Quick "is it still running?" probe.

3. **`controller sessions watch <project> <sessionId>`** (the headline new surface)
   Two modes:

   - `--until terminal` (default): long-poll/SSE on a new `GET /api/projects/:id/sessions/:sessionId/wait` route. The server watches the runtime map (the same data `advanceSessionQueue` already uses) and resolves when `run.completed` / `run.failed` / `run.cancelled` lands for that session, or when the child process exits. The CLI prints a one-line summary with the run's exit code and exits with that same code (so agents can write `if ! controller sessions watch ...`).
   - `--tail [N]`: prints the last `N` events (default 20) and exits. Lets an agent quickly catch up after re-entering a session, or after coming back from a delay. Wraps `GET /api/projects/:id/sessions/:sessionId/events` and uses the existing `dedupeUserMessageEvents` from `routes/sessions.ts`.

4. **`controller sessions wake <project> <sessionId> --message <text>`**
   Wraps `POST /api/projects/:id/sessions/:sessionId/queue`. Writes a `QueuedMessage` (`session-queue.ts`) using the same `{ text, provider, model, mode, ... }` shape that the UI queue uses, so the existing `advanceSessionQueue` picks it up unchanged on clean completion. Resolves to the new `messageId` and exits.

5. **`controller sessions wake <project> <sessionId> --message <text> --delay <duration>`** *(follow-up; depends on 4 and on #243's wakeup loop)*
   Writes a `{ runAt: <ISO>, ... }` envelope onto the existing session queue (i.e. extends `QueuedMessage` with an optional `runAt`). A new **wakes consumer** registered on the shared wakeup loop from #243 (`server/lib/scheduler.ts`) checks for due items on every tick and delivers them via the same `advanceSessionQueue` path. Delivery semantics, cross-restart behavior, and race protection inherit from #243's design (lock-then-mark-then-detach, 30s tick interval). Implementation note: the queue file format changes so that entries in the currently flat `QueuedMessage[]` may carry a `runAt`; non-delayed items get `runAt: null` and behave exactly as today. This item should be filed separately from 1–4 if we want to keep the first PR reviewable.
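
To make the `--delay` envelope from item 5 more concrete, here is a minimal sketch of turning a duration into `runAt` and of selecting due items on a tick. Everything in it is an assumption for illustration: the `DelayedQueuedMessage` shape, the duration syntax (an integer followed by `s`, `m`, `h`, or `d`), and the function names. The real consumer would be defined by #243's loop.

```typescript
// Hypothetical shape: QueuedMessage extended with the `runAt` field from item 5.
// The real QueuedMessage has more fields (provider, model, mode, ...).
interface DelayedQueuedMessage {
  messageId: string;
  text: string;
  runAt: string | null; // ISO 8601 timestamp, or null for non-delayed items
}

// Assumed duration syntax: an integer followed by s, m, h, or d.
const UNIT_MS: Record<string, number> = {
  s: 1000,
  m: 60000,
  h: 3600000,
  d: 86400000,
};

function parseDuration(input: string): number {
  const match = /^(\d+)(s|m|h|d)$/.exec(input.trim());
  if (!match) throw new Error("invalid duration: " + input);
  const [, amount = "", unit = ""] = match;
  const unitMs = UNIT_MS[unit];
  if (unitMs === undefined) throw new Error("invalid duration unit: " + unit);
  return Number(amount) * unitMs;
}

// Converts a relative delay into the absolute `runAt` stored on the queue item.
function toRunAt(now: Date, delayMs: number): string {
  return new Date(now.getTime() + delayMs).toISOString();
}

// What a wakes consumer might select on each tick: delayed items whose time has come.
// Items with `runAt: null` are skipped because they follow today's path unchanged.
function dueMessages(queue: DelayedQueuedMessage[], now: Date): DelayedQueuedMessage[] {
  return queue.filter(
    (item) => item.runAt !== null && Date.parse(item.runAt) <= now.getTime(),
  );
}
```

Storing an absolute timestamp rather than the relative delay means a due check after a server restart needs only the queue file and the current time.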

## Why this fits the current architecture

- The runtime map and event log are **already the source of truth** for the React UI sidebar and SessionView. The CLI surfaces only add a read/write path; they do not add new state.
- `advanceSessionQueue` is **already server-driven and client-independent** — a headless wake (no SSE client attached) drains the queue to completion. So `sessions wake` from another shell, cron, or agent works without any UI open.
- The CLI install path and project/worktree resolution are **already abstracted** in `cli/controller` (`controllerCliInstalledPath`, `resolveProjectId` from #190). Adding surfaces is `parseX(argv)` + `runX(argv, serverUrl)` + a server route or wrapper.
- The agent preamble (`server/lib/agent-preamble.ts`) and the agent system prompt already document the absolute CLI install path, so agents will discover these new subcommands via the same channel.
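
The `parseX(argv)` half of that pattern might look like the following for `wake`. This is a hedged sketch, not the code in `cli/controller`: `parseWake` and `WakeArgs` are hypothetical names, and the error messages are placeholders.

```typescript
// Hypothetical result type for `controller sessions wake`.
interface WakeArgs {
  project: string;
  sessionId: string;
  message: string;
  delay?: string; // raw duration string; only present when --delay is given
}

// Parses the arguments that follow `sessions wake`.
function parseWake(argv: string[]): WakeArgs {
  const positional: string[] = [];
  let message: string | undefined;
  let delay: string | undefined;

  for (let i = 0; i < argv.length; i++) {
    const arg = argv[i];
    if (arg === undefined) continue;
    if (arg === "--message" || arg === "--delay") {
      const value = argv[i + 1];
      if (value === undefined) throw new Error(arg + " requires a value");
      if (arg === "--message") message = value;
      else delay = value;
      i++; // skip the value that was just consumed
    } else if (arg.indexOf("--") === 0) {
      throw new Error("unknown flag: " + arg);
    } else {
      positional.push(arg);
    }
  }

  const [project, sessionId] = positional;
  if (positional.length !== 2 || project === undefined || sessionId === undefined) {
    throw new Error("usage: controller sessions wake <project> <sessionId> --message <text>");
  }
  if (message === undefined || message === "") {
    throw new Error("--message is required");
  }
  return delay === undefined
    ? { project, sessionId, message }
    : { project, sessionId, message, delay };
}
```

Keeping the parser a pure function of `argv` is what makes the unit tests under `cli/__tests__/` cheap to write: no server and no process state are involved.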

## Non-goals

- Not exposing the child process handle or letting the agent kill a sibling session. `controller sessions stop` can be a separate surface (and `POST /sessions/:id/stop` already exists on the server).
- Not changing the runtime map itself. The map is server-internal; the CLI surfaces read snapshots of it and never mutate it.
- Not building a generic scheduler / cron system in this issue. That work lives in #243 (user-facing schedules) and we consume its wakeup loop for `--delay` rather than duplicating it.

## Open questions

- Should `sessions watch --until terminal` use SSE on the server, or short-poll `GET /runtime` + `GET /events`? SSE keeps the cost on the server (push when terminal lands); polling is simpler but chattier. SSE matches the existing pattern (every other live surface uses SSE), but the server route does not exist yet — would need a new endpoint, or reuse the existing `/events` SSE if we add a `?wait=terminal` mode.
- Should `sessions wake` deduplicate identical follow-ups? The existing queue is just a list, so two identical `--message`s will both replay.
- For `--delay`: do we need #243's wakeup loop to actively poll due items even when no run is active (i.e. across a full idle period), or is "deliver on the next natural run" acceptable? #243's design already answers this for schedules (yes, the loop ticks regardless of activity) — `--delay` inherits the same behavior for free.
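
If the short-poll option from the first question is chosen, each poll of the events route needs a check for a terminal event. A minimal sketch of that check follows. The `SessionEvent` shape and its `exitCode` field are assumptions (only the three terminal `type` values come from this issue), and the fallback of 0 for `run.completed` and 1 otherwise is a placeholder policy, not a decided one.

```typescript
// Hypothetical event shape: only the terminal `type` values come from this issue.
interface SessionEvent {
  type: string;
  exitCode?: number; // assumed field; the real event payload may differ
}

const TERMINAL_TYPES = ["run.completed", "run.failed", "run.cancelled"];

// Returns the first terminal event in the list, or undefined if the run is still going.
function findTerminalEvent(events: SessionEvent[]): SessionEvent | undefined {
  for (const event of events) {
    if (TERMINAL_TYPES.indexOf(event.type) !== -1) return event;
  }
  return undefined;
}

// Maps a terminal event to a process exit code.
// Assumption: when no exitCode is present, use 0 for run.completed and 1 otherwise.
function exitCodeFor(event: SessionEvent): number {
  if (event.exitCode !== undefined) return event.exitCode;
  return event.type === "run.completed" ? 0 : 1;
}
```

The same two functions would also serve an SSE-based implementation, since the only difference is whether events arrive by push or by repeated fetch.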

## Acceptance criteria

- [ ] `controller sessions list <project>` prints the session list with status + provider.
- [ ] `controller sessions watch <project> <sessionId> --until terminal` blocks until the run terminates and exits with the run's exit code.
- [ ] `controller sessions watch <project> <sessionId> --tail` prints the last N events.
- [ ] `controller sessions wake <project> <sessionId> --message <text>` enqueues a follow-up that runs on the existing `advanceSessionQueue` path. Verified by starting session A, kicking off a long tool call, waking it with a follow-up from another shell, and observing that the follow-up replays headlessly when the first run completes.
- [ ] `controller sessions wake <project> <sessionId> --message <text> --delay <duration>` enqueues a deferred follow-up; once #243's wakeup loop lands, this flag is implemented as a consumer of that loop (not its own wakeup mechanism).
- [ ] All surfaces work with the absolute install path (`~/coding-orchestrator/bin/controller`) and inherit the existing `CONTROLLER_SERVER_URL` resolution.
- [ ] New server routes (if any) get tests; CLI parsing gets unit tests under `cli/__tests__/`.

## Related

- Builds on: #190 (`sessions start`).
- Blocked on #243 for the `--delay` flag (PR 1, the wakeup loop itself, must land first). Items 1–4 of #219 (`list`, `status`, `watch`, `wake` without `--delay`) do not depend on #243 and can land in parallel.
- Shares the wakeup primitive with: #243 (`Schedule future sessions with optional repeat`) — schedules and wakes are two consumers of the same `server/lib/scheduler.ts` loop.
