Skip to content

feat: wire execution model runtime through tasks and sessions#36

Merged
moose-lab merged 4 commits into
mainfrom
feat/execution-model-task-sessions
Jun 5, 2026
Merged

feat: wire execution model runtime through tasks and sessions#36
moose-lab merged 4 commits into
mainfrom
feat/execution-model-task-sessions

Conversation

@moose-lab

Copy link
Copy Markdown
Owner

这个分支主要做什么

  • 让 DevLog 在 Tasks 和 Sessions 中默认使用 Settings 里配置好的 Execution & model,不再在任务执行、重试、会话跟进时隐式回落默认 runtime。
  • 将 runtime payload 贯穿 task execute、task retry、session create、session follow-up,并在会话继续时保留已存储的 provider/model/CLI 元数据。
  • 只在当前请求中注入 transient browser API key 或 local CLI env,避免把浏览器侧密钥持久化到 session metadata。
  • 新增 task retry prompt/request helper,并扩展 task-launch E2E 配置,支持 local CLI、BYOK provider 和 legacy backend env-var 模式。
  • 强化 Codex CLI connection test:允许慢 auth refresh,并正确解析 Codex JSON event 中最终的 agent message,而不是把 reconnect/error 状态当成 sample。

验证

  • Pre-commit hook: git diff --cached --check
  • Pre-commit hook: bun run typecheck
  • Pre-commit hook: TZ=Asia/Shanghai bun run test,185/185 passed
  • Manual: bun run build passed
  • Manual: real Codex connection-test route returned ok: true, sample: "ok"
  • Manual: real Codex Tasks/Sessions E2E passed with task F05CA9BDEC32F08A and session 4e33454d32e0cc56

Propagate the Settings-derived runtime payload through task launch, retry, and session continuation so Tasks and Sessions use the selected execution mode/model instead of falling back implicitly.

Preserve stored session runtime metadata while injecting only transient browser API keys or local CLI env at send time, and add E2E/runtime helpers for local CLI, BYOK provider, and legacy backend env modes.

Harden Codex connection testing for slow auth refresh and JSON event parsing, with focused tests for task retry, runtime persistence, UI readiness gates, and E2E config.
@claude

claude Bot commented Jun 2, 2026

Copy link
Copy Markdown

Claude finished @moose-lab's task in 2m 55s —— View job


PR Review: feat: wire execution model runtime through tasks and sessions

Todo

  • Load tools and read CLAUDE.md
  • Review core logic: task-retry.ts, task-retry-request.ts, session-runtime-auth.ts
  • Review API routes: retry route, sessions route, execute route
  • Review agent-settings.ts and api-session-runtime.ts
  • Review UI components and hooks
  • Review test coverage
  • Post findings

Findings

  • [must-fix] Non-atomic task state update on spawn failure — retry/route.ts:153–169 and execute/route.ts:151–173

    Both routes insert the session, update the task to status='in_progress' and set session_id, then attempt to spawn the process. If processManager.sendMessage throws, only the session row is reverted to 'failed'; the task row remains permanently in_progress pointing at a failed session. The task cannot be retried or re-executed because neither the execute route (status !== 'todo' && status !== 'blocked') nor presumably the retry route will accept an in_progress task. To fix, also revert the task on spawn failure:

    // after: db.prepare("UPDATE sessions SET status = 'failed' ...").run(sessionId);
    db.prepare(
      "UPDATE tasks SET status = 'review', session_id = ? WHERE id = ?"
    ).run(task.session_id ?? null, taskId); // restore prior session_id

    For the execute route, restore status='todo' and clear worktree_name/session_id (or to 'blocked' if worktree creation already succeeded). Fix this →

  • [must-fix] localCliAgentValid is computed before localCliAgentEnv but its intent is subtly inverted — session-runtime-auth.ts:296–299

    const localCliAgentValid =
      mode !== "local-cli" ||
      !input.local_cli_agent_id ||
      input.local_cli_agent_id.trim() === localCliAgentId;

    When input.local_cli_agent_id is a whitespace-only string (e.g. " "), input.local_cli_agent_id.trim() is "" while localCliAgentId is DEFAULT_LOCAL_CLI_AGENT_ID (non-empty), so localCliAgentValid becomes false even though the input was effectively absent and the sanitized ID is valid. This causes sanitizeModel and sanitizeReasoning to fall back to defaults instead of using the supplied agent_model/agent_reasoning. Add a || !input.local_cli_agent_id.trim() guard. Fix this →

  • [should-fix] sessions/route.ts:47 — unsafe cast of JSON body as Record<string, unknown>

    const { task_id, worktree_name, worktree_path, branch_name, prompt } =
      body as Record<string, unknown>;

    If the client sends a JSON array, body is unknown[] and all destructured fields are undefined. The !prompt / !worktree_path guards catch it safely, but the cast bypasses TypeScript's type system without a runtime check. Use the same payload && typeof payload === 'object' && !Array.isArray(payload) ? (payload as Record<…>) : {} pattern already used in the retry route. Fix this →

  • [should-fix] agent-settings.ts:896buildSessionContinuationRuntimePayload falls through for legacy "agent-api-key" mode, producing a session_auth_mode: "local-cli" transient payload

    For sessions stored with the legacy agent-api-key mode, the function returns { session_auth_mode: "local-cli", ... }. The backend correctly ignores the transient session_auth_mode (it reads from the stored row via resolveStoredSessionRuntimeAuthConfig), so this doesn't cause a runtime error. But it's misleading and could become a bug if the call-site logic ever inspects the returned mode to gate UI state. Add an explicit "agent-api-key" branch or fall-through comment.

  • [should-fix] process-manager.ts:638–643requeueLastUserMessage drops the original runtimeAuthInput

    The requeued message is pushed as { message: row.content } with no runtimeAuthInput. For local-CLI sessions this is fine because resolveStoredSessionRuntimeAuthConfig reads from the DB. However, for BYOK sessions the watchdog already skips restart (line 558), so this path is only reachable for local-CLI sessions. The invariant is load-bearing but undocumented — a brief comment here would prevent a future regression.

  • [nit] task-retry-request.ts is a 12-line function that just adds a .trim() and spreads. The added indirection and separate test (task-retry-request.test.ts) is overhead for what could be two inline statements at the call site in retry/route.ts.

  • [nit] quality-gates.test.ts:39–68 — the SQL column-count gate parses INSERT columns and VALUES by splitting on ,. This will silently miscount if a future value expression contains a comma (e.g. a function call or sub-select). Document the assumption or use a stricter extractor.

  • [nit] agent-connection-test.ts:128–148extractTextFromJsonEvent correctly skips type === "error", but still captures record.content for any unrecognized event type. A Codex "reconnecting" event with a stray content field would pollute stdout. Consider only capturing text from explicitly handled event types (allow-list) rather than via a catch-all record.text ?? record.deltaContent ?? record.content.


Observations (no action needed)

  • The resolveStoredSessionRuntimeAuthConfig design correctly strips anthropic_api_key and local_cli_agent_env from persisted data and supplies them transiently — this is the right way to prevent browser-side keys from being stored in the DB.
  • The stripUnlessCustomBaseUrl helper in session-runtime-auth.ts:472–492 correctly avoids leaking ANTHROPIC_API_KEY / OPENAI_API_KEY to child processes when no custom base URL is set.
  • Test coverage is good: the new task-retry.test.ts, session-runtime-auth.test.ts, api-session-runtime.test.ts, and agent-settings.test.ts files provide meaningful assertions on the new logic paths.
  • buildSessionContinuationRuntimePayload correctly uses the stored session's provider/model metadata rather than the current settings when resuming — this is the desired behavior.

Verdict: REQUEST_CHANGES

The non-atomic task rollback on spawn failure (must-fix #1) is the primary concern: it can leave tasks permanently stuck in in_progress. The whitespace guard for localCliAgentValid (must-fix #2) is a correctness edge case in input sanitization. Everything else is should-fix or nit level.

Comment thread scripts/task-launch-e2e.mjs Fixed
@claude

claude Bot commented Jun 4, 2026

Copy link
Copy Markdown

Claude finished @moose-lab's task in 3m 36s —— View job


PR Review: feat: wire execution model runtime through tasks and sessions

Todo

  • Load tools
  • Read CLAUDE.md and check repo setup
  • Diff against origin/main and check if prior must-fix items were addressed
  • Review new commit: e8cd96d feat: stream session agent activity
  • Post findings

Status of Prior Must-Fix Items

Both must-fix items flagged in the previous review remain unaddressed:

  • [must-fix feat: v0.3.0 — Core UX principles & query commands #1] Non-atomic task rollback on spawn failure — retry/route.ts:159–168, execute/route.ts:163–173 — unchanged. When processManager.sendMessage throws, the session row is reverted to 'failed' but the task row stays permanently in_progress with session_id pointing at the dead session.

  • [must-fix feat: v0.4.0 — Agent Status Line Integration #2] localCliAgentValid whitespace guard — session-runtime-auth.ts:296–299 — unchanged. A whitespace-only local_cli_agent_id trims to "", which differs from DEFAULT_LOCAL_CLI_AGENT_ID, making localCliAgentValid false when it shouldn't be.


New Findings (from e8cd96d feat: stream session agent activity)

  • [must-fix] sessions/route.ts:109–129 — same non-atomic rollback class as must-fix feat: v0.3.0 — Core UX principles & query commands #1, now also present in the standalone session POST route

    Lines 110–115 update the task to in_progress with the new session_id. Lines 118–129 revert only the session on spawn failure — the task update is not rolled back:

    if (task_id) {
      db.prepare("UPDATE tasks SET session_id = ?, status = 'in_progress' WHERE id = ?")
        .run(id, task_id);          // <-- committed
    }
    try {
      processManager.sendMessage(id, sessionPrompt, runtimeAuthInput);
    } catch (err) {
      db.prepare("UPDATE sessions SET status = 'failed' ...").run(id);
      // task row NOT reverted
      return NextResponse.json({ error: ... }, { status: 500 });
    }

    Add a task rollback inside the catch block parallel to the session revert. Fix this →

  • [should-fix] stream/route.ts — no project-level authorization check

    Every other session endpoint calls resolveProjectId(req) and scopes the DB query to project_id. The new stream endpoint takes only the raw id parameter with no ownership validation. A client that knows (or guesses) a session ID could subscribe to another project's session stream. Add a DB check: SELECT 1 FROM sessions WHERE id = ? AND project_id = ?. Fix this →

  • [nit] use-session-chat.ts:343 — replay messages use nextId() instead of the DB-assigned data.id

    The message event from replay includes data.id (the session_messages.id row). Using nextId() assigns a fresh client-side counter instead, so every reconnect produces new IDs for the same messages. React treats them as new nodes and re-renders the whole list. Use data.id ?? nextId() to prefer the stable DB ID.

  • [nit] session-stream-dedupe.ts:26 — status events are deduped by status value alone, not by full metadata

    key = "status:running" in the replay blocks a live {status: "running", pid: newPid} event if both share the same status string. In practice the buffer window is very short so this is low-probability, but if the process restarts during replay the PID update would be silently discarded. Dedup status events by their full payload hash, or simply skip dedup for status type and let the frontend overwrite idempotently.


Observations (no action needed)

  • The subscribe-before-replay → drain-buffer-after-sync pattern in stream/route.ts is correct and race-free under Node.js's single-threaded event loop. The cleanedUp guard in cleanup() correctly prevents double-unsubscribe.
  • session-stream-replay.ts SQL: logLimit is sanitized by normalizeLogLimit before being interpolated into the LIMIT clause, and sessionId is a bound parameter — no injection risk.
  • mergeReplayMessagesWithQueuedPlaceholders handles repeated identical messages correctly via the count map.
  • The reconnect logic in use-session-chat.ts correctly replaces stale messages at the sync event rather than at reconnect, avoiding a blank flash between error and reload.

Verdict: REQUEST_CHANGES

Two prior must-fix items are still open (non-atomic task rollback in retry/execute routes; localCliAgentValid whitespace guard), and the new commit introduces the same rollback bug in the sessions POST route. The missing project authorization on the stream endpoint is a should-fix. Everything else is nit level.

Release linked in-progress tasks back to fail when their execution session fails, keep failed sessions as audit records, and allow failed tasks to start a new session.

Group queued and failed work in the Tasks board without merging their underlying DAG statuses, and guard task-bound session creation by project and executable status.
@claude

claude Bot commented Jun 4, 2026

Copy link
Copy Markdown

Claude finished @moose-lab's task in 3m 1s —— View job


PR Review: feat: wire execution model runtime through tasks and sessions

Todo

  • Load tools
  • Read CLAUDE.md and check repo setup
  • Check status of all prior must-fix items
  • Review new commit (66ad819 fix: make failed task sessions retryable)
  • Post findings

Status of Prior Must-Fix Items

✅ Fixed: Non-atomic task rollback on spawn failureretry/route.ts:159–172, execute/route.ts:167–180, sessions/route.ts:140–153

All three routes now call markSessionFailedAndReleaseLinkedTask(db, sessionId, ...), which correctly sets the session to 'failed' and rolls the linked task back to 'fail' in a single logical unit (guarded by task.status !== 'in_progress' || task.session_id !== sessionId). This resolves must-fix #1 from both previous reviews.

❌ Still open: localCliAgentValid whitespace guardsession-runtime-auth.ts:296–299

Unchanged from prior reviews. A whitespace-only local_cli_agent_id trims to "", which differs from DEFAULT_LOCAL_CLI_AGENT_ID, making localCliAgentValid false incorrectly, causing model/reasoning to fall back to defaults. Fix this →


Findings (new commit)

  • [should-fix] stream/route.ts — no project-level authorization check (still open from prior review)

    The stream endpoint still reads id from params with no call to resolveProjectId and no ownership check against the sessions table. Any client that guesses a session ID can subscribe to another project's event stream. Add a project-scoped DB check before subscribing:

    const projectId = resolveProjectId(_req);
    const owned = db.prepare(
      "SELECT 1 FROM sessions WHERE id = ? AND project_id = ?"
    ).get(id, projectId);
    if (!owned) return new Response("Forbidden", { status: 403 });

    Fix this →

  • [should-fix] sessions/route.ts:51 — unsafe body as Record<string, unknown> cast (still open from prior review)

    An array body silently makes all destructured fields undefined. The !prompt / !worktree_path guards catch the hard cases, but the cast bypasses TypeScript's type narrowing. Use the same pattern as the retry route: body && typeof body === 'object' && !Array.isArray(body) ? (body as Record<string, unknown>) : {}. Fix this →


Observations (no action needed)

  • markSessionFailedAndReleaseLinkedTask is well-designed: the guard task.status !== 'in_progress' || task.session_id !== sessionId is idempotent and safe to call from multiple code paths without risk of double-rollback.
  • Adding 'fail' to EXECUTABLE_TASK_STATUSES (task-status-flow.ts:36–40) is the right call — tasks that fail on first spawn now have a clear retry path without needing a separate worktree.
  • validateTaskSessionLaunch (task-lifecycle.ts:102–124) correctly gates standalone session creation to task-executable statuses, including project-scoping the lookup.
  • The execute/route.ts worktree conflict handling (line 92–109) correctly recovers if the worktree was already created in a prior failed execution attempt.
  • markSessionFailedAndReleaseLinkedTask intentionally sets fail (not the pre-execution status like 'todo') — correct, since 'fail' is now executable.

Verdict: REQUEST_CHANGES

The stream endpoint authorization gap is a security concern. The localCliAgentValid whitespace bug is a correctness edge case. Both carry over from previous reviews; the core task-rollback correctness issue has been fixed.

@claude

claude Bot commented Jun 4, 2026

Copy link
Copy Markdown

Claude finished @moose-lab's task in 1m 46s —— View job


PR Review: feat: wire execution model runtime through tasks and sessions

Todo

  • Load tools
  • Read CLAUDE.md and check repo setup
  • Diff latest commit (7adadda fix: harden session runtime followups) vs origin/main
  • Check status of all prior open items
  • Review new changes
  • Post findings

Status of Prior Open Items

All open items from previous reviews are now resolved:

✅ Fixed: localCliAgentValid whitespace guardsession-runtime-auth.ts:299

!input.local_cli_agent_id.trim() was added as an OR guard. A whitespace-only local_cli_agent_id now correctly falls through to the default agent, and the new test ("whitespace-only local CLI agent id is treated as absent") confirms the behavior end-to-end.

✅ Fixed: Stream endpoint missing project-level authorizationstream/route.ts:14–22

resolveProjectId(_req) is now called at the top of the GET handler, followed by a DB ownership check (SELECT 1 FROM sessions WHERE id = ? AND project_id = ?). Returns 403 if the session doesn't belong to the caller's project. A quality-gate test also locks this in.

✅ Fixed: Unsafe body cast in sessions/route.tssessions/route.ts:46–51

The body as Record<string, unknown> cast is replaced with a proper bodyRecord narrowing check (body && typeof body === 'object' && !Array.isArray(body) ? (body as Record<...>) : {}). The !prompt / !worktree_path guards are also tightened to typeof prompt !== "string" || !prompt.trim(), rejecting whitespace-only inputs explicitly. Quality-gate test added.

✅ Fixed: CodeQL alert — clear-text logging of agent_api_key_env_varscripts/task-launch-e2e-config.mjs:112

The env var value is no longer logged; replaced with keyEnv=configured/missing.

✅ Fixed (nit): Replay messages now use stable DB IDsuse-session-chat.ts, createReplayChatMessage

createReplayChatMessage now uses data.id ?? nextId(), preferring the persisted session_messages.id. The function is extracted and tested separately.

✅ Fixed (nit): Status event dedupsession-stream-dedupe.ts:26

case "status" now returns null instead of status:${event.status}. Status events are no longer deduplicated during replay — live status events can't be suppressed by a same-status replay event.


Findings (new commit)

No new issues found. All changes are well-scoped to the prior findings and the new logic is clean.


Observations (no action needed)

  • The process-manager.ts:638–643 note from a prior review (undocumented invariant that requeue only runs for local-CLI sessions) remains without a comment, but this is a nit-level concern and not blocking.
  • The task-retry-request.ts indirection nit from the first review is still present but also non-blocking.
  • The quality-gate tests for stream authorization and sessions POST narrowing are a good structural pattern — they enforce the behavior at the source level without needing integration tests.

Verdict: APPROVE

All must-fix and should-fix items across four review rounds are now resolved. The implementation is correct and well-tested.

@moose-lab moose-lab merged commit 56892fe into main Jun 5, 2026
4 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants