v0.8.61: community harvest + freeze fix + WhaleFlow foundation layer (WIP, for review)#3225

Merged
Hmbown merged 83 commits into main from codex/v0.8.61 on Jun 14, 2026

@Hmbown Hmbown commented Jun 14, 2026

v0.8.61 — community harvest + launch-blocker fix + WhaleFlow foundation layer

Draft / for review. This is the assembled v0.8.61 work on codex/v0.8.61 (28 commits over main). It is intentionally not merge-ready yet — the version bump is still a local working-tree change, and the new foundation modules are deliberately unwired (#![allow(dead_code)]) pending their follow-up wiring passes.

👏 New community contributors (6) — harvested with authorship preserved + credited on each PR

| PR | Author | What |
| --- | --- | --- |
| #3201 | @mvanhorn | revive non-DeepSeek cost tracking → closes #3066 |
| #3195 | @cyq1017 | telegram bridge keeps polling while turns stream (relates #2966) |
| #3220 | @RobertEmprechtinger | cap mobile event history (freeze fix) |
| #3199 | @gaord | PUT /v1/sessions engine-snapshot session save |
| #3197 | @nightt5879 | rename DEEPSEEK_BLUE → WHALE_ACCENT_PRIMARY (deprecated aliases kept) → closes #3069 |
| #3221 | @hongchen1993 | exec honors DEEPSEEK_BASE_URL / DEEPSEEK_MODEL |

(#3013 @cyq1017 verified already-implemented and credited.)

Launch blocker — sub-agent fanout freeze (#3216 / #2211)

  • Mechanism fix: the turn loop now observes cancellation between tool batches, so a runaway agent_open fanout is promptly interruptible instead of wedging the TUI.
  • Trigger fix: the base prompt no longer tells the model to spawn sub-agents "liberally" or advertises a 10/20 cap (the real effective launch limit is ~4); it now teaches deliberate, batch-and-poll fanout. (This was the "overlapping material" that drove the freeze.)
  • Full nonblocking/durable fanout is the designed follow-up (see docs/V0_8_61_EXECUTION.md).

Quick-fix issues closed

#3012 (auto-load global ~/.codewhale/instructions.md), #3068 (legacy .deepseek/ path audit doc), #3208 (release-artifact docs), plus wave-1 #3214 (branch-hygiene tool), #3188 (git identity in TUI status), #3076 (neutral provider ordering).

WhaleFlow foundation layer (the spine)

The orchestration pattern is WhaleFlow: roughly ultracode, native to CodeWhale, with heterogeneous-model workers. Ten additive, tested foundation modules, not yet wired unless noted:

  • worker_profile: per-role permissions + model route + non-escalating parent→child derivation (#3217/#3211/#3213/#414/#426/#1186)
  • goal_loop: persistent-objective decision core, wired into the continuation hook (#3215)
  • record_thread_goal_usage: durable per-goal accounting
  • model_registry (#3071/#3073)
  • provider_readiness (#3083)
  • context_budget (#3086)
  • provider_adapter (#3084)
  • resource_telemetry (#2666)
  • theme_override (#3074)
  • request_tuning (#3024)

Docs

docs/V0_8_61_RELEASE_TRIAGE.md, docs/V0_8_61_ISSUE_COVERAGE.md (all 84 milestone issues → disposition + plan), docs/V0_8_61_EXECUTION.md (12 workstream clusters + the WhaleFlow spine).

Verification

cargo test -p codewhale-tui --bins => 4812 pass / 0 fail; codewhale-config/-protocol/-cli/-whaleflow/-state green; cargo fmt --all --check + git diff --check clean; scripts/release/check-versions.sh OK. (CI will run the full release build on this PR.)

Not in this PR / follow-ups

🤖 Generated with Claude Code

Hmbown and others added 28 commits June 14, 2026 08:12
…goal mode, PR triage)

Synthesizes four investigations into the release plan:
- Launch blocker #3216/#2211: TUI freeze root-caused at code level — global
  tool_exec_lock held across the agent_open flash-router model call in a serial,
  cancel-less, inline batch (mechanism), driven by prompt guidance that tells the
  model to spawn 'liberally' and advertises a 10/20 cap vs the real ~4 (trigger).
- Sub-agent vs Fleet overlap (~70% unified; agent_open not yet on the durable path).
- Goal Mode: three disconnected goal models; within-turn loop only; dead metering.
- PR/issue stewardship recommendations with credit (not executed).

Chunked-PR plan + exact push/PR/merge commands included. No GitHub state changed.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The base prompt told the model to spawn sub-agents 'liberally' and advertised a
cap of '10 concurrent / ceiling 20', while the effective concurrent-launch limit
is ~4 (interactive_max_launch) — the rest queue. The parallel-first heuristic even
said to fire 'all agent_open calls in one turn'. That guidance is what drove the
six-sub-agent burst that wedged the TUI (#3216 / #2211): the model was coached into
exactly the high-fanout pattern the runtime can't yet absorb.

Reframe across all prompt representations (compiled constitution.md, legacy base.md,
constitution.yaml source) and docs/SUBAGENTS.md:
- 'use them liberally' -> 'use them deliberately; each is a real spawn, the win is a
  clean context, not free parallelism'
- correct the cap: ~4 execute at once, rest queue; open a small batch (~4), poll with
  agent_eval, open the next batch; max_concurrent (10/20) caps TRACKED agents, not
  parallel execution
- parallel-first bullet: 'all agent_open in one turn' -> 'a small batch (~4), then poll'

Pairs with the freeze mechanism fix (cancel arm + drop global tool lock across the
router model call). Prompt/docs only; no code-path change. tui bin compiles; all 90
prompt tests pass. The renderer (render_constitution.py) was not run — constitution.md
uses {placeholder} expansion that the YAML inlines, so the files were edited
consistently by hand to avoid a non-idempotent regeneration clobber.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…3216, #2211)

When the model emitted several non-parallel tool calls in one turn — the classic
case being six agent_open calls under /model auto — the turn loop ran them as
sequential Serial batches with NO cancellation check between them. Each agent_open
resolves a model route (the 4s flash router) while the global tool_exec_lock is
held, so the batch could occupy the engine task for ~6x4s with the UI unable to
interrupt it: a hard freeze where Esc/Ctrl+C did nothing.

Cancellation is in fact delivered out-of-band: EngineHandle::cancel_with_reason
locks the shared CancellationToken and cancels it directly (handle.rs), so the
token is already cancelled while the batch runs — the loop simply never looked.
The streaming and sub-agent-wait paths already race the token (turn_loop.rs
408/502/1107); the tool-batch loop did not.

Fix: check self.cancel_token at the top of the "for batch in batches" loop. Once
cancelled, stop launching further batches and record an interrupted result for
every remaining plan (Ok(ToolResult success:false), not Err — so it does not
inflate the step's error counters), keeping each tool_use paired with a
tool_result so the transcript stays well-formed for resume. The post-loop check
then ends the turn as Interrupted. This branch is a no-op on the normal
(non-cancelled) path.
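
A minimal sketch of the pattern described above, not the actual turn_loop.rs code. It assumes an AtomicBool standing in for the shared CancellationToken, and the names run_batches and ToolOutcome are hypothetical. The point it illustrates: the flag is read at the top of each batch, and every remaining planned call still gets a paired, non-error result.

```rust
use std::sync::atomic::{AtomicBool, Ordering};

/// Result recorded for one planned tool call (illustrative type).
#[derive(Debug, PartialEq)]
pub enum ToolOutcome {
    /// The call executed and produced output.
    Ran(String),
    /// Non-error marker: the call never executed because the turn was cancelled.
    Interrupted,
}

/// Runs batches in order, reading `cancelled` at the top of each batch.
/// `cancelled` stands in for the shared cancellation token (assumption).
pub fn run_batches(
    batches: &[Vec<&str>],
    cancelled: &AtomicBool,
    mut exec: impl FnMut(&str) -> String,
) -> Vec<ToolOutcome> {
    let mut results = Vec::new();
    for batch in batches {
        // Once the flag is set it stays set, so no later batch is launched.
        let stop = cancelled.load(Ordering::SeqCst);
        for &call in batch {
            // Every planned call gets exactly one result, executed or not,
            // which keeps each tool_use paired with a tool_result.
            results.push(if stop {
                ToolOutcome::Interrupted
            } else {
                ToolOutcome::Ran(exec(call))
            });
        }
    }
    results
}
```

With three single-call batches and a cancel arriving during the first, the first call runs and the other two are recorded as Interrupted rather than as errors.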

Scope: this makes a runaway fan-out promptly cancellable (24s -> interrupt). It
does NOT by itself make the parent nonblocking during fan-out — detaching
agent_open onto the durable fleet-backed worker run (per docs/AGENT_RUNTIME.md
cutover rule) remains the larger #3216 follow-up, designed in
docs/V0_8_61_RELEASE_TRIAGE.md. Pairs with the prompt/cap-honesty chunk that stops
the model spawning the burst in the first place.

Verification: cargo fmt --all --check clean; new unit test
(cancel_batch_tests::interrupted_tool_result_is_a_non_error_unexecuted_marker)
passes; cargo test -p codewhale-tui --bins => 4695 pass. The single full-suite
failure (a tools::test_runner meta-test that spawns a nested cargo) is a
CARGO_TARGET_DIR test-isolation artifact of the local shared-target build — it
passes in isolation and on the unmodified base; unrelated to this change
(different module).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…sion save

Add a new PUT /v1/sessions endpoint that saves a thread's current engine state as a session, complementing the existing POST /v1/sessions which reconstructs messages from stored turn items.

The new endpoint asks the engine for its live session snapshot via a oneshot channel, so token counts and message ordering are authoritative rather than reconstructed. This matches TUI's build_session_snapshot behavior.

Changes:

- ops.rs: add SessionSnapshot struct and Op::GetSessionSnapshot variant

- engine.rs: handle GetSessionSnapshot in the engine loop

- engine/handle.rs: add get_session_snapshot() method

- runtime_threads.rs: expose get_engine() as public wrapper

- runtime_api.rs: add PUT /v1/sessions route and save_current_session handler

Also fixes the Greptile review issue where load_session errors were silently swallowed: only io::ErrorKind::NotFound falls back to creating a new session; other I/O errors (e.g. PermissionDenied) are now propagated.
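
The error-handling rule above can be sketched as follows. This is an illustration only: the Session type and the load_or_create helper are hypothetical, and the loader is passed in as a closure so the behavior can be exercised without a filesystem.

```rust
use std::io;

/// Placeholder session type (illustrative).
#[derive(Debug, Default, PartialEq)]
pub struct Session {
    pub messages: Vec<String>,
}

/// Falls back to a fresh session only when the stored one does not exist.
/// Any other I/O error (for example PermissionDenied) is propagated.
pub fn load_or_create(load: impl FnOnce() -> io::Result<Session>) -> io::Result<Session> {
    match load() {
        Ok(session) => Ok(session),
        Err(err) if err.kind() == io::ErrorKind::NotFound => Ok(Session::default()),
        Err(err) => Err(err),
    }
}
```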

Ref: #2808
Harvested-from: PR #3199 by @gaord
Three fixes so that 'codewhale --provider wanjie-ark --base-url <url>
--model auto exec ...' works without a wrapper script:

1. resolve_exec_model: fall back to CODEWHALE_MODEL/DEEPSEEK_MODEL env
   vars when the explicit arg is absent.

2. Exec command handler: read DEEPSEEK_BASE_URL and set it on the
   config before creating the client.

3. deepseek_base_url: try env_base_url_override() as a fallback before
   the provider default.
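
A sketch of the lookup order in item 1, under stated assumptions: the signature shown here is hypothetical (the real resolve_exec_model may differ), the environment lookup is injected as a closure for testability, and treating blank values as absent is an assumption of this example.

```rust
/// Treats blank values as absent (assumption of this sketch).
fn non_empty(value: String) -> Option<String> {
    let trimmed = value.trim();
    if trimmed.is_empty() {
        None
    } else {
        Some(trimmed.to_string())
    }
}

/// Explicit argument first, then CODEWHALE_MODEL, then DEEPSEEK_MODEL.
/// `env` stands in for the process environment.
pub fn resolve_exec_model(
    explicit: Option<&str>,
    env: impl Fn(&str) -> Option<String>,
) -> Option<String> {
    explicit
        .map(str::to_string)
        .and_then(non_empty)
        .or_else(|| env("CODEWHALE_MODEL").and_then(non_empty))
        .or_else(|| env("DEEPSEEK_MODEL").and_then(non_empty))
}
```

In a real binary the closure could be `|key| std::env::var(key).ok()`.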

Harvested-from: PR #3221 by @hongchen1993
Refs: #3205
…#3012)

global_context_relative_paths() only honored AGENTS.md and the deprecated
WHALE.md globally; ~/.codewhale/instructions.md (and .agents/ + .deepseek/
variants) was project-level only. Add the three instructions paths ranked
between AGENTS.md (higher) and WHALE.md (lower), matching the documented
project-level precedence. The existing load loop, merge, and deprecation-warning
logic need no other change. Adds two tests (autoload+outranks-WHALE,
AGENTS-outranks-instructions).

Closes: #3012
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
… issues)

Coverage matrix: ultracode triage of all 84 open milestone issues — disposition
(already-done 16 / quick-fix 3 / design 52 / defer 13) + a concrete plan per issue.

Execution plan: clusters the 52 design issues into 12 agent-owned workstreams with
sequencing + dependencies, and makes the architectural spine explicit — WhaleFlow is
the ultracode orchestration pattern realized inside CodeWhale, where workers are
heterogeneous model types (flash scouts, pro synthesis, per-role model routes). Goal
mode #3215 / durable fanout #3216 / fleet #3154 / profiles #3217 / swarm gate #3218
are facets of that one epic. Critical scope: multi-pass, merge only verified branches.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…3068)

Documents the consolidated state-dir resolver (config::resolve_state_dir /
ensure_state_dir, read-.codewhale-fallback-.deepseek / write-.codewhale) and a
keep/deprecate/remove decision per legacy reference. Decision: keep-as-fallback for
all; routing the remaining hardcoded sites through the resolver is a flagged follow-up.

Closes: #3068
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
#3208 was confusion, not a bug: codewhale-<platform> (bare) is what the npm wrapper
and in-app updater download; codewhale-<platform>.tar.gz bundles the same binaries +
install.sh for manual installs. Clarify both the generated GitHub release body and
INSTALL.md §6 so the Releases page is self-explanatory. Docs only; no pipeline logic.

Closes: #3208
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Adds scripts/release/branch-hygiene.sh: a read-only, dry-run-by-default tool
that makes the post-merge release state obvious and recommends safe branch
cleanup. It reports the current checkout, local+remote release tips, and
origin/main; flags branches whose tip is already contained in main or the
release branch as safe deletes; and lists branches with unique commits as
keep/review, naming the branch, unique-commit count, author(s), and reason.

Enforces the contributor-preservation policy: a branch is only ever a
maintainer-only safe/review item if every unique commit maps to Hunter
(via a built-in list + the canonical side of .mailmap). Any unique commit
from another contributor forces a KEEP and is never auto-deleted unless
already merged. Deletion is gated behind --prune/--prune-remote + a
confirmation (or --yes), and a diverged local/remote release tip exits
non-zero.

Adds a hermetic test (branch-hygiene.test.sh) that builds a synthetic repo
and asserts safe-delete detection, contributor preservation, mailmap
folding, the parked-checkout warning, prune behavior, and divergence
detection. Documents the exact commands as section 5b of the release
checklist.

Refs: #3214

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The footer git chip previously showed only the branch and rendered
nothing outside a git repo, so the status surface could read as an
empty "Repo:" label (issue #3188). Surface a concise, factual
workspace identity sourced strictly from workspace/git detection —
never from model narration or config text:

- Git repo: "Repo: <name> @ <branch>" (carries the cached
  "detached:<hash>" form for detached HEAD).
- Non-git cwd: "Repo: <name> (no git)" instead of hiding the chip.

Add width-aware formatting (`format_repo_identity`) that keeps the
repo identity over the branch under width pressure, then truncates the
name rather than collapsing to a bare prefix. The chip renders the
full identity and lets the footer widget clip to terminal width
(matching the prior branch-only chip). The truncation policy is
unit-tested with explicit widths.

Tests cover git repo, detached HEAD, non-git cwd, narrow-width
truncation priority, and a real-git integration check. The existing
two footer-branch tests are updated to the new identity contract.

Refs: #3188

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Provider browsing surfaces have historically led with whichever provider
sits first in ProviderKind::ALL (DeepSeek). Add a config-crate helper that
returns the built-in providers sorted alphabetically (case-insensitively) by
display name, giving model/provider UI a neutral order without DeepSeek being
hard-coded first.

The helper is purely additive: ProviderKind::ALL and all_providers() keep
their stable insertion order for internal parsing and default selection, so
provider resolution and defaults are unchanged. Doc comments now spell out
which order is for stable internal matching vs UI display. DeepSeek remains
present and searchable.

Foundational public API for the UI wiring (model picker / provider picker /
completions), which intersects unmerged provider-aware search (#3075) and is
left as a follow-up so this slice stays safe and self-contained.

Tests assert display order is alphabetical, differs from ProviderKind::ALL
order, is complete and de-duplicated, and that DeepSeek is present but not
first in display order.

Refs: #3076
…e substrate

Adds crates/tui/src/worker_profile.rs: the per-role capability contract every
detached worker (agent_open sub-agent or Fleet worker) should run under —
PermissionSet (write/network), ShellPolicy (None/ReadOnly/Full, replacing the
legacy shell boolean), ToolScope (Inherit/Explicit, mirroring AgentWorkerToolProfile),
ModelRoute (Inherit/Auto/Fixed — the heterogeneous-model piece), provider override,
spawn-depth budget, and background flag.

derive_child(parent, requested) intersects capabilities so a child can NEVER escalate
beyond its parent (permissions AND-ed, shell min, explicit tools bounded by the parent
set, depth decremented + clamped to MAX_SPAWN_DEPTH_CEILING). Reuses the existing
SubAgentType role taxonomy (one taxonomy, not a parallel one). 7 tests.
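
To illustrate the non-escalation rule, here is a reduced sketch. It is not the worker_profile.rs code: the struct is cut down to four fields, the field names are hypothetical, the ceiling value is invented for the example, and the exact depth rule (parent depth minus one, bounded by the request and the ceiling) is an assumption.

```rust
/// Ordered from least to most capable, so `min` picks the weaker policy.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub enum ShellPolicy {
    None,
    ReadOnly,
    Full,
}

/// Reduced, illustrative profile (field names are assumptions).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct WorkerProfile {
    pub can_write: bool,
    pub can_network: bool,
    pub shell: ShellPolicy,
    pub spawn_depth: u8,
}

/// Hypothetical value; the real ceiling is not stated in the commit message.
pub const MAX_SPAWN_DEPTH_CEILING: u8 = 3;

/// Intersects the request with the parent so the child never exceeds it.
pub fn derive_child(parent: &WorkerProfile, requested: &WorkerProfile) -> WorkerProfile {
    WorkerProfile {
        // Permissions are AND-ed: both parent and request must allow them.
        can_write: parent.can_write && requested.can_write,
        can_network: parent.can_network && requested.can_network,
        // The weaker of the two shell policies wins.
        shell: parent.shell.min(requested.shell),
        // One level is consumed by the spawn, then the result is clamped.
        spawn_depth: parent
            .spawn_depth
            .saturating_sub(1)
            .min(requested.spawn_depth)
            .min(MAX_SPAWN_DEPTH_CEILING),
    }
}
```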

Foundation only (#![allow(dead_code)]): wiring agent_open/fleet to build + enforce these
profiles, and mapping the legacy shell boolean / AgentWorkerToolProfile onto it, is the
follow-up. This is the substrate for the WhaleFlow≈ultracode worker model.

Refs: #3217, #3211, #3213, #414, #426, #1186
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Add crates/tui/src/model_registry.rs: one place to look up model facts
(id, provider grouping, context_window, max_output, supports_reasoning).
Every numeric fact is SEEDED from the existing crate::models lookups
(context_window_for_model / max_output_tokens_for_model /
model_supports_reasoning) so the registry can never silently disagree
with models.rs. Canonical model ids mirror the DEFAULT_* provider
defaults in crates/config plus the explicit models.rs rows.

This is additive foundation only: existing hard-coded call sites are
left untouched and will be migrated to consume the registry in a later
pass. A drift-guard test re-asserts the registry context window equals
the live models.rs value for a per-provider sample, so a future
hard-coded literal that drifts is caught in CI. DeepSeek defaults are
seeded and classified first-class.

Wires `mod model_registry;` into crates/tui/src/main.rs.

Refs: #3071 #3073

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Wire the currently-dead per-goal token/time accounting on the durable
ThreadGoal. The protocol ThreadGoal and the state thread_goals table
already carry tokens_used/time_used_seconds, but they were set to 0 at
creation and never incremented.

Add StateStore::record_thread_goal_usage, an additive helper that
atomically increments tokens_used and time_used_seconds (via SQL
col = col + ?) and advances updated_at monotonically (MAX(updated_at,
now)), returning the updated ThreadGoalRecord or None when the thread
has no goal. It never creates a goal row. This is the durable-accounting
foundation the persistent goal loop (#3215) and the sidebar will later
read; no runtime goal loop or existing behavior is changed.

Covered by two unit tests: multi-accrual accumulation with identity-field
and monotonic-updated_at assertions, and the goalless no-op (returns None,
creates nothing).
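
The accounting semantics can be modeled in memory as below. This is only a model: the real helper runs SQL against the thread_goals table, while this sketch uses a HashMap, a reduced record with three fields, and saturating addition (an assumption, SQL addition does not saturate). InMemoryGoals is a hypothetical name.

```rust
use std::collections::HashMap;

/// Reduced record: only the fields the helper touches.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct ThreadGoalRecord {
    pub tokens_used: u64,
    pub time_used_seconds: u64,
    pub updated_at: i64,
}

/// In-memory stand-in for the state store (illustrative).
#[derive(Default)]
pub struct InMemoryGoals {
    pub goals: HashMap<String, ThreadGoalRecord>,
}

impl InMemoryGoals {
    /// Adds usage to an existing goal and returns the updated record.
    /// Returns None, and creates nothing, when the thread has no goal.
    pub fn record_thread_goal_usage(
        &mut self,
        thread_id: &str,
        tokens: u64,
        seconds: u64,
        now: i64,
    ) -> Option<ThreadGoalRecord> {
        let goal = self.goals.get_mut(thread_id)?;
        goal.tokens_used = goal.tokens_used.saturating_add(tokens);
        goal.time_used_seconds = goal.time_used_seconds.saturating_add(seconds);
        // Mirrors MAX(updated_at, now): the timestamp never moves backwards.
        goal.updated_at = goal.updated_at.max(now);
        Some(goal.clone())
    }
}
```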

Refs: #3215 #1976 #2029

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Add an additive, pure data foundation for the future /provider readiness
dashboard. New module `provider_readiness` assembles a `ProviderReadinessRow`
per provider from the existing `config::provider_capability` +
`has_api_key_for` + model-resolution helpers: provider, active flag, has_key,
credential-derived `ProviderReadiness`, resolved model + `ModelProvenance`,
base URL hint, context_window/max_output, and the thinking/cache/streaming
capability flags.

Foundation-only: no rendering and no network I/O (live health states are
reserved variants for a later cached-health layer), and `provider_picker.rs`
is intentionally untouched. Wired via `mod provider_readiness;` in main.rs.
10 unit tests cover local/hosted/active/inactive readiness, V4 metadata,
catalog-default vs saved provenance, explicit-unknown metadata, and base-url
surfacing. DeepSeek support and CodeWhale branding preserved.

Refs: #3083

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Add crates/tui/src/context_budget.rs: a pure, I/O-free ContextBudget
service that derives available input budget, the output token cap, a
compaction-trigger threshold (~75% of window), and a Low/Medium/High/
Critical PressureLevel from a model context window, current input
tokens, and a configured output cap.

Mirrors the engine's hard-won budget semantics (window-dependent output
reservation, window - reserved_output - headroom, saturating arithmetic
that never underflows on small self-hosted windows) as standalone
functions, with thorough unit tests across window sizes (8K..1M) and
every pressure/compaction boundary. PressureLevel labels stay aligned
with the existing context-report vocabulary.

Foundation-only: wiring the engine capacity checkpoints and the TUI
pressure indicator to consume this is a later pass. Additive — no
existing behavior or tests changed; DeepSeek support and CodeWhale
branding preserved.
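
A rough sketch of the arithmetic, with loud caveats: only the shape (window minus reserved output minus headroom, saturating subtraction, a trigger near 75% of the window, four pressure levels) comes from the description above. The headroom constant, the quarter-window output reservation, and the pressure thresholds are invented for illustration and are not the values in context_budget.rs.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum PressureLevel {
    Low,
    Medium,
    High,
    Critical,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct ContextBudget {
    pub available_input: u64,
    pub compaction_trigger: u64,
    pub pressure: PressureLevel,
}

/// Hypothetical safety margin; the real headroom is not stated.
pub const HEADROOM_TOKENS: u64 = 1_024;

pub fn context_budget(window: u64, input_tokens: u64, output_cap: u64) -> ContextBudget {
    // Assumed rule: reserve the configured cap, but never more than 1/4 of the window.
    let reserved_output = output_cap.min(window / 4);
    // Saturating subtraction cannot underflow on small self-hosted windows.
    let available_input = window
        .saturating_sub(reserved_output)
        .saturating_sub(HEADROOM_TOKENS);
    // Compaction triggers at roughly 75% of the window.
    let compaction_trigger = window.saturating_mul(3) / 4;
    let used_pct = if window == 0 {
        100
    } else {
        input_tokens.saturating_mul(100) / window
    };
    // Thresholds below are assumptions chosen for the example.
    let pressure = match used_pct {
        0..=49 => PressureLevel::Low,
        50..=74 => PressureLevel::Medium,
        75..=89 => PressureLevel::High,
        _ => PressureLevel::Critical,
    };
    ContextBudget { available_input, compaction_trigger, pressure }
}
```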

Refs: #3086

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…layer (#3215)

Adds crates/tui/src/goal_loop.rs: the pure decision core that turns a one-shot
/goal into a persistent work loop. decide_continuation(status, progress, budget)
returns Continue or Stop(reason) — terminal model status (Completed/Blocked) wins,
then budget/circuit-breaker (token, time, and an always-on continuation cap that
prevents a runaway loop), else continue. Reads the durable per-goal accounting wired
by crates/state record_thread_goal_usage. 7 tests.
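
The decision order can be sketched as below. The ContinuationDecision, StopReason::TokenBudget, and StopReason::TimeBudget names match the snippet quoted in the review further down; everything else (GoalStatus, Progress, Budget, the remaining StopReason variants, and the max_continuations field) is an assumption made to keep the example self-contained.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum GoalStatus {
    Active,
    Completed,
    Blocked,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum StopReason {
    Completed,
    Blocked,
    TokenBudget,
    TimeBudget,
    ContinuationCap,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ContinuationDecision {
    Continue,
    Stop(StopReason),
}

pub struct Progress {
    pub tokens_used: u64,
    pub time_used_seconds: u64,
    pub continuations: u32,
}

pub struct Budget {
    pub token_budget: Option<u64>,
    pub time_budget_seconds: Option<u64>,
    /// Always-on circuit breaker against a runaway loop.
    pub max_continuations: u32,
}

pub fn decide_continuation(
    status: GoalStatus,
    progress: &Progress,
    budget: &Budget,
) -> ContinuationDecision {
    use ContinuationDecision::{Continue, Stop};
    // 1. A terminal status reported by the model wins over everything else.
    match status {
        GoalStatus::Completed => return Stop(StopReason::Completed),
        GoalStatus::Blocked => return Stop(StopReason::Blocked),
        GoalStatus::Active => {}
    }
    // 2. Optional budgets; None means unbounded.
    if budget.token_budget.is_some_and(|t| progress.tokens_used >= t) {
        return Stop(StopReason::TokenBudget);
    }
    if budget.time_budget_seconds.is_some_and(|s| progress.time_used_seconds >= s) {
        return Stop(StopReason::TimeBudget);
    }
    // 3. The continuation cap applies even when no budget is configured.
    if progress.continuations >= budget.max_continuations {
        return Stop(StopReason::ContinuationCap);
    }
    Continue
}
```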

Foundation only (#![allow(dead_code)]): the engine continuation hook
(turn_loop.rs goal_continuation_message_if_needed, today capped at 3/turn and reset
each turn) consumes this in the follow-up that makes goal mode persistent across
turns + durable. This is the orchestrator in the WhaleFlow≈ultracode mapping; it
composes with the worker_profile (worker substrate) and goal-metering (durable
accounting) foundations.

Refs: #3215, #891, #1976, #2058, #2029
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Introduce an additive crates/tui/src/provider_adapter.rs defining the
per-provider contract requested by #3084: a ProviderAdapter trait exposing
a capability descriptor (sourced from config::provider_capability, not
hard-coded), an AuthModel marker (EnvVar/OAuth/BuiltInKey), and a
RequestDialect marker (OpenAiCompatible/DeepSeekNative/Anthropic).

Includes a check/assert_adapter_conformance pair validating the
context_window > 0, max_output > 0, max_output <= context_window invariants,
plus worked DeepSeek and OpenAI-compatible adapters and thorough unit tests.
Foundation only (#![allow(dead_code)]); consumers wired later. DeepSeek
remains a first-class example.
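
A sketch of the three invariants as a check function. The trait is reduced to a single method, and Capability, ConformanceError, and FixedAdapter are hypothetical names; the actual trait also carries the AuthModel and RequestDialect markers, which are omitted here.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Capability {
    pub context_window: u32,
    pub max_output: u32,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ConformanceError {
    ZeroContextWindow,
    ZeroMaxOutput,
    OutputExceedsWindow,
}

/// Reduced stand-in for the adapter trait (illustrative).
pub trait ProviderAdapter {
    fn capability(&self) -> Capability;
}

/// Validates context_window > 0, max_output > 0, max_output <= context_window.
pub fn check_adapter_conformance(adapter: &dyn ProviderAdapter) -> Result<(), ConformanceError> {
    let cap = adapter.capability();
    if cap.context_window == 0 {
        return Err(ConformanceError::ZeroContextWindow);
    }
    if cap.max_output == 0 {
        return Err(ConformanceError::ZeroMaxOutput);
    }
    if cap.max_output > cap.context_window {
        return Err(ConformanceError::OutputExceedsWindow);
    }
    Ok(())
}

/// Hypothetical adapter that returns a fixed descriptor, used for testing.
pub struct FixedAdapter(pub Capability);

impl ProviderAdapter for FixedAdapter {
    fn capability(&self) -> Capability {
        self.0
    }
}
```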

Refs: #3084

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…re (#3215)

goal_continuation_message_if_needed now calls goal_loop::decide_continuation
instead of an inline counter check. Behavior-preserving today (the per-turn
continuation cap), but this is the seam where durable cross-turn budget —
token/time from the per-goal accounting (crates/state record_thread_goal_usage)
— gets enforced as goal mode becomes a persistent work loop. Makes the goal_loop
foundation a live consumer rather than dead code. 32 goal tests pass.

Refs: #3215
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Introduce crates/tui/src/resource_telemetry.rs, a pure, I/O-free
foundation for surfacing token/time/resource usage during long tasks.

- ResourceTelemetry { tokens_used, time_used_seconds, token_budget,
  time_budget_seconds } with builder helpers.
- human_summary() renders a compact line, e.g.
  "12.3k tok · 4m12s · 41% budget": tokens abbreviated with k/M,
  time as Hh Mm Ss, budget segment omitted when unbounded.
- token/time/budget fraction + percent helpers (None when unbounded
  or budget is zero, never infinity).
- Coarse PressureLevel (Low/Medium/High) from the max bounded budget
  fraction; unbounded tasks are always Low.

No rendering, no I/O, no behavior change. Carries #![allow(dead_code)]
(consumers wired later, matching features.rs). Thorough tests cover
bounded/unbounded, zero/large values, unit-boundary rounding, and the
budget threshold edges.
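
The summary line format can be sketched as follows. This is a simplified illustration, not the resource_telemetry.rs code: it takes plain arguments instead of the ResourceTelemetry struct, derives the budget percentage from the token budget only (an assumption), and does not handle the unit-boundary rounding cases the real tests cover.

```rust
/// Abbreviates token counts with k/M suffixes.
fn abbreviate_tokens(n: u64) -> String {
    if n >= 1_000_000 {
        format!("{:.1}M", n as f64 / 1_000_000.0)
    } else if n >= 1_000 {
        format!("{:.1}k", n as f64 / 1_000.0)
    } else {
        n.to_string()
    }
}

/// Renders seconds as h/m/s, dropping leading zero units.
fn format_duration(secs: u64) -> String {
    let (h, m, s) = (secs / 3600, secs % 3600 / 60, secs % 60);
    if h > 0 {
        format!("{h}h{m}m{s}s")
    } else if m > 0 {
        format!("{m}m{s}s")
    } else {
        format!("{s}s")
    }
}

/// Builds the compact line; the budget segment is omitted when the
/// budget is unbounded (None) or zero.
pub fn human_summary(tokens_used: u64, time_used_seconds: u64, token_budget: Option<u64>) -> String {
    let mut parts = vec![
        format!("{} tok", abbreviate_tokens(tokens_used)),
        format_duration(time_used_seconds),
    ];
    if let Some(budget) = token_budget.filter(|b| *b > 0) {
        parts.push(format!("{}% budget", tokens_used.saturating_mul(100) / budget));
    }
    parts.join(" · ")
}
```

Under these assumptions, `human_summary(12_300, 252, Some(30_000))` produces the example line quoted above.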

Refs: #2666
…on colors

Pure, additive foundation for #3074: dark themes where the default gold
accent over a light selection background renders selected rows unreadable.

Adds crates/tui/src/theme_override.rs:
- ThemeColorOverride { accent, selection_bg, selection_fg } (all Option<Rgb>,
  default = inherit), plus selection_contrast() and is_empty() helpers.
- parse_hex_color("#RRGGBB"/"RRGGBB") -> (u8,u8,u8) with a typed
  HexColorParseError (impls Display + std::error::Error).
- relative_luminance / contrast_ratio (WCAG 2.x) and meets_min_contrast so a
  future settings layer can validate legibility.

No rendering, no settings I/O; consumers wired later (matches features.rs,
#![allow(dead_code)]). Colors are plain (u8,u8,u8) triples to avoid coupling
this foundation to ratatui; the existing palette::parse_hex_rgb_color returns
Option<ratatui::Color> and would lose the typed error, so a local parser is
used. Only the alphabetical `mod theme_override;` line is added to main.rs.

Tests (cargo test -p codewhale-tui --bins theme_override): valid/invalid hex,
black/white ~21:1 contrast, identical colors = 1:1, and a low-contrast
gold-on-light-selection pair failing AA. 10 passed.
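
For reference, the WCAG 2.x contrast computation can be sketched as below. The function names follow the commit message, but the bodies are an independent reconstruction from the published WCAG formula, and parse_hex_color returns Option here rather than the typed HexColorParseError the module is described as using.

```rust
pub type Rgb = (u8, u8, u8);

/// Parses "#RRGGBB" or "RRGGBB". Simplified: returns None on any error.
pub fn parse_hex_color(s: &str) -> Option<Rgb> {
    let hex = s.strip_prefix('#').unwrap_or(s);
    if hex.len() != 6 || !hex.bytes().all(|b| b.is_ascii_hexdigit()) {
        return None;
    }
    let channel = |i: usize| u8::from_str_radix(&hex[i..i + 2], 16).ok();
    Some((channel(0)?, channel(2)?, channel(4)?))
}

/// Converts one sRGB channel to linear light, per the WCAG 2.x definition.
fn linearize(c: u8) -> f64 {
    let v = c as f64 / 255.0;
    if v <= 0.03928 {
        v / 12.92
    } else {
        ((v + 0.055) / 1.055).powf(2.4)
    }
}

pub fn relative_luminance((r, g, b): Rgb) -> f64 {
    0.2126 * linearize(r) + 0.7152 * linearize(g) + 0.0722 * linearize(b)
}

/// Ratio of the lighter to the darker luminance, each offset by 0.05.
pub fn contrast_ratio(a: Rgb, b: Rgb) -> f64 {
    let (la, lb) = (relative_luminance(a), relative_luminance(b));
    let (hi, lo) = if la >= lb { (la, lb) } else { (lb, la) };
    (hi + 0.05) / (lo + 0.05)
}

pub fn meets_min_contrast(a: Rgb, b: Rgb, min_ratio: f64) -> bool {
    contrast_ratio(a, b) >= min_ratio
}
```

A settings layer could call `meets_min_contrast(fg, bg, 4.5)` to check the AA threshold for normal text before accepting an override.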

Refs: #3074
Introduce crates/tui/src/request_tuning.rs: a pure, declarative foundation
that encodes which providers honor which request-tuning params (reasoning
effort + max output tokens), so the silent no-ops in #3024 can be surfaced
and fixed deliberately later.

- RequestTuning { reasoning_effort: Option<ReasoningEffort>, max_output_tokens:
  Option<u32> } reuses the canonical crate::tui::app::ReasoningEffort enum
  (already imported by auto_reasoning / model_routing) instead of a local copy.
- provider_tuning_support(name) -> TuningSupport { honors_reasoning_effort,
  honors_max_output_tokens } with documented rows grounded in current
  client.rs / client/chat.rs behavior: DeepSeek honors both; OpenAI / Moonshot
  / Ollama / Atlascloud have gaps; unknown names fall back to a conservative
  default.

Additive only: one new module behind #![allow(dead_code)] (consumers wired
later, matching features.rs) plus its alphabetical mod line. No request
building, no behavior change. Tests assert the support map for each named
provider, the DeepSeek-CN aliases, and the default.

Refs: #3024

@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This pull request introduces several core features and optimizations to CodeWhale, including a persistent goal loop orchestrator, a unified context-budget math module, and a cancellation check between tool batches that makes a runaway sub-agent fanout interruptible instead of freezing the TUI (fully nonblocking fanout remains a follow-up). It also adds a branch hygiene script for post-merge cleanup and expands pricing support for non-DeepSeek models. The code review highlights a few issues: a portability bug on macOS in the branch hygiene script, performance overhead from string allocations during provider sorting, the use of let chains in multiple files (stable only from Rust 1.88 on the 2024 edition), and debug print noise in the main entry point.

Comment on lines +190 to +191
release_branch="$(git for-each-ref --format='%(refname:short)' 'refs/heads/codex/v*' \
| sort -V | tail -n1 || true)"

high

The sort -V (version sort) option is a GNU extension and is not supported by the default sort utility on macOS (BSD sort), which will cause this script to fail on macOS. Since Git has built-in version sorting support via --sort=version:refname, we can use that directly to ensure portability across all platforms.

Suggested change
- release_branch="$(git for-each-ref --format='%(refname:short)' 'refs/heads/codex/v*' \
-   | sort -V | tail -n1 || true)"
+ release_branch="$(git for-each-ref --sort=version:refname --format='%(refname:short)' 'refs/heads/codex/v*' \
+   | tail -n1 || true)"

Comment on lines +553 to +561
pub fn providers_sorted_for_display() -> Vec<&'static dyn Provider> {
    let mut providers = all_providers().to_vec();
    providers.sort_by(|a, b| {
        a.display_name()
            .to_ascii_lowercase()
            .cmp(&b.display_name().to_ascii_lowercase())
    });
    providers
}

medium

Calling to_ascii_lowercase() on &str allocates a new String on every comparison. Since sort_by performs $O(N \log N)$ comparisons, this results in hundreds of unnecessary allocations. You can perform a zero-allocation case-insensitive comparison by comparing the byte iterators directly.

Suggested change
- pub fn providers_sorted_for_display() -> Vec<&'static dyn Provider> {
-     let mut providers = all_providers().to_vec();
-     providers.sort_by(|a, b| {
-         a.display_name()
-             .to_ascii_lowercase()
-             .cmp(&b.display_name().to_ascii_lowercase())
-     });
-     providers
- }
+ pub fn providers_sorted_for_display() -> Vec<&'static dyn Provider> {
+     let mut providers = all_providers().to_vec();
+     providers.sort_by(|a, b| {
+         a.display_name()
+             .bytes()
+             .map(|b| b.to_ascii_lowercase())
+             .cmp(b.display_name().bytes().map(|b| b.to_ascii_lowercase()))
+     });
+     providers
+ }
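
The allocation-free comparison can be tried in isolation with plain string names. This sketch does not use the Provider trait; cmp_ascii_case_insensitive and sorted_for_display are hypothetical helper names, and the comparison is ASCII-only, as in the suggestion.

```rust
use std::cmp::Ordering;

/// Compares two strings ASCII case-insensitively without allocating:
/// both sides are lowercased byte by byte as the iterators advance.
pub fn cmp_ascii_case_insensitive(a: &str, b: &str) -> Ordering {
    a.bytes()
        .map(|c| c.to_ascii_lowercase())
        .cmp(b.bytes().map(|c| c.to_ascii_lowercase()))
}

/// Returns the names in display order, leaving the input order untouched.
pub fn sorted_for_display<'a>(names: &[&'a str]) -> Vec<&'a str> {
    let mut out = names.to_vec();
    out.sort_by(|a, b| cmp_ascii_case_insensitive(a, b));
    out
}
```

Because the input slice is only copied, a stable internal order (such as the one ProviderKind::ALL provides) is left unchanged for parsing and default selection.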

Comment on lines +119 to +128
if let Some(tokens) = budget.token_budget
    && progress.tokens_used >= tokens
{
    return ContinuationDecision::Stop(StopReason::TokenBudget);
}
if let Some(secs) = budget.time_budget_seconds
    && progress.time_used_seconds >= secs
{
    return ContinuationDecision::Stop(StopReason::TimeBudget);
}

medium

Let chains (if let ... && ...) are stable only from Rust 1.88 on the 2024 edition; on older toolchains or editions they require a nightly compiler. If the crate needs to build there, use Option::is_some_and instead, which has been stable since Rust 1.70.0.

    if budget.token_budget.is_some_and(|tokens| progress.tokens_used >= tokens) {
        return ContinuationDecision::Stop(StopReason::TokenBudget);
    }
    if budget.time_budget_seconds.is_some_and(|secs| progress.time_used_seconds >= secs) {
        return ContinuationDecision::Stop(StopReason::TimeBudget);
    }

Comment on lines +2530 to +2534
if let Some(limit) = query.replay_limit
    && backlog.len() > limit
{
    backlog = backlog.split_off(backlog.len() - limit);
}

medium

Let chains are stable only from Rust 1.88 on the 2024 edition. For compatibility with older toolchains or editions, use a nested if let block instead.

Suggested change
- if let Some(limit) = query.replay_limit
-     && backlog.len() > limit
- {
-     backlog = backlog.split_off(backlog.len() - limit);
- }
+ if let Some(limit) = query.replay_limit {
+     if backlog.len() > limit {
+         backlog = backlog.split_off(backlog.len() - limit);
+     }
+ }

Comment thread crates/tui/src/main.rs
Comment on lines +1136 to +1144
if let Ok(env_url) = std::env::var("DEEPSEEK_BASE_URL") {
    let trimmed = env_url.trim();
    eprintln!("DEBUG DEEPSEEK_BASE_URL='{trimmed}'");
    if !trimmed.is_empty() {
        config.base_url = Some(trimmed.to_string());
    }
} else {
    eprintln!("DEBUG DEEPSEEK_BASE_URL not set");
}

medium

These eprintln! calls print debug noise to stderr in production. Additionally, the check should also honor CODEWHALE_BASE_URL as the primary environment variable, falling back to DEEPSEEK_BASE_URL for backward compatibility.

                let env_url = std::env::var("CODEWHALE_BASE_URL")
                    .or_else(|_| std::env::var("DEEPSEEK_BASE_URL"));
                if let Ok(env_url) = env_url {
                    let trimmed = env_url.trim();
                    if !trimmed.is_empty() {
                        config.base_url = Some(trimmed.to_string());
                    }
                }
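
One detail worth noting about the `or_else` form: it falls back to DEEPSEEK_BASE_URL only when CODEWHALE_BASE_URL is unset (or not valid Unicode), not when it is set but empty or whitespace. If an empty primary should also fall through, something like the following sketch might work. Whether that is the desired behavior is an assumption, and `resolve_base_url`, `BASE_URL_VARS`, and the injected `lookup` parameter are hypothetical names, not part of the PR.

```rust
// Variable names in priority order: primary first, legacy fallback second.
const BASE_URL_VARS: [&str; 2] = ["CODEWHALE_BASE_URL", "DEEPSEEK_BASE_URL"];

/// Return the first non-empty (after trimming) value among BASE_URL_VARS.
/// `lookup` is injected so the logic can be tested without touching
/// process-global environment state; a caller could pass
/// `|name| std::env::var(name).ok()`.
fn resolve_base_url(lookup: impl Fn(&str) -> Option<String>) -> Option<String> {
    for &name in BASE_URL_VARS.iter() {
        if let Some(value) = lookup(name) {
            let trimmed = value.trim();
            if !trimmed.is_empty() {
                return Some(trimmed.to_string());
            }
        }
    }
    None
}
```

At the call site this would reduce to roughly `if let Some(url) = resolve_base_url(|name| std::env::var(name).ok()) { config.base_url = Some(url); }`.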

Hmbown and others added 28 commits June 14, 2026 12:09
…acode

5-agent workflow examined kimi-code's swarm, the broader swarm pattern, and CodeWhale's
WhaleFlow today (code + docs). Verdict: vision honors both targets (more ambitious than
kimi — heterogeneous-model workers vs a single trained orchestrator) and the Train 3->4
plan is the right sequence, but implementation is largely foundation-only dead_code /
prompt-only. 9 recommendations, each mapped to a v0.8.61 issue/train; the net-new gap is
a swarm coordination substrate that un-orphans the crates/whaleflow IR via the Fleet ledger.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Apply the effective MCP read timeout inside the JSON-RPC receive loop and disconnect streams that emit non-JSON prompt text, so YOLO/non-interactive runs fail clearly instead of hanging on MCP startup prompts.

Refs: #2475
Refs: #3203

Refs: #3224

Refs: #2054

Refs: #2982

Refs: #963

Refs: #3028

Refs: #3078

Refs: #3190

Refs: #2666

Refs: #3194
Use scoped env guards for secret-backend overrides and pin the save-key tests to a per-test config path so parallel route-filtered runs cannot redirect writes through process-global env state.

Refs: #3211
Summarize Train 2 implementation, tests, residual risks, and final verification for the isolated v0.8.61 worktree.

Refs: #3211
Resolve sub-agent assignment routes through WorkerRuntimeProfile::model, preserving explicit model overrides while giving scout/tool roles a provider-aware cheap lane and no-thinking request tuning. Synthesis roles continue to inherit the session route, and providers without a cheap tier stay on the parent model.

Refs: #2027

Refs: #1768
Refs: #3205

Refs: #3204

Refs: #3213

Refs: #3072

Refs: #3073

Refs: #3075

Refs: #3025

Refs: #2027

Refs: #1768
# Conflicts:
#	crates/tui/src/tools/subagent/mod.rs
# Conflicts:
#	crates/tui/src/core/engine.rs
#	crates/tui/src/prompts.rs
# Conflicts:
#	crates/tui/src/tui/app.rs
#	crates/tui/src/tui/subagent_routing.rs
#	crates/tui/src/tui/ui/tests.rs
# Conflicts:
#	crates/tui/src/tools/subagent/mod.rs
Six lints (collapsible-if x2 via let-chains, redundant-closure, derivable Default,
manual is_multiple_of, needless deref) + dead-code allow on the legacy with_agent_tools
wrapper (prod path uses with_agent_tools_policy). Fixed by dogfooding `codewhale exec --auto`
on the freshly-built 0.8.61 binary; clippy --workspace --all-features -D warnings is clean.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
- state: parity test expects schema v4 (Train 4 added thread_goals.continuation_count migration)
- engine: Plan mode = ShellPolicy::None (no shell), consistent with the runtime prompt's
  shell_access=none — reverts an incidental Train-2 Plan->ReadOnly mapping that leaked shell
  tools into read-only planning; sandbox stays ReadOnly
- shell: update exec_shell schema/move-to-background assertions to Train 2's reworded guidance
  (intent preserved: >5s -> background, references exec_shell_wait)
- subagent: unify the no-record status-projection path onto worker_status_from_subagent_result
  so interrupted+continuable projects waiting_for_user (needs-parent-action) — the no-record and
  worker-record paths previously disagreed (the real cross-wiring)

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Bump all crates 0.8.60 -> 0.8.61, npm package, and web facts; fill the 0.8.61 changelog with
the integrated runtime-control-plane work (TUI freeze fix, provider/model route isolation,
fleet-worker convergence, durable goal mode, distribution hygiene) plus community contributions.
Not tagged or published — release artifacts await maintainer approval.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Replace the verbose constitution head (preamble + 7 articles + {model_id} ceremony) with the
maintainer's v4: a preamble + six articles (Ground Truth, Verification, Momentum, Legacy, Help,
Priority), model-agnostic. constitution.md head is byte-identical to v4; the operational
STATUTES/REGULATIONS/EVIDENCE tiers below are preserved (runtime-required). yaml +
render_constitution.py + ~15 prompt tests updated to match. Full bin suite green (4872).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Richer README front door (open-models-first framing, '## The Idea' constitution essay,
accurate 24-provider list) + website improvements (SEO opengraph/robots/sitemap routes,
nav/footer, facts-driven pages, coupled locale-layout), 3-way reconciled to 0.8.61:
provider count corrected 21->24 (verified vs crates/config), version/palette/dead-link fixes.
Conservatively deferred 3 heavily-conflicting web pages (page/faq/install) to preserve main's
richer versions. Verified: tsc --noEmit exit 0, eslint clean, cargo build clean.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@Hmbown Hmbown merged commit a70d5c9 into main Jun 14, 2026

greptile-apps Bot commented Jun 14, 2026

Too many files changed for review. (164 files found, 100 file limit)

Development

Successfully merging this pull request may close these issues.

Rename DEEPSEEK_BLUE to WHALE_ACCENT_PRIMARY in consumer code
Cost tracking is dead for all non-DeepSeek models — pricing table needs expansion

7 participants