Configuration for Claude Code (Anthropic's AI coding CLI). This file covers agent relationships, skill orchestration flows, implementation architecture, and operational internals. For the high-level overview and workflow sequences, see the root README.
.claude/ is entirely restored from the installed plugins — there is nothing to manually copy or edit. After a fresh clone or machine setup:
Step 1 — install the plugins (run from the directory containing your clone):
claude plugin marketplace add ./Borda-AI-Home
claude plugin install foundry@borda-ai-home
claude plugin install oss@borda-ai-home
claude plugin install develop@borda-ai-home
claude plugin install research@borda-ai-home
Step 2 — run inside Claude Code:
/foundry:init
This merges statusLine, permissions.allow, and enabledPlugins (codex plugin) into ~/.claude/settings.json, and symlinks rules/*.md and TEAM_PROTOCOL.md into ~/.claude/. Agents, skills, and hooks are exposed natively by the Claude Code plugin system, so they need no symlinks.
What is restored: ~/.claude/rules/*.md and ~/.claude/TEAM_PROTOCOL.md become symlinks into the installed foundry plugin. ~/.claude/settings.json is updated in-place. All other plugin files (agents, skills, hooks, CLAUDE.md) are served directly by the plugin system. The only local-machine files are settings.local.json and settings.json (project prefs + permissions).
Re-run /foundry:init after any plugin upgrade — rule symlinks point to versioned cache paths and go stale after reinstall.
plugins/foundry/ is the source of truth for all foundry configuration. The Claude Code plugin system natively exposes agents and skills; /foundry:init symlinks rules and TEAM_PROTOCOL.md into ~/.claude/ so they load on every session.
plugins/foundry/ ← source of truth
rules/*.md ←── symlinked ──→ ~/.claude/rules/*.md (init: ln -sf)
TEAM_PROTOCOL.md ←── symlinked ──→ ~/.claude/TEAM_PROTOCOL.md (init: ln -sf)
agents/*.md ← plugin system exposes as foundry:<agent>
skills/*/SKILL.md ← plugin system exposes as foundry:<skill> and /<skill>
hooks/*.js ← auto-registered via hooks.json (no init action)
CLAUDE.md ← loaded by plugin system per session (no init action)
permissions-guide.md ← in plugin cache; not distributed elsewhere
Distributing to ~/.claude/ — run after install or upgrade:
/foundry:init # symlink rules/*.md + TEAM_PROTOCOL.md → ~/.claude/;
# merge statusLine, permissions.allow, enabledPlugins → ~/.claude/settings.json
# (re-run after plugin upgrade to refresh stale rule symlinks)
What is NOT distributed: settings.local.json (machine-local overrides — API keys, MCP server activation, local permissions).
statusLine path: home settings.json uses $HOME prefix (node $HOME/.claude/hooks/statusline.js) — /foundry:init sets this automatically.
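The settings merge described above can be pictured with a small sketch. This is not the plugin's code: `mergeSettings` and the sample permission strings are hypothetical, and the sketch assumes `permissions.allow` is merged as a de-duplicated union while `statusLine` is overwritten. `enabledPlugins` is left out because its merge rule is not described here.

```javascript
// Hypothetical sketch of an idempotent settings merge (not the plugin's actual code).
function mergeSettings(existing, incoming) {
  const merged = { ...existing };

  // statusLine: assume the incoming value simply replaces the old one.
  if (incoming.statusLine !== undefined) {
    merged.statusLine = incoming.statusLine;
  }

  // permissions.allow: ordered union without duplicates,
  // so re-running the merge never grows the list.
  const oldAllow = existing.permissions?.allow ?? [];
  const newAllow = incoming.permissions?.allow ?? [];
  merged.permissions = {
    ...existing.permissions,
    allow: [...new Set([...oldAllow, ...newAllow])],
  };

  return merged;
}

// Illustrative inputs only.
const home = { permissions: { allow: ["Bash(git status)"], deny: ["Read(.env)"] } };
const fromPlugin = {
  statusLine: { type: "command", command: "node $HOME/.claude/hooks/statusline.js" },
  permissions: { allow: ["Bash(git status)", "Bash(git diff:*)"] },
};

const once = mergeSettings(home, fromPlugin);
const twice = mergeSettings(once, fromPlugin); // simulates re-running init after an upgrade
console.log(JSON.stringify(twice, null, 2));
```

Because the merge is a union, running it a second time leaves the result unchanged, which is the property that makes re-running /foundry:init after every upgrade safe under these assumptions.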
Two optional MCP servers are defined in .mcp.json at the repo root. Both are disabled by default and must be enabled per-machine. Copy to home manually: cp .mcp.json ~/.claude/.mcp.json.
Connects HKUDS/OpenSpace as a local MCP server. Exposes four tools — execute_task, search_skills, fix_skill, upload_skill — that let Claude delegate tasks to OpenSpace's skill-evolving runtime. Skills auto-improve through use; the benchmark reports ~46% fewer tokens on warm reruns.
New-machine setup:
# 1. Install OpenSpace globally via pipx (Python ≥ 3.12 required)
brew install pipx
pipx install https://github.com/HKUDS/OpenSpace/archive/refs/heads/main.zip --python python3.12
~/.local/bin/openspace-mcp --help # smoke test
# 2. Update the command path in .mcp.json if your username differs:
# "$HOME/.local/bin/openspace-mcp"
# 3. Make the server available globally (user-level config):
cp .mcp.json ~/.claude/.mcp.json
# The project-level .mcp.json already loads when Claude Code runs inside this repo.
# 4. Enable for the current session
# Add "openspace" to enabledMcpjsonServers in .claude/settings.local.json:
# "enabledMcpjsonServers": ["openspace"]
# 5. Restart Claude Code — openspace tools appear in the MCP tool list
Runtime data lives at ~/.claude/openspace/ (SQLite lineage DB + execution recordings). Not version-controlled, not synced. Persists across sessions.
Conflict policy: existing hand-crafted SKILL.md files in ~/.claude/skills/ are protected with chmod 444 after setup — OpenSpace cannot overwrite them. Remove the protection with chmod 644 ~/.claude/skills/<name>/SKILL.md only when you intend to let OpenSpace evolve that skill.
Disable: remove "openspace" from enabledMcpjsonServers in settings.local.json.
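Enabling and disabling a server both come down to editing one array in settings.local.json. A minimal sketch of that edit, assuming the file has already been parsed into an object; `setMcpServerEnabled` is a hypothetical helper, not part of Claude Code:

```javascript
// Hypothetical helper: toggle a server name in enabledMcpjsonServers.
// Returns a new object and leaves the input untouched.
function setMcpServerEnabled(settings, name, enabled) {
  const current = settings.enabledMcpjsonServers ?? [];
  const without = current.filter((s) => s !== name); // avoids duplicate entries
  return {
    ...settings,
    enabledMcpjsonServers: enabled ? [...without, name] : without,
  };
}

const local = { enabledMcpjsonServers: ["colab-mcp"] };
const withOpenspace = setMcpServerEnabled(local, "openspace", true);
const disabledAgain = setMcpServerEnabled(withOpenspace, "openspace", false);

console.log(withOpenspace.enabledMcpjsonServers);
console.log(disabledAgain.enabledMcpjsonServers);
```

A restart of Claude Code is still needed after writing the file back, as noted in the setup steps.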
Used by /research:run --colab for GPU workloads via Google Colab. See the /research:run skill examples for usage. Enable by adding "colab-mcp" to enabledMcpjsonServers.
| Agent | Purpose | Key Capabilities |
|---|---|---|
| sw-engineer | Architecture and implementation | SOLID principles, type safety, clean architecture, doctest-driven dev |
| solution-architect | System design and API planning | ADRs, interface specs, migration plans, coupling analysis, API surface audit |
| shepherd | Project lifecycle management | Issue triage, PR review, SemVer, pyDeprecate, trusted publishing |
| scientist | ML research and implementation | Paper analysis, experiment design, LLM evaluation, inference optimization |
| qa-specialist | Testing and validation | pytest, hypothesis, mutation testing, snapshot tests, ML test patterns; auto-includes OWASP Top 10 in teams |
| linting-expert | Code quality and static analysis | ruff, mypy, pre-commit, rule selection strategy, CI quality gates; runs autonomously (permissionMode: dontAsk) |
| perf-optimizer | Performance engineering | Profile-first workflow, CPU/GPU/memory/I/O, torch.compile, mixed precision |
| ci-guardian | CI/CD reliability | GitHub Actions, reusable workflows, trusted publishing, flaky test detection |
| data-steward | Data lifecycle — acquisition and ML pipelines | API completeness, dataset versioning, split validation, leakage detection, data contracts |
| doc-scribe | Documentation | Google/Napoleon docstrings (no type duplication), Sphinx/mkdocs, API references |
| web-explorer | Web and docs research | API version comparison, migration guides, PyPI tracking, ecosystem compat |
| self-mentor | Config quality reviewer | Agent/skill auditing, duplication detection, cross-ref validation, line budgets |
Agents are picked in two ways: by name (you write "use the qa-specialist to…") or automatically when Claude Code spawns subagents via the Task/Agent tool. The selection heuristic matches the task description against each agent's description: frontmatter — /calibrate routing benchmarks this accuracy.
Key relationships:
- linting-expert is always downstream of sw-engineer — never lints code that hasn't been implemented yet
- qa-specialist is often parallel to sw-engineer (reviews) or downstream (validates implementation)
- doc-scribe is always downstream — documents finalized code; never shapes design
- self-mentor is orthogonal — audits config files, not user code; spawned by /audit and /brainstorm
- web-explorer feeds scientist — fetches current docs/papers; scientist interprets and designs experiments
- shepherd is the external interface — PR replies, releases, contributor communication; no code implementation
Model tiering: reasoning agents (sw-engineer, qa-specialist, perf-optimizer, scientist) default to opus; plan-gated agents (solution-architect, shepherd, self-mentor) use opusplan (plan-gated Opus — pays for reasoning only when the task warrants it); execution agents (doc-scribe, linting-expert, ci-guardian, data-steward, web-explorer) default to sonnet.
| Skill | Plugin | Command | What It Does |
|---|---|---|---|
| audit | foundry | `/audit [scope] fix [high\|medium\|all] \| upgrade` | Config audit: broken refs, inventory drift, docs freshness; fix auto-fixes at the requested severity level; upgrade applies docs-sourced improvements (mutually exclusive with fix) |
| manage | foundry | `/manage <op> <type>` | Create, update, delete agents/skills/rules; manage settings.json permissions (add perm/remove perm); auto type-detection and cross-ref propagation |
| calibrate | foundry | `/calibrate [target] [fast\|full] [apply]` | Synthetic benchmarks measuring recall vs confidence bias; routing and communication modes available |
| brainstorm | foundry | `/brainstorm <idea> \| breakdown <tree-or-spec>` | Two modes: (1) idea — clarifying questions → build divergent branch tree (deepen, close, merge, up to 10 ops) → save tree doc → self-mentor review → gate; (2) breakdown — auto-detects input: tree (Status: tree) → distillation questions → section-by-section spec; spec (Status: draft) → ordered action plan |
| investigate | foundry | `/investigate <symptom>` | Systematic diagnosis for unknown failures — env, tools, hooks, CI divergence; ranks hypotheses and hands off to the right skill |
| session | foundry | `/session [resume\|archive\|summary]` | Parking lot for diverging ideas — auto-parks unanswered questions and deferred threads; resume shows pending, archive closes, summary digests the session |
| distill | foundry | `/distill` | One-time snapshot: suggest new agents/skills, review roster, prune memory, or consolidate lessons |
| oss:review | oss | `/oss:review [file\|PR#] [--reply]` | Parallel review across arch, tests, perf, docs, lint, security, API; --reply drafts contributor comment |
| oss:analyse | oss | `/oss:analyse <N\|health\|ecosystem\|path/to/report.md> [--reply]` | GitHub thread analysis (auto-detects issue/PR/discussion); health = repo overview + duplicate clustering |
| oss:resolve | oss | `/oss:resolve <PR#\|URL> [report] \| report \| <comment>` | OSS fast-close: conflicts + review comments via Codex; three source modes: pr (live GitHub), report (/oss:review findings), pr + report (aggregated + deduplicated in one pass) |
| oss:release | oss | `/oss:release <mode> [range]` | Notes, changelog, migration, full prepare pipeline, or readiness audit |
| develop:feature | develop | `/develop:feature <goal>` | TDD-first feature dev: codebase analysis, demo test, TDD loop, docs, review |
| develop:fix | develop | `/develop:fix <goal>` | Reproduce-first bug fixing: regression test, minimal fix, quality stack |
| develop:refactor | develop | `/develop:refactor <goal>` | Test-first refactor with coverage audit before changing structure |
| develop:plan | develop | `/develop:plan <goal>` | Scope analysis — produces structured plan without writing implementation code |
| develop:debug | develop | `/develop:debug <goal>` | Investigation-first debugging: evidence gathering → hypothesis gate → minimal fix |
| develop:review | develop | `/develop:review` | Six-agent parallel review of local files or current git diff; no GitHub PR needed |
| research:topic | research | `/research:topic <topic>` | SOTA literature research with codebase-mapped implementation plan |
| research:plan | research | `/research:plan <goal\|file.py>` | Config wizard: interactive goal → program.md; plan <file.py> for profile-first bottleneck discovery |
| research:judge | research | `/research:judge [file]` | Research-supervisor review of experimental methodology (hypothesis, measurement, controls, scope, strategy fit → APPROVED/NEEDS-REVISION/BLOCKED) |
| research:run | research | `/research:run <goal\|file> [--resume] [--team] [--colab]` | Metric-driven iteration loop; --resume continues after crash; --team for parallel exploration; --colab for GPU workloads |
| research:sweep | research | `/research:sweep <goal\|file>` | Non-interactive pipeline: auto-plan → judge gate → run |
Each skill follows a defined topology for how it composes agents:
`/oss:review` — parallel fan-out, then consolidation
Tier 0: git diff --stat (mechanical gate — skips trivial diffs)
Tier 1: Codex pre-pass (independent diff review, ~60s)
Tier 2: 6 parallel agents — sw-engineer, qa-specialist, perf-optimizer,
doc-scribe, solution-architect, linting-expert
→ consolidator reads all findings → final report
→ shepherd writes --reply output (if flag present)
`/develop:feature` — sequential with inner loops
Step 1: sw-engineer (codebase analysis)
Step 2: sw-engineer (demo test — TDD contract)
Step 2 review: in-context validation gate
Step 3: sw-engineer (implementation) + qa-specialist (parallel)
Step 4: review+fix loop (max 3 cycles): sw-engineer → qa-specialist → linting-expert
Step 5: doc-scribe (docs update)
Quality stack: linting-expert → qa-specialist → Codex pre-pass
`/develop:fix` — reproduce-first
Step 1: sw-engineer (root cause analysis)
Step 2: sw-engineer (regression test that fails)
Step 2 review: in-context validation gate
Step 3: sw-engineer (minimal fix)
Step 4: review+fix loop (max 3 cycles)
Quality stack: linting-expert → qa-specialist → Codex pre-pass
`/develop:refactor` — test-first
Step 1: sw-engineer + linting-expert (coverage audit, parallel)
Step 2: qa-specialist (characterization tests)
Step 2 review: in-context validation gate
Step 3: sw-engineer (refactor)
Step 5: review+fix loop (max 3 cycles)
Quality stack: linting-expert → qa-specialist → Codex pre-pass
`/research:topic` — research-first
web-explorer (fetch current papers/docs) → scientist (deep analysis, writes to file)
→ consolidator reads findings → implementation plan
(--team: multiple scientist instances on competing method families)
`/brainstorm` — conversational spec, then task breakdown
idea mode:
Step 1: context scan (Read README, Grep keywords)
Step 2: AskUserQuestion (clarify, one at a time, max 10)
Step 3: build tree loop (seed 3–5 branches → deepen/close/merge/add, max 10 ops)
Step 4: Write tree doc → .plans/blueprint/YYYY-MM-DD-<slug>.md (Status: tree)
Step 5: self-mentor (tree quality audit — coverage, closure quality, open threads)
Step 6: AskUserQuestion (approval gate) → suggest /brainstorm breakdown <tree>
breakdown mode (triggered by "breakdown <tree-or-spec>"):
Auto-detects Status field:
Status: tree → D1 present summary → D2 distillation questions (max 5)
→ D3 write spec section-by-section → D4 suggest next step
Status: draft → B1 blocking questions → B2 action plan table → B3 post-plan prompt
`/audit` — self-mentor per file, then consolidation
per-config-file: self-mentor (reads file, writes findings to /tmp/audit-<ts>/<file>.md)
→ consolidator reads all finding files → ranked report with upgrade proposals
(upgrade mode: web-explorer fetches latest Claude Code docs first)
/research:plan, /research:run, /research:judge, /research:sweep — Profile-first bottleneck discovery and metric-improvement loop
# plan mode — interactive config wizard → program.md
/research:plan "increase test coverage to 90%"
/research:plan src/mypackage/train.py # profile-first: cProfile → ask what to optimize → wizard
/research:plan "improve F1 from 0.82 to 0.87" coverage.md # write to custom path
# judge mode — pre-flight quality gate before the expensive run loop
/research:judge # review program.md methodology → APPROVED / NEEDS-REVISION / BLOCKED
/research:judge coverage.md # audit a specific program file
/research:judge --skip-validation # skip local metric/guard validation (cross-machine workflows)
# run mode — sustained metric-improvement loop
/research:run "increase test coverage to 90%" # run from text goal (20-iteration loop; auto-rollback on regression)
/research:run coverage.md # run from program.md config file
# resume mode — continue after crash or manual stop
/research:run --resume # reads program_file from state.json
/research:run coverage.md --resume # resume specific run
# sweep mode — non-interactive pipeline: auto-plan → judge gate → run
/research:sweep "increase test coverage to 90%" # automated end-to-end; no user gates
/research:sweep coverage.md # sweep from program.md config
# flags (run/sweep)
/research:run "reduce training time by 20%" --team # parallel exploration across axes
/research:run "improve validation accuracy" --colab # GPU workloads via Colab MCP (opt-in)
Colab MCP is opt-in. .mcp.json defines the server but does not start it. To enable: add "colab-mcp" to enabledMcpjsonServers in .claude/settings.local.json, then restart Claude Code.
/oss:review — Parallel PR review; /develop:review — local file/diff review
# PR review (GitHub)
/oss:review 42 # review PR by number
/oss:review 42 --reply # review + draft contributor-facing comment
# Local diff or file review (no GitHub PR needed)
/develop:review src/mypackage/transforms.py
/develop:review # review current git diff
/oss:analyse — Issue, PR, Discussion and repo health
/oss:analyse 123 # auto-detects issue/PR/discussion; wide-net related search
/oss:analyse health # repo health overview with duplicate clustering
/oss:analyse ecosystem # downstream consumer impact analysis
/oss:analyse 123 --reply # analyse + draft contributor reply
/oss:release — Release notes, changelog, readiness checks
/oss:release notes v1.2.0..HEAD
/oss:release changelog v1.2.0..HEAD
/oss:release prepare v2.0.0
/oss:release audit
/manage — Agent/skill lifecycle
/manage create agent security-auditor "Security specialist for vulnerability scanning"
/manage update skill optimize perf-audit
/manage delete agent web-explorer
/audit — Config health sweep + upgrade
/audit # full sweep — report only, includes upgrade proposals table
/audit fix # auto-fix critical and high findings
/audit upgrade # apply docs-sourced improvements
/audit agents # agents only, report only
/audit skills fix # skills only, with auto-fix
/develop:feature, /develop:fix, /develop:refactor, /develop:plan, /develop:debug — Development workflows
Each mode enforces a validation gate before writing implementation code:
- /develop:plan — scope analysis; produces structured plan in .plans/active/plan_<slug>.md
- /develop:feature — TDD demo validation before writing code
- /develop:fix — reproduction test before touching anything
- /develop:refactor — coverage audit before changing structure
- /develop:debug — investigation-first; evidence gathering → hypothesis gate → minimal fix
/develop:feature add batched predict() method to Classifier
/develop:fix TypeError when passing None to transform()
/develop:refactor src/mypackage/transforms.py
/develop:plan improve caching in the data loader
/develop:debug why does the validation loss spike at epoch 3?
/oss:resolve — Resolve a PR end-to-end
/oss:resolve 42 # pr mode: live GitHub comments → conflict check → semantic resolution → action items
/oss:resolve https://github.com/org/repo/pull/42 # same as above, URL form
/oss:resolve report # report mode: latest /oss:review findings as action items; no GitHub re-fetch
/oss:resolve 42 report # pr + report mode: GitHub comments + /oss:review findings, aggregated and deduplicated
/oss:resolve "rename foo to bar throughout the auth module" # single-comment fast path (comment dispatch mode)
/investigate — Systematic failure diagnosis
/investigate "hooks not firing on Save"
/investigate "codex exec exits 127 on this machine"
/investigate "CI fails but passes locally"
/investigate "/calibrate times out every run"
/investigate "uv run pytest can't find conftest.py"
/session — Session parking lot
/session # auto-parks current diverging ideas and open questions
/session resume # show all pending parked items
/session archive # close all pending items
/session summary # digest of what happened this session
Legend
- ✓ — actively spawned by this skill
- ° — scope boundary: "use this agent for X" in the description, never spawns it directly
- → — delegates a subtask at runtime (a real call, not just a boundary mention)
- ? — conditional: which agent is selected depends on runtime strategy
- — — no dependency
Agent short names
foundry 🔨
- 🔨sm — self-mentor
- 🔨sw — sw-engineer
- 🔨qa — qa-specialist
- 🔨lint — linting-expert
- 🔨arch — solution-architect
- 🔨perf — perf-optimizer
- 🔨doc — doc-scribe
- 🔨web — web-explorer

oss 🌐
- 🌐cig — ci-guardian
- 🌐shep — shepherd

research 🔬
- 🔬sci — scientist
- 🔬ds — data-steward

ext 🤖
- 🤖cx — codex-rescue
Leaf agents — no outgoing calls: 🔨sw, 🔨qa, 🔨lint, 🔨perf, 🔨arch, 🔨web, 🔨sm
| Caller ↓ / Called → | 🔨sw | 🔨qa | 🔨lint | 🔨doc | 🔨perf | 🔨web | 🌐cig | 🌐shep | 🔬sci | 🔬ds |
|---|---|---|---|---|---|---|---|---|---|---|
| ci-guardian | — | — | ° | — | — | — | — | ° | — | — |
| shepherd | — | — | — | ° | — | — | ° | — | — | — |
| doc-scribe | ° | — | ° | — | — | — | — | ° | — | — |
| scientist | ° | ° | — | — | ° | ° | — | — | — | ° |
| data-steward | — | — | — | — | — | → | — | — | ° | — |
Skills with no direct agent calls: init, manage, distill, session (foundry); plan, debug→fix (develop); plan, judge, sweep→run (research)
| Skill | plugin | 🔨sm | 🔨sw | 🔨qa | 🔨lint | 🔨arch | 🔨perf | 🔨doc | 🔨web | 🌐shep | 🔬sci | 🔬ds | 🤖cx |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| brainstorm | foundry | ✓ | — | — | — | — | — | — | — | — | — | — | — |
| investigate | foundry | — | — | — | — | — | — | — | — | — | — | — | ✓ |
| audit | foundry | ✓ | — | — | — | — | — | — | ✓ | — | — | — | — |
| calibrate | foundry | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | — |
| review | oss | — | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | — | ✓ | — | — | ✓ |
| analyse | oss | — | — | — | — | — | — | — | — | ✓ | — | — | — |
| release | oss | — | — | — | — | — | — | — | ✓ | — | — | — | — |
| resolve | oss | — | ✓ | ✓ | ✓ | — | — | — | — | — | — | — | ✓ |
| review | develop | — | ✓ | ✓ | ✓ | — | — | — | — | — | — | — | ✓ |
| feature | develop | — | ✓ | ✓ | ✓ | — | — | ✓ | — | — | — | — | ✓ |
| fix | develop | — | ✓ | ✓ | ✓ | — | — | — | — | — | — | — | ✓ |
| refactor | develop | — | ✓ | ✓ | ✓ | — | — | — | — | — | — | — | ✓ |
| topic | research | — | — | — | — | — | — | — | ✓ | — | ✓ | — | — |
| run | research | — | ✓ | — | — | ? | ✓ | — | — | — | ✓ | ✓ | ✓ |
| Rule file | Applies to | What it governs |
|---|---|---|
| artifact-lifecycle.md | (global) | Canonical dot-prefixed artifact layout, run-dir naming, TTL policy |
| claude-config.md | (global) | Universal ops rules: no hardcoded paths, Bash timeouts, two-separate-calls navigation pattern |
| communication.md | (global) | Re: anchor format, progress narration, tone, output routing, and terminal color conventions |
| external-data.md | (global) | Pagination and completeness rules for REST, GraphQL, and the gh CLI — never work on partial result sets |
| foundry-config.md | `.claude/**` | Plan mode gate for .claude/ edits, post-edit checklist, XML tag conventions, cleanup hook, settings.json allow entries |
| git-commit.md | (global) | Commit message format, push safety (explicit confirmation required), branch safety |
| python-code.md | `**/*.py` | Python style: docstrings, deprecation (pyDeprecate), library API freshness checks, version policy, PyTorch AMP |
| quality-gates.md | (global) | Confidence blocks on all analysis tasks, internal quality loop, output routing rules |
| testing.md | `tests/**/*.py`, `**/test_*.py` | pytest AAA structure, parametrize standards, doctest location (source files, not tests) |
Each rule file has paths: frontmatter listing glob patterns. Claude Code loads matching rule files automatically when you open or edit a file that matches — no explicit invocation needed. Global rules (no paths: restriction, or paths: "*") load in every session. Rules are additive: multiple rules can apply to the same file.
Example: editing tests/test_transforms.py auto-loads testing.md (matches tests/**/*.py) and python-code.md (matches **/*.py). Editing .claude/agents/sw-engineer.md loads foundry-config.md (matches .claude/**).
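The path-based loading in the example above can be approximated in a few lines. This is only a sketch: Claude Code's real glob matcher is not shown in this document, so `globToRegExp` and `rulesFor` are hypothetical names, the glob support is limited to `**` and `*`, and a rule with an empty `paths` list stands in for a global rule.

```javascript
// Convert a small glob subset (**, *, literal text) into an anchored RegExp.
function globToRegExp(glob) {
  let re = "";
  for (let i = 0; i < glob.length; i++) {
    if (glob.startsWith("**/", i)) {
      re += "(?:.*/)?"; // zero or more leading directories
      i += 2;
    } else if (glob.startsWith("**", i)) {
      re += ".*"; // anything, including slashes
      i += 1;
    } else if (glob[i] === "*") {
      re += "[^/]*"; // anything within one path segment
    } else {
      re += glob[i].replace(/[.+?^${}()|[\]\\]/g, "\\$&"); // escape regex metacharacters
    }
  }
  return new RegExp(`^${re}$`);
}

// A subset of the rule table; an empty paths list models a global rule (assumption).
const rules = [
  { file: "testing.md", paths: ["tests/**/*.py", "**/test_*.py"] },
  { file: "python-code.md", paths: ["**/*.py"] },
  { file: "foundry-config.md", paths: [".claude/**"] },
  { file: "git-commit.md", paths: [] },
];

// Rules are additive: every rule with at least one matching glob is returned.
function rulesFor(filePath) {
  return rules
    .filter((r) => r.paths.length === 0 || r.paths.some((g) => globToRegExp(g).test(filePath)))
    .map((r) => r.file);
}

console.log(rulesFor("tests/test_transforms.py"));
console.log(rulesFor(".claude/agents/sw-engineer.md"));
```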
When multiple analysis agents return findings inline, the orchestrator's context window fills with intermediate output it never uses directly — file-based handoff keeps the orchestrator clean for decision-making.
When it applies:
- Any skill spawning 2+ agents in parallel for analysis or review
- Any single agent expected to produce >500 tokens of findings
- Exception: implementation agents (writing code) return inline — their output is the deliverable
- Exception: single-agent single-question spawns where output is inherently short (<200 tokens)
Agent contract — the spawned agent must:
- Write full output to <RUN_DIR>/<agent-name>.md using the Write tool
- Return to the orchestrator only a compact JSON envelope on the final line:
{
"status": "done",
"findings": 3,
"severity": {
"critical": 0,
"high": 1,
"medium": 2
},
"file": "<path>",
"confidence": 0.88,
"summary": "1 high (missing tool), 2 medium (unused tools)"
}
Orchestrator contract:
- Do NOT read agent files back into main context — delegate to a consolidator agent instead
- Collect the compact envelopes (tiny — stay in context)
- Spawn a consolidator to read all <RUN_DIR>/*.md files and write the final report
Threshold: 4+ agent files → mandatory consolidator; 2–3 files → orchestrator may read directly if total content <2K tokens.
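The threshold rule can be written as a small decision function. This is a sketch under assumptions: `needsConsolidator` is a hypothetical name, the per-file `tokens` value is an estimate supplied by the caller, and the single-file case (which the rule does not cover) is treated as not needing a consolidator.

```javascript
// Decide whether a consolidator agent is required for a set of finding files.
function needsConsolidator(files) {
  const totalTokens = files.reduce((sum, f) => sum + f.tokens, 0);
  if (files.length >= 4) return true; // 4+ files: consolidator is mandatory
  if (files.length >= 2) return totalTokens >= 2000; // 2-3 files: direct read only under 2K tokens
  return false; // 0-1 files: not covered by the rule (assumption)
}

const small = [{ tokens: 500 }, { tokens: 500 }];
const large = [{ tokens: 900 }, { tokens: 900 }, { tokens: 900 }];
const many = [{ tokens: 100 }, { tokens: 100 }, { tokens: 100 }, { tokens: 100 }];

console.log(needsConsolidator(small), needsConsolidator(large), needsConsolidator(many));
```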
RUN_DIR convention:
- Ephemeral (per-run): /tmp/<skill>-<timestamp>/ — created once before any spawns
- Persistent (final reports): .temp/
Reference implementations: /calibrate is canonical; /audit Step 3 (self-mentor per file → consolidator); /oss:review Steps 3–6.
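On the orchestrator side, collecting envelopes means reading only the final line of each agent reply. A minimal sketch, assuming the envelope is serialized on a single line and that `status`, `findings`, and `file` are the fields worth validating; `parseEnvelope` and the sample file path are hypothetical:

```javascript
// Take the last non-empty line of an agent's reply and parse it as the JSON envelope.
function parseEnvelope(agentReply) {
  const lines = agentReply.trim().split("\n");
  const envelope = JSON.parse(lines[lines.length - 1]);
  // Which fields are mandatory is an assumption made for this sketch.
  for (const key of ["status", "findings", "file"]) {
    if (!(key in envelope)) throw new Error(`envelope missing "${key}"`);
  }
  return envelope;
}

const reply = [
  "Wrote full findings to the run directory.",
  '{"status":"done","findings":3,"severity":{"critical":0,"high":1,"medium":2},' +
    '"file":"/tmp/audit-123/self-mentor.md","confidence":0.88,' +
    '"summary":"1 high (missing tool), 2 medium (unused tools)"}',
].join("\n");

const envelope = parseEnvelope(reply);
console.log(envelope.file, envelope.severity.high);
```

Only the parsed envelope stays in the orchestrator's context; the file named in `file` is left for the consolidator to read.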
Every review skill gates cheap work before spawning expensive agents — cheaper tiers short-circuit the pipeline when the diff is trivial or issues are already clear:
| Tier | What it does | Cost |
|---|---|---|
| T0 — Mechanical gate | git diff --stat — skips trivial or empty diffs before any AI work | Zero |
| T1 — Codex pre-pass | Focused diff review (~60 s); flags bugs, edge cases, and logic errors | Low |
| T2 — Claude agents | Specialized parallel agents (opus for reasoning, sonnet for execution) | High |
Which tiers each skill uses:
| Skill | T0 | T1 | T2 |
|---|---|---|---|
| /develop:feature, /develop:fix, /develop:refactor | ✓ | ✓ | ✓ |
| /oss:review | ✓ | ✓ ‡ | ✓ |
| /research:run | ✓ | ✓ | ✓ |
| /audit fix | ✓ | ✓ | ✓ |
| /oss:resolve | — | — | ✓ |
‡ For /oss:review, Codex runs as a full co-reviewer alongside T2 agents — its findings are independently consolidated rather than seeding agent prompts (unbiased review).
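The T0 gate is cheap because it only inspects the summary line that `git diff --stat` prints. A rough sketch of such a check; what the skills actually count as "trivial" is not specified here, so `passesMechanicalGate` and its `minLines` threshold are hypothetical:

```javascript
// Parse the summary line of `git diff --stat`, e.g.
// " 3 files changed, 10 insertions(+), 2 deletions(-)".
function diffSize(statOutput) {
  const summary = statOutput.trim().split("\n").pop() ?? "";
  const num = (re) => Number((summary.match(re) ?? [0, 0])[1]);
  return {
    files: num(/(\d+) files? changed/),
    insertions: num(/(\d+) insertions?\(\+\)/),
    deletions: num(/(\d+) deletions?\(-\)/),
  };
}

// Empty diffs and diffs below `minLines` changed lines stop the pipeline (assumed policy).
function passesMechanicalGate(statOutput, minLines = 5) {
  const { files, insertions, deletions } = diffSize(statOutput);
  return files > 0 && insertions + deletions >= minLines;
}

const realChange = " src/a.py | 12 ++++++++++--\n 1 file changed, 10 insertions(+), 2 deletions(-)";
const typoFix = " README.md | 2 +-\n 1 file changed, 1 insertion(+), 1 deletion(-)";

console.log(passesMechanicalGate(realChange), passesMechanicalGate(typoFix), passesMechanicalGate(""));
```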
Agent Teams is Claude Code's experimental multi-agent feature. Teams are always user-invoked — nothing auto-spawns. Auto-spawning teams would multiply token costs 5-10x on routine tasks; explicit invocation lets you make the cost/benefit call per run. Enabled via CLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMS=1 in settings.json.
When to use teams vs subagents:
| Signal | Team | Subagents |
|---|---|---|
| Competing root-cause hypotheses | ✓ | |
| Cross-layer feature: impl + QA + docs in parallel | ✓ | |
| SOTA research: multiple competing method clusters | ✓ | |
| Adversarial review (teammates challenge each other) | ✓ | |
| Sequential pipeline (fix → test → lint) | | ✓ |
| Independent parallel review dimensions | | ✓ |
| Single file / single module scope | | ✓ |
| Routine tasks (sync, distill, release) | | ✓ |
Skills with team support:
| Skill | When to use |
|---|---|
| /develop:fix --team | Bug spans modules; competing root-cause hypotheses |
| /develop:feature --team | Cross-layer feature needing impl + QA + docs in parallel |
| /research:topic --team | Multiple competing method families to evaluate |
| /research:run --team | Goal spans multiple optimization axes (speed = arch + pipeline + compute) |
| /research:plan --team | Wizard + parallel exploration: teammates each own a different axis |
| /develop:refactor | Directory or system-wide scope → Claude proposes team (heuristic) |
Model tiering: Lead uses opusplan/opus. Deep reasoning teammates (sw-engineer, qa-specialist, scientist, perf-optimizer) use opus. Execution teammates (doc-scribe, linting-expert, ci-guardian) use sonnet. Keep teams to 3–5 teammates (~7× token cost vs single session).
Communication protocol: Inter-agent messages use AgentSpeak v2 (defined in TEAM_PROTOCOL.md) — ~60% token savings vs natural language. Status codes (alpha/beta/gamma/delta/epsilon/omega), action symbols (+/-/~/!), file locking (+lock/-lock), and priority prefixes (!! urgent, .. FYI). Lead-to-human communication uses normal English.
Security in teams: No standalone security agent. qa-specialist automatically embeds OWASP Top 10 security checks when the task touches auth, payment flows, or user data.
Quality hooks: hooks/teammate-quality.js handles TeammateIdle (redirects to pending tasks) and TaskCompleted (reserved for future quality gates).
| Hook | Event | Matcher | Purpose |
|---|---|---|---|
| task-log.js | 9 events | all | Session state tracking |
| lint-on-save.js | PostToolUse | Write, Edit | Lint on save |
| md-compress.js | PreToolUse | Read (.md) | Token compression |
| rtk-rewrite.js | PreToolUse | Bash | CLI output compression |
| teammate-quality.js | TeammateIdle, TaskCompleted | all | Team quality gate |
| stats-reader.js | (standalone) | n/a | Session stats |
| statusline.js | (statusLine) | n/a | Status bar |
task-log.js is the central event handler. It handles nine Claude Code hook events and maintains runtime state read by statusline.js:
Event → action mapping:
| Event | Action |
|---|---|
| PreToolUse | Logs Task/Agent and Skill invocations to logs/invocations.jsonl; opens codex plugin session file; increments per-tool-type state file |
| PostToolUse | Closes codex plugin session file when any Skill(codex:*) completes; computes wall-clock timing delta from the PreToolUse start marker and appends to ~/.claude/logs/timings.jsonl |
| PostToolUseFailure | Records timing with error status to timings.jsonl (same timing path as PostToolUse) |
| UserPromptSubmit | Writes a queue marker to state/queue/ to light the processing badge 💬 in statusline |
| SubagentStart | Creates state/agents/<id>.json with agent type, model, color, start timestamp — one file per agent (no race) |
| SubagentStop | Deletes per-agent file; appends completion entry to invocations.jsonl |
| PreCompact | Appends to logs/compactions.jsonl; extracts modified file paths from transcript; writes state/session-context.md |
| Stop | Clears state/tools/ — resets the 🔧 row between turns (agents intentionally NOT cleared — may still be running); clears state/queue/ processing markers (dismisses 💬 badge) and removes orphaned timing start markers from state/timings/ |
| SessionEnd | Deletes entire /tmp/claude-state-<session>/ directory (agents, tools, codex, queue, timings, dedup locks); runs git worktree prune; removes orphaned worktrees >2h |
State files layout:
/tmp/claude-state-<session>/
├── agents/<id>.json # one per active subagent (created at start, deleted at stop)
├── codex/<id>.json # one per active codex plugin session
├── tools/<tool>.json # one per tool type fired this turn (cleared at Stop)
├── timings/<tool_use_id>.json # in-flight timing start markers (PreToolUse → PostToolUse)
├── queue/<timestamp>.json # processing badge markers (UserPromptSubmit → Stop)
└── pending/<tool_use_id>.json # agent type cache for SubagentStart resolution
.claude/state/
└── session-context.md # modified-file breadcrumb (survives compaction)
.claude/logs/ # skill-specific logs (project-scoped)
# Hook audit logs are global — written to ~/.claude/logs/:
# invocations.jsonl append-only: agent launches, skill invocations, completions (includes project field)
# compactions.jsonl append-only: compaction events (includes project field)
# timings.jsonl append-only: per-tool wall-clock timing (includes project field)
Age-out rules:
- Agents: 10-minute safety-net — files older than 10 min with no corresponding Stop event indicate a crashed agent; statusline excludes them
- Codex plugin sessions: 30-minute cutoff — stalled plugin sessions are treated as timed out
- Worktrees: 2-hour cutoff in SessionEnd cleanup
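The age-out rules amount to filtering state entries by their start time. A minimal sketch of that filter, assuming each state file carries a start timestamp in epoch milliseconds; the `startedAt` field name and `liveEntries` helper are hypothetical:

```javascript
const AGENT_MAX_AGE_MS = 10 * 60 * 1000; // 10-minute safety net for agents
const CODEX_MAX_AGE_MS = 30 * 60 * 1000; // 30-minute cutoff for codex plugin sessions

// Keep only entries that started within the cutoff; older ones are treated as stale.
function liveEntries(entries, maxAgeMs, now = Date.now()) {
  return entries.filter((e) => now - e.startedAt <= maxAgeMs);
}

const now = Date.parse("2026-04-08T12:00:00Z");
const agents = [
  { id: "a1", type: "sw-engineer", startedAt: now - 2 * 60 * 1000 },
  { id: "a2", type: "self-mentor", startedAt: now - 15 * 60 * 1000 }, // past the agent cutoff
];

const shown = liveEntries(agents, AGENT_MAX_AGE_MS, now).map((a) => a.type);
console.log(shown);
```

Filtering at read time means a crashed agent never needs explicit cleanup to disappear from the status line; its file is simply ignored until SessionEnd removes the directory.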
PostCompact over-registration: PostCompact is registered in settings.json for task-log.js but is handled as a no-op — the code handles PreCompact instead.
Inline SessionStart hooks (shell commands, not JS files): (1) claude auth status > ~/.claude/state/subscription.json — snapshots billing plan for the status line billing indicator, async; (2) rm -f .claude/state/session-context.md — clears last session's breadcrumb on fresh startup.
Registered alongside task-log.js in settings.json:
lint-on-save.js (PostToolUse — Write, Edit) — closes the gap between "Claude edits a file" and "a human runs pre-commit" by linting every file the moment it is written. Runs pre-commit run --files <path> on each Write/Edit, exits 2 on failure so Claude sees the diagnostics and applies a fix immediately. No-op when .pre-commit-config.yaml is absent or pre-commit is not installed.
md-compress.js (PreToolUse — Read, .md files only) — transparently compresses token-wasteful whitespace in Markdown files before Claude reads them, reducing context consumption without altering content. Collapses table column padding (2+ spaces → 1), consecutive blank lines, and trailing whitespace — all outside fenced code blocks. Writes to a stable temp file keyed by source-path hash; reused within a session when the source is unchanged.
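The three whitespace rules could look roughly like this. It is a sketch under assumptions: `compressMarkdown` is a hypothetical name, fence detection is simplified to lines that open with three backticks or tildes, and table rows are taken to be lines starting with `|`.

```javascript
// Compress Markdown whitespace everywhere except inside fenced code blocks.
function compressMarkdown(text) {
  const out = [];
  let inFence = false;
  let prevBlank = false;
  for (const raw of text.split('\n')) {
    if (/^\s*(`{3}|~{3})/.test(raw)) {
      inFence = !inFence; // a fence line opens or closes a code block
      out.push(raw);
      prevBlank = false;
      continue;
    }
    if (inFence) {
      out.push(raw); // code is passed through untouched
      continue;
    }
    let line = raw.replace(/\s+$/, ''); // drop trailing whitespace
    if (line.trimStart().startsWith('|')) {
      line = line.replace(/ {2,}/g, ' '); // collapse table column padding
    }
    const blank = line === '';
    if (blank && prevBlank) continue; // keep only one blank line per run
    out.push(line);
    prevBlank = blank;
  }
  return out.join('\n');
}
```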
rtk-rewrite.js (PreToolUse — Bash) — rewrites supported CLI calls to go through the RTK proxy (git status → rtk git status) for 60–99% token savings on build/test/git output. RTK is a structural compressor — it understands git diff, pytest, and build-log formats and removes tokens that are visually useful to humans but informationally redundant for an LLM, unlike generic truncation which can drop the relevant parts. No-op when RTK is not installed — see root README → Token Savings.
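The rewrite step can be pictured as a prefixing function. The allow-list below is purely illustrative (the set of commands RTK actually supports is not listed here), `rewriteForRtk` is a hypothetical name, and compound commands such as `cd x && git status` are deliberately not handled.

```javascript
// Hypothetical allow-list, for illustration only.
const SUPPORTED = new Set(['git', 'pytest']);

// Prefix a supported command with `rtk`; leave everything else unchanged.
function rewriteForRtk(command, rtkInstalled) {
  if (!rtkInstalled) return command; // no-op when RTK is absent
  const trimmed = command.trim();
  const first = trimmed.split(/\s+/)[0];
  if (first === 'rtk' || !SUPPORTED.has(first)) return command;
  return `rtk ${trimmed}`;
}
```

Returning the input unchanged for every unsupported case means the hook can never make a command fail that would otherwise have worked.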
Session stats utility — hooks/stats-reader.js is a standalone script (not a hook event) for inspecting session token and tool usage from JSONL history. Run directly:
node .claude/hooks/stats-reader.js --latest # most recent session
node .claude/hooks/stats-reader.js --latest --timings # + per-tool wall-clock stats from timings.jsonl
node .claude/hooks/stats-reader.js --date 2026-04-08 # all sessions on a date
node .claude/hooks/stats-reader.js <session-uuid> # specific session by UUID prefix
Output: JSON with token usage by model (input/output/cache), tool call counts, turn count, duration, and optional timing percentiles (count, mean_ms, p95_ms) per tool.
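The per-tool timing fields (count, mean_ms, p95_ms) could be derived from timings.jsonl along these lines. This is a sketch only: the record field names `tool` and `duration_ms` are assumptions, and nearest-rank is one of several p95 definitions; the script may use another.

```javascript
// Summarize a list of durations (ms) using the nearest-rank 95th percentile.
function summarize(durationsMs) {
  if (durationsMs.length === 0) return { count: 0, mean_ms: null, p95_ms: null };
  const sorted = [...durationsMs].sort((a, b) => a - b);
  const mean = sorted.reduce((sum, v) => sum + v, 0) / sorted.length;
  const rank = Math.ceil(0.95 * sorted.length); // 1-based nearest rank
  return { count: sorted.length, mean_ms: mean, p95_ms: sorted[rank - 1] };
}

// Group JSONL records by tool and summarize each group.
// Assumed record shape: {"tool": "Bash", "duration_ms": 120}
function timingStats(jsonl) {
  const byTool = new Map();
  for (const line of jsonl.split('\n')) {
    if (!line.trim()) continue; // skip blank lines
    const rec = JSON.parse(line);
    if (!byTool.has(rec.tool)) byTool.set(rec.tool, []);
    byTool.get(rec.tool).push(rec.duration_ms);
  }
  const out = {};
  for (const [tool, values] of byTool) out[tool] = summarize(values);
  return out;
}
```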
A lightweight hook (hooks/statusline.js) adds a persistent two-row status bar to every Claude Code session:
Row 1: claude-sonnet-4-6 │ Borda.ai-home │ Pro ~$1.20 │ ████░░░░░░ 38% │ 💬
Row 2: 🕵 2 agents (self-mentor, sw-engineer) │ 🤖 codex-rescue │ 🔧 Bash ×3 · Edit · Read ×12
Row 1 — model name · project directory · billing indicator · 10-segment context bar (green → yellow → red) · processing badge 💬 (cyan; shown while Claude is handling the current turn; disappears when done)
Row 2 — native agent count · Codex sessions (separate) · active tools (last 30 seconds)
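As an illustration of the 10-segment context bar in Row 1, the following sketch reproduces the sample `████░░░░░░ 38%`. The rounding rule is an assumption inferred from that sample, `renderContextBar` is a hypothetical name, and colouring is omitted because the thresholds are not stated here.

```javascript
// Render a fixed-width usage bar followed by the percentage.
function renderContextBar(percent, segments = 10) {
  const clamped = Math.min(100, Math.max(0, percent));
  const filled = Math.round((clamped / 100) * segments); // assumed: round to nearest segment
  return `${'█'.repeat(filled)}${'░'.repeat(segments - filled)} ${Math.round(clamped)}%`;
}
```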
Agent row (🕵) details:
- Specialized agents (have a .claude/agents/ file) → shown by type name in their declared color: from frontmatter
- General-purpose agents → shown by model name in gray (opus, sonnet)
- Same-type agents grouped with ×N count
- codex:* subagents are excluded here; they appear in 🤖 instead
Codex row (🤖) details:
- Shows the short name of each active codex session, without the codex: prefix (e.g., codex-rescue, review, adversarial-review)
- Sources: both Skill(codex:*) invocations and Agent(subagent_type="codex:*") subagents
- Multiple sessions of the same type grouped as <name> ×N
- Safety-net: sessions older than 30 min are treated as timed out and excluded
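The prefix stripping and ×N grouping described for the Codex row can be sketched as follows. `codexLabels` is a hypothetical name; the real statusline.js may order or format entries differently.

```javascript
// Turn raw session types into display labels:
// drop the "codex:" prefix and group repeats as "<name> ×N".
function codexLabels(types) {
  const counts = new Map();
  for (const type of types) {
    const name = type.startsWith('codex:') ? type.slice('codex:'.length) : type;
    counts.set(name, (counts.get(name) || 0) + 1);
  }
  // Map preserves insertion order, so labels appear in first-seen order.
  return [...counts].map(([name, n]) => (n > 1 ? `${name} ×${n}` : name));
}
```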
Tool row colors: Read (blue) · Write (bright green) · Edit (green) · Bash (yellow) · Grep (cyan) · Glob (bright cyan) · WebFetch (magenta) · WebSearch (bright magenta) · Task/Agent (bright blue) · Skill (bright yellow)
Billing indicator:
- Subscription (Pro/Max): Max/Pro/Sub ~$X.XX in cyan; the plan comes from ~/.claude/state/subscription.json, and ~$X.XX is the theoretical API-rate cost (tokens × list price), not an actual charge
- API key: API $X.XX in yellow; actual spend at pay-per-token rates
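A small sketch of how the two billing labels could be produced. Prices are caller-supplied per-million-token rates, not real list prices; cache tokens are ignored for brevity; and both function names are hypothetical.

```javascript
// Theoretical cost in USD: tokens × list price (prices given per million tokens).
function theoreticalCost(usage, pricePerMTok) {
  return (usage.input * pricePerMTok.input + usage.output * pricePerMTok.output) / 1e6;
}

// Format the indicator: "~" marks a theoretical figure on subscription plans.
function billingLabel(mode, costUsd, plan) {
  const amount = `$${costUsd.toFixed(2)}`;
  if (mode === 'api') return `API ${amount}`;
  return `${plan || 'Sub'} ~${amount}`;
}
```

Colouring (cyan for subscription, yellow for API key) would be applied around the returned string.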
Hook mechanics: statusline.js reads state/agents/, state/codex/, state/tools/, and state/queue/ on each render. task-log.js writes those files (including UserPromptSubmit → queue markers, Stop → queue drain); statusline.js only reads. Configured via statusLine in settings.json. Zero external dependencies — stdlib path and fs only.
→ Full architecture: root README → Claude + Codex integration
Install the Codex plugin in Claude Code — not an MCP server, a local plugin:
/plugin marketplace add openai/codex-plugin-cc
/plugin install codex@openai-codex
/reload-plugins
Skills check availability at runtime: claude plugin list 2>/dev/null | grep -q 'codex@openai-codex'. If the plugin is absent, each skill skips its Codex step gracefully rather than failing.
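For tooling written in Node rather than shell, the same availability check might be expressed like this. It is a sketch that mirrors the grep above; `listsCodexPlugin` and `codexAvailable` are hypothetical names.

```javascript
const { spawnSync } = require('child_process');

// Pure check on the text printed by `claude plugin list`.
function listsCodexPlugin(listOutput) {
  return listOutput.includes('codex@openai-codex');
}

// Returns false when the CLI cannot be started, so callers can skip the Codex step.
function codexAvailable() {
  const res = spawnSync('claude', ['plugin', 'list'], { encoding: 'utf8' });
  if (res.error) return false; // claude is not on PATH
  return listsCodexPlugin(res.stdout || '');
}
```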
Invocation map — every place Claude dispatches to Codex and why:
| Skill | Site | Purpose | Plugin command |
|---|---|---|---|
| /develop:fix, /develop:feature | _shared/codex-prepass.md | Tier 1 pre-pass: review staged diff for bugs before Claude's review cycle | codex:review --wait |
| /oss:review | Step 2 co-review | Adversarial diff review seeding agent prompts with pre-flagged issues | codex:adversarial-review --wait <focus> |
| /oss:review, /research:run | _shared/codex-delegation.md | Delegate mechanical follow-up: docstrings, type annotations, test stubs | codex:codex-rescue (agent) |
| /oss:resolve | Step 8 action items | Apply PR review feedback to the codebase | codex:codex-rescue (agent) |
| /oss:resolve | Step 12a comment dispatch | Apply a specific review comment | codex:codex-rescue (agent) |
| /oss:resolve | Step 12 review loop | Review applied changes for issues before committing | codex:review --wait |
| /research:run --codex | Phase 2b ideation | Fallback: generate + apply one atomic optimization when Claude's change was reverted | codex:codex-rescue (agent) |
| /calibrate | Phase 1a problem gen | Generate synthetic calibration problems (JSON array written to run dir) | codex:codex-rescue (agent) |
| /calibrate | Phase 2 scoring | Score calibration responses against ground truth (JSON written to run dir) | codex:codex-rescue (agent) |
What Claude retains:
- Long-horizon planning and research (/research:topic, /research:run, /develop:plan)
- Orchestration of multiple agents in defined topologies
- Judgment calls: design decisions, spec approval, test validity assessment
- Final validation: Claude always verifies Codex output via git diff HEAD before accepting changes
Why the division works: Claude has a mental model of which files are "in scope" for a task; Codex reads the diff and codebase independently, without that context. Their blind spots are complementary — the union of both passes catches more than either alone.
Runtime artifacts live at the project root in dot-prefixed dirs — separate from versioned config in .claude/. The dot-prefix signals "generated output, not source".
.plans/blueprint/ ← /brainstorm spec and tree files
.plans/active/ ← todo_*.md, plan_*.md
.plans/closed/ ← completed plans
.notes/ ← lessons.md, diary, guides
.reports/calibrate/ ← /calibrate benchmark runs
.reports/resolve/ ← /oss:resolve lint+QA gate outputs
.reports/audit/ ← /audit analysis runs
.reports/review/ ← /oss:review multi-agent outputs
.experiments/ ← /research:run skill runs (improve mode)
.developments/ ← /develop:* review-cycle handoffs
.temp/ ← long output from any skill (quality-gates rule)
Each skill creates a timestamped run dir: .reports/<skill>/YYYY-MM-DDTHH-MM-SSZ/. Completed runs contain result.jsonl; the SessionEnd hook deletes completed runs older than 30 days automatically. Incomplete runs (crashed/timed-out) are kept for debugging. All dot-prefixed dirs are gitignored — see .claude/rules/artifact-lifecycle.md for TTL policy and full details.
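The run-dir naming and the 30-day cleanup rule can be sketched as below. It assumes a run's age is read from its directory name (the actual hook may use file mtimes instead), and all three function names are hypothetical.

```javascript
// Format a Date as YYYY-MM-DDTHH-MM-SSZ (UTC, colons replaced for filesystem safety).
function runDirName(date) {
  return date.toISOString().replace(/\.\d{3}Z$/, 'Z').replace(/:/g, '-');
}

// Parse a run-dir name back into a Date; returns null for anything else.
function parseRunDirName(name) {
  const m = /^(\d{4}-\d{2}-\d{2})T(\d{2})-(\d{2})-(\d{2})Z$/.exec(name);
  return m ? new Date(`${m[1]}T${m[2]}:${m[3]}:${m[4]}Z`) : null;
}

// Only completed runs (those containing result.jsonl) past the TTL are deleted;
// incomplete runs are kept for debugging regardless of age.
function shouldDelete(name, hasResult, now, ttlDays = 30) {
  const started = parseRunDirName(name);
  if (!started || !hasResult) return false;
  return now.getTime() - started.getTime() > ttlDays * 24 * 60 * 60 * 1000;
}
```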