2 changes: 2 additions & 0 deletions .gitignore
@@ -12,6 +12,8 @@
**/*dont_commit_me*
web/packages/agenta-api-client/dist/
web/tsconfig.tsbuildinfo
# Agent Pi extension bundle, built by `pnpm run build:extension` and in the Docker image.
services/agent/dist/

__pycache__/
**/__pycache__/
3 changes: 3 additions & 0 deletions docs/design/agent-workflows/README.md
@@ -116,6 +116,9 @@ running agent.
- [`wp-7-tools/`](wp-7-tools/README.md) — make runnable tools part of the agent config; resolve
Composio actions into Pi tools and route tool calls back through the existing
`POST /tools/call`, with MCP and workflow-as-tool as future adapters.
- [`wp-8-rivet-acp-runtime/`](wp-8-rivet-acp-runtime/README.md) — re-platform the service onto
`rivet-dev/sandbox-agent` so the agent is driven over ACP and the harness (Pi, Claude Code,
Codex) becomes a config value, running locally first; tools, Daytona, and the folder jail deferred.

## Related work

80 changes: 80 additions & 0 deletions docs/design/agent-workflows/wp-8-rivet-acp-runtime/README.md
@@ -0,0 +1,80 @@
# WP-8: Rivet + ACP agent runtime

Status: design ready to implement. Start at [`plan.md`](plan.md). Decisions and open
items are in [`status.md`](status.md).

This folder is self-contained. A new engineer should be able to read it and implement the
work end to end without prior context. Read in this order: this README, then
[`context.md`](context.md) (the code that exists today), [`research.md`](research.md)
(verified facts about rivet, ACP, and the pattern we copy), [`architecture.md`](architecture.md)
(the target design), and [`plan.md`](plan.md) (the phased build).

## Summary

Re-platform the agent workflow service (`services/oss/src/agent.py`) so it drives the
agent over the **Agent Client Protocol (ACP)** through [`rivet-dev/sandbox-agent`](https://github.com/rivet-dev/sandbox-agent),
instead of the bespoke Pi JSON protocol it uses today.

The `/invoke` contract does not change. The handler still builds a user turn and returns
`{"role": "assistant", "content": ...}`. What changes is the transport behind the existing
`Harness` port: rivet runs the chosen harness (Pi, Claude Code) as an ACP session and
streams the reply back. Picking a different harness becomes a config value, not new code.

## The four requirements

1. **Drive the agent over ACP**, not the Pi JSON protocol. Rivet speaks ACP to the
harness; our service drives rivet.
2. **Swap harness as config.** The same agent config runs on Pi or Claude Code by setting
one value.
3. **Run locally.** The same path runs on a dev machine with no container, using rivet's
`local` provider. The rivet server is open source, so running it locally is normal.
4. **Defer tools.** Ship with no tools. The tool model is fixed (definition plus swappable
body, delivered per-harness over MCP), but nothing is built here.

## The design in five lines

- Keep `agent.py`, the `/invoke` contract, and the `Harness` port unchanged.
- Add a `RivetHarness` adapter behind the port, plus a small TypeScript runner that wraps
the rivet SDK.
- Run **one rivet daemon and one sandbox per invoke** (cold), then tear it down. This
copies the pattern Agenta already ships for code evaluators.
- Inject the trace context as an environment variable **at the daemon's birth** (the
sandbox `env_vars` on Daytona, the SDK `env` option locally). No fork of rivet or the
adapters is needed under this per-invoke model.
- Two axes swap independently: **sandbox** (local, daytona) and **harness** (pi, claude).

## Agent configuration (the contract to rivet: filesystem plus config)

- **AGENTS.md** — instructions, after variable substitution.
- **Input variables** — substituted into AGENTS.md, like prompt-template variables.
- **Skills** — laid into the workspace as files (path and format are per-harness).
- **Tool definitions** — schema only, separate from bodies. Empty here.
- **Harness** — `pi` / `claude`.
- **Sandbox** — `local` / `daytona`.
- **Secrets** — harness and LLM auth, passed as launch env, never written into the
agent-visible filesystem.
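
The configuration above can be pictured as one resolved object. This is a minimal sketch, not the service's actual type: `AgentConfig` and `render_agents_md` are hypothetical names, and the `{{name}}` placeholder syntax is an assumption, since the substitution format is not specified here.

```python
import re
from dataclasses import dataclass, field


@dataclass
class AgentConfig:
    """Hypothetical container for the resolved agent configuration."""

    agents_md: str  # instructions template, before substitution
    variables: dict[str, str] = field(default_factory=dict)
    skills: dict[str, str] = field(default_factory=dict)  # relative path -> content
    tools: list[dict] = field(default_factory=list)  # definitions only; empty in WP-8
    harness: str = "pi"  # "pi" or "claude"
    sandbox: str = "local"  # "local" or "daytona"
    secrets: dict[str, str] = field(default_factory=dict)  # launch env only, never files


# Assumed placeholder syntax: {{name}}
_PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def render_agents_md(config: AgentConfig) -> str:
    """Substitute input variables; unknown placeholders are left untouched."""

    def replace(match: re.Match) -> str:
        return config.variables.get(match.group(1), match.group(0))

    return _PLACEHOLDER.sub(replace, config.agents_md)


config = AgentConfig(
    agents_md="You support {{product}} users. Escalate to {{team}}.",
    variables={"product": "Agenta"},
)
rendered = render_agents_md(config)
```

Keeping `secrets` as a separate field from `skills` mirrors the rule that secrets travel as launch env and are never written into the workspace.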

## In scope

ACP transport via rivet, harness swap (Pi and Claude Code), local run, and **tracing**
(the agent's spans must nest under the `/invoke` span; standalone traces are not
acceptable). Daytona and concurrency are described as the immediate follow-on phases.

## Deferred (each its own follow-on)

- **Tools** ([WP-7](../wp-7-tools/README.md)): the definition-plus-body model over MCP.
- **Folder isolation (the jail)**: rivet has no filesystem confinement. Needed only when a
single warm daemon hosts many agents at once. A TypeScript-or-Rust change, deferred. See
[`isolation-and-fork.md`](isolation-and-fork.md).
- **Multi-turn and streaming to the client** ([WP-4](../wp-4-multi-message-output/README.md)):
WP-8 stays at one turn in, one message out, matching today's behavior. When multi-turn
lands, a session is persisted message history replayed via ACP `session/load`.
- **Standalone SDK runner**: run an agent from the SDK with a config. The adapters are
written to live in the SDK so this is a packaging step later, not a rewrite.

## Why rivet

Rivet is the thing we were about to hand-build in the `Harness` and `Runtime` ports: an
ACP daemon that drives several harnesses, keyed by session, over a swappable sandbox
(local, daytona) with an HTTP and SSE control plane. We adopt it unmodified (Apache-2.0).
The one capability it lacks, filesystem confinement, we are deferring.
176 changes: 176 additions & 0 deletions docs/design/agent-workflows/wp-8-rivet-acp-runtime/architecture.md
@@ -0,0 +1,176 @@
# Architecture

## Principle

Keep the `Harness` port and the `/invoke` contract. Add one adapter behind the port that
runs the agent through rivet over ACP, and a small TypeScript runner that wraps the rivet
SDK. Everything Pi-specific moves below the port and becomes one harness choice.

```
unchanged
┌───────────────────────────────────────────────┐
│ agent.py (/invoke, /inspect, ag.create_app) │
│ _resolve_run_config / _latest_user_message │
│ _build_harness() ── selects adapter by env │
└───────────────────────────────────────────────┘
│ Harness port (setup / invoke / shutdown)
┌───────────────────────────────────────────────┐
│ RivetHarness (new, Python) │ PiHarness / PiHttpHarness
│ maps HarnessRequest + {harness, sandbox} → │ (kept; legacy path)
│ a one-shot rivet run; passes trace + secrets │
└───────────────────────────────────────────────┘
│ /run (HTTP or stdio), same contract family as runPi
┌───────────────────────────────────────────────┐
│ runRivet.ts (services/agent, wraps rivet SDK) │
│ start({ sandbox, env }) → createSession({ │
│ agent, cwd }) → write AGENTS.md → prompt → │
│ collect chunks → destroy │
└───────────────────────────────────────────────┘
│ spawns the daemon (local subprocess, or in Daytona)
┌───────────────────────────────────────────────┐
│ sandbox-agent daemon (Rust, one per invoke) │
└───────────────────────────────────────────────┘
│ ACP (JSON-RPC: session/prompt, session/update)
┌───────────────────────────────────────────────┐
│ harness ACP adapter subprocess in cwd │
│ pi-acp │ claude-code-acp │
└───────────────────────────────────────────────┘
```

The ACP boundary sits between the daemon and the harness. That is the requirement: the
agent loop runs over ACP, not the Pi JSON envelope. The service-to-rivet hop is rivet's own
control surface and stays harness-agnostic behind the port.

## Two orthogonal swap axes

These swap independently. Do not bundle them.

- **Sandbox (where the daemon runs):** `local`, `daytona`. A config value passed to
`runRivet`, which selects the rivet provider.
- **Harness (which engine):** `pi`, `claude`. A config value passed as the rivet `agent`.

The demo proves each separately: swap `local` and `daytona` with the harness fixed, and
swap `pi` and `claude` with the sandbox fixed.
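
One way to keep the two axes from being bundled is to model each as its own type. The sketch below is illustrative only: `SandboxKind`, `HarnessKind`, and `run_options` are hypothetical names, and the option keys are assumptions rather than the rivet SDK's actual fields.

```python
from enum import Enum
from itertools import product


class SandboxKind(str, Enum):
    LOCAL = "local"
    DAYTONA = "daytona"


class HarnessKind(str, Enum):
    PI = "pi"
    CLAUDE = "claude"


def run_options(sandbox: SandboxKind, harness: HarnessKind) -> dict[str, str]:
    """Each axis maps to its own field; neither value constrains the other."""
    return {"sandbox": sandbox.value, "agent": harness.value}


# Every combination is valid because the axes are orthogonal.
matrix = [run_options(s, h) for s, h in product(SandboxKind, HarnessKind)]
```

The demo described above corresponds to walking one row or one column of this matrix at a time.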

## Lifecycle: one daemon and one sandbox per invoke (cold)

Each `/invoke` brings up its own daemon and sandbox, runs, and tears down. This copies the
shipped code-evaluator pattern (`DaytonaRunner`: an ephemeral sandbox per execution from a
snapshot, deleted in a `finally`). Two reasons it is the right default:

- It makes the daemon's environment **per-invoke**, which is what makes tracing work
without forking anything (see below).
- It needs no filesystem jail, because agents never share a daemon.

The cost is acceptable. Locally, the daemon is a Rust binary that boots in tens of
milliseconds, so the per-invoke cost is dominated by the Node adapter spawn (~0.2 to
0.5 s). On Daytona, sandbox creation adds ~1 s. Concurrency is bounded the way
evaluations already bound it (see Concurrency).
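
The cold lifecycle amounts to a create, run, destroy sequence where teardown is unconditional. A minimal sketch, assuming a stand-in `FakeDaemon` in place of the real rivet handle (whose API is not modeled here) and a hypothetical `daemon_per_invoke` helper:

```python
from contextlib import contextmanager

events: list[str] = []  # records lifecycle calls so the order is observable


class FakeDaemon:
    """Stand-in for a rivet daemon handle; not the real SDK object."""

    def __init__(self, env: dict[str, str]):
        self.env = env  # fixed at creation, so it is per-invoke by construction
        events.append("start")

    def prompt(self, text: str) -> str:
        if not text:
            raise ValueError("empty prompt")
        return f"echo: {text}"

    def destroy(self) -> None:
        events.append("destroy")


@contextmanager
def daemon_per_invoke(env: dict[str, str]):
    """Create a daemon for one invoke and always tear it down."""
    daemon = FakeDaemon(env)
    try:
        yield daemon
    finally:
        daemon.destroy()  # runs on success and on failure


traceparent = "00-0af7651916cd43dd8448eb211c80319c-b7ad6b7169203331-01"
with daemon_per_invoke({"TRACEPARENT": traceparent}) as daemon:
    reply = daemon.prompt("hello")

try:
    with daemon_per_invoke({}) as daemon:
        daemon.prompt("")  # fails, but the daemon is still destroyed
except ValueError:
    pass
```

This is the same shape as the `finally`-based cleanup the text attributes to `DaytonaRunner`.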

## Tracing: inject at the daemon's birth

The agent's spans must nest under the `/invoke` span. Standalone traces are not
acceptable. The mechanism is uniform across sandboxes because each invoke owns its daemon:

- The static OTLP target and auth (`OTEL_*`, the Agenta endpoint and `Authorization`) and
the per-invoke `traceparent` go into the daemon's environment when it is created.
- **Local:** the SDK `env` option on `start({ sandbox: local(), env })`.
- **Daytona:** the sandbox `env_vars`, exactly like `DaytonaRunner` injects `AGENTA_*`.
- The daemon passes its env to the adapter subprocess, which passes it to the harness.
- **Pi:** install the `agenta-otel` logic as a Pi extension in the environment (global
`~/.pi/agent/extensions`, or baked into the Daytona snapshot). Pi loads it and emits
spans under the injected `traceparent`.
- **Claude Code:** set `CLAUDE_CODE_ENABLE_TELEMETRY=1`, `OTEL_*`, and `TRACEPARENT`, and
run it in `-p` / Agent-SDK mode.

No fork of rivet or the adapters is needed under the per-invoke model. A fork is only
needed if a later phase shares one warm daemon across concurrent invokes, and even then it
would be a fork of the TypeScript adapter (to read ACP `_meta.traceparent`), not of the
Rust daemon.
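
The injection step can be sketched as a pure function from trace context to daemon environment. This is an assumption-laden illustration: `build_daemon_env` is a hypothetical helper, the slim `TraceContext` here carries only three fields, and the specific `OTEL_EXPORTER_OTLP_*` names are the standard OpenTelemetry exporter variables, which may not be the exact `OTEL_*` set the service ends up using.

```python
from dataclasses import dataclass
from urllib.parse import quote


@dataclass
class TraceContext:
    traceparent: str
    endpoint: str  # OTLP endpoint
    authorization: str  # value of the Authorization header


def build_daemon_env(trace: TraceContext, harness: str) -> dict[str, str]:
    """Environment set when the daemon is created; child processes inherit it."""
    env = {
        # Per-invoke value: this is what nests the agent's spans under /invoke.
        "TRACEPARENT": trace.traceparent,
        # Static target and auth (assumed variable names, see lead-in).
        "OTEL_EXPORTER_OTLP_ENDPOINT": trace.endpoint,
        # Header values are percent-encoded in this variable.
        "OTEL_EXPORTER_OTLP_HEADERS": "Authorization=" + quote(trace.authorization, safe=""),
    }
    if harness == "claude":
        env["CLAUDE_CODE_ENABLE_TELEMETRY"] = "1"
    return env


trace = TraceContext(
    traceparent="00-0af7651916cd43dd8448eb211c80319c-b7ad6b7169203331-01",
    endpoint="http://localhost:4318",
    authorization="ApiKey abc",
)
claude_env = build_daemon_env(trace, "claude")
pi_env = build_daemon_env(trace, "pi")
```

The same dictionary would be handed to the SDK `env` option locally or to the sandbox `env_vars` on Daytona, which is why the mechanism is uniform across sandboxes.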

## Components

### `RivetHarness` (Python, new)

`services/oss/src/agent_pi/rivet_harness.py`, implements the `Harness` ABC. It holds the
harness id and sandbox choice (from config) and the trace/secret context, and maps a
`HarnessRequest` onto a `runRivet` `/run` call. Field mapping:

| `HarnessRequest` | Becomes |
| --- | --- |
| `agents_md` | written as `AGENTS.md` into the session `cwd` |
| `model` | session model where the harness honors it (the adapter normalizes this) |
| `prompt` | the ACP prompt text |
| `messages` | MVP uses the latest user turn; history replay is later |
| `tools` etc. | unused (empty) in WP-8 |
| `trace` | injected as daemon env (`traceparent`, OTLP endpoint, auth) |
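
The mapping in the table can be sketched as a single function. The `/run` body shape below is hypothetical (the key names are assumptions, loosely following the camelCase Pi envelope), and `HarnessRequest` is reduced to the fields the table mentions.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class HarnessRequest:
    agents_md: str
    prompt: str
    model: Optional[str] = None
    messages: list[dict] = field(default_factory=list)
    tools: list[dict] = field(default_factory=list)
    trace: Optional[dict] = None  # already in wire form for this sketch


def to_run_payload(request: HarnessRequest, harness: str, sandbox: str) -> dict:
    """Build a hypothetical /run body for runRivet from a HarnessRequest."""
    return {
        "harness": harness,
        "sandbox": sandbox,
        "agentsMd": request.agents_md,  # written as AGENTS.md into cwd
        "model": request.model,  # honored where the harness supports it
        "prompt": request.prompt,  # the ACP prompt text (latest user turn)
        "tools": [],  # unused in WP-8, whatever the request carries
        "trace": request.trace,  # becomes daemon env, not session data
    }


payload = to_run_payload(
    HarnessRequest(agents_md="Be brief.", prompt="hi", tools=[{"name": "ignored"}]),
    harness="claude",
    sandbox="local",
)
```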

### `runRivet.ts` (TypeScript, in `services/agent`)

Wraps the rivet SDK. Selected by env (`AGENT_BACKEND=rivet`) and serves the same `/run`
contract `runPi.ts` serves, so the Python side stays thin. Per invoke:

1. `start({ sandbox: local() | daytona({...}), env })` (env carries trace + secrets).
2. `createSession({ agent: <harness>, cwd })`.
3. Write `AGENTS.md` (and later skills) into `cwd`.
4. `prompt(sessionId, prompt)`, accumulate `agent_message_chunk` into the output.
5. `destroy()`.
6. Return `{ ok, output, sessionId, model }`.
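
Step 4 is the only step with real logic: filtering the update stream down to message text. The sketch below shows that accumulation in Python for illustration (the runner itself is TypeScript). The update shape, a `sessionUpdate` discriminator with a `content` block, is an assumption based on ACP `session/update` notifications, and `collect_output` is a hypothetical name.

```python
def collect_output(updates: list[dict]) -> str:
    """Concatenate the text of agent_message_chunk updates, in arrival order."""
    parts: list[str] = []
    for update in updates:
        if update.get("sessionUpdate") != "agent_message_chunk":
            continue  # other update kinds (thoughts, tool calls) are skipped
        content = update.get("content", {})
        if content.get("type") == "text":
            parts.append(content.get("text", ""))
    return "".join(parts)


updates = [
    {"sessionUpdate": "agent_message_chunk", "content": {"type": "text", "text": "Hel"}},
    {"sessionUpdate": "agent_thought_chunk", "content": {"type": "text", "text": "hmm"}},
    {"sessionUpdate": "agent_message_chunk", "content": {"type": "text", "text": "lo"}},
]

# Same result shape as step 6; the session id is a made-up value.
result = {"ok": True, "output": collect_output(updates), "sessionId": "s-1", "model": None}
```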

### `agent.py` selection

Extend `_build_harness()` with `AGENTA_AGENT_RUNTIME=rivet` to return `RivetHarness`
(harness from `AGENTA_AGENT_HARNESS`, sandbox from config, default `local`). Keep the Pi
path as default so nothing regresses.
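
A minimal sketch of the selection logic, assuming placeholder adapter classes and a hypothetical `build_harness` signature that takes the environment as a parameter for testability. Defaulting the harness to `pi` when `AGENTA_AGENT_HARNESS` is unset is an assumption of this sketch.

```python
import os
from typing import Mapping


class PiHarness:
    """Placeholder for the legacy adapter."""


class RivetHarness:
    """Placeholder for the new adapter."""

    def __init__(self, harness: str, sandbox: str):
        self.harness = harness
        self.sandbox = sandbox


def build_harness(env: Mapping[str, str] = os.environ, sandbox: str = "local"):
    """Return RivetHarness only when explicitly selected; Pi stays the default."""
    if env.get("AGENTA_AGENT_RUNTIME") == "rivet":
        harness = env.get("AGENTA_AGENT_HARNESS", "pi")  # assumed default
        return RivetHarness(harness=harness, sandbox=sandbox)
    return PiHarness()


legacy = build_harness(env={})
rivet = build_harness(env={"AGENTA_AGENT_RUNTIME": "rivet", "AGENTA_AGENT_HARNESS": "claude"})
```

Because the rivet branch is opt-in, an unset or unrecognized runtime value falls through to the existing Pi path.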

## Agent configuration (the contract: filesystem plus config)

Resolved before each run: AGENTS.md, input variables (substituted into AGENTS.md), skills
(files in the workspace), tool definitions (empty here), harness, sandbox, secrets. The
contract handed to rivet is files in `cwd` plus the session/daemon config. Secrets go as
launch env, never as files, because there is no jail.

## Tools: definition vs body (deferred, but shapes the seam)

A tool splits into a **definition** (the schema the model sees, stored in a neutral
OpenAI-function shape) and a **body** (the execution). The body is swappable: real,
service-backed, or mock. A test variant of an agent swaps bodies without touching
definitions. Delivery is per-harness over **MCP** (rivet's per-directory MCP config), not a
raw OpenAI array. The body model is general and not Agenta-specific: a self-contained body
runs in-process, a service-backed body (for example a Composio tool calling Agenta's
`/tools/call`) needs its service reachable (a local or remote Agenta), and a mock needs
nothing. WP-8 ships no tools; this is the shape to preserve, not build.
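
To make the seam concrete, here is a sketch of the split under stated assumptions: `ToolDefinition`, `ToolBody`, and `call_tool` are hypothetical names, and the bodies are plain callables rather than MCP servers. Nothing like this is built in WP-8.

```python
from dataclasses import dataclass
from typing import Any, Callable

# A body is anything that takes the parsed arguments and returns a result.
ToolBody = Callable[[dict[str, Any]], Any]


@dataclass(frozen=True)
class ToolDefinition:
    """What the model sees: a neutral OpenAI-function-shaped schema."""

    name: str
    description: str
    parameters: dict[str, Any]


definition = ToolDefinition(
    name="add",
    description="Add two integers.",
    parameters={
        "type": "object",
        "properties": {"a": {"type": "integer"}, "b": {"type": "integer"}},
        "required": ["a", "b"],
    },
)


def self_contained_add(args: dict[str, Any]) -> int:
    """A self-contained body: runs in-process, needs no service."""
    return args["a"] + args["b"]


def mock_add(args: dict[str, Any]) -> int:
    """A mock body: fixed answer, needs nothing."""
    return 0


def call_tool(bodies: dict[str, ToolBody], name: str, args: dict[str, Any]) -> Any:
    return bodies[name](args)


# The definition is shared; only the body table differs between variants.
production_bodies = {definition.name: self_contained_add}
test_bodies = {definition.name: mock_add}
```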

## Sessions and state

A session is the **stored message history**, not a kept-alive sandbox. Because we offer no
persistent file writes, nothing on disk is worth keeping. So: ephemeral sandbox per turn,
persisted messages, continue by replaying history with ACP `session/load` (Pi
`resumeSession`, Claude Code `loadSession`). Zero at-rest cost. On the platform the
history store is the backend DB; in a standalone run it is a local file. Tradeoff:
long-history replay re-sends tokens, so cap it. Paused or FS-persisted sessions wait until
we offer durable writes.
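
A sketch of the standalone case, assuming one JSON file per session and a message-count cap. `FileHistoryStore` and `replay_window` are hypothetical names, and capping by message count rather than by tokens is a simplification.

```python
import json
import tempfile
from pathlib import Path


class FileHistoryStore:
    """Hypothetical standalone history store: one JSON file per session."""

    def __init__(self, root: Path):
        self.root = root

    def _path(self, session_id: str) -> Path:
        return self.root / f"{session_id}.json"

    def load(self, session_id: str) -> list[dict]:
        path = self._path(session_id)
        return json.loads(path.read_text()) if path.exists() else []

    def append(self, session_id: str, message: dict) -> None:
        history = self.load(session_id)
        history.append(message)
        self._path(session_id).write_text(json.dumps(history))


def replay_window(history: list[dict], max_messages: int) -> list[dict]:
    """Keep only the most recent messages to bound the tokens re-sent on replay."""
    return history[-max_messages:] if max_messages > 0 else []


store = FileHistoryStore(Path(tempfile.mkdtemp()))
for i in range(5):
    store.append("s-1", {"role": "user", "content": f"turn {i}"})

window = replay_window(store.load("s-1"), max_messages=2)
```

The sandbox never appears in this code, which is the point: the session survives as data while every sandbox stays ephemeral.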

## Concurrency

Mirror evaluations. Do not run the agent inside the API request if a background path is
available; dispatch it like an evaluation (taskiq worker on a Redis stream) and bound
concurrency with a shared semaphore. Each concurrent slot is one ephemeral sandbox, so the
semaphore caps how many sandboxes (and how much Daytona cost) run at once. Extra invokes
queue. Locally a slot is a cheap subprocess.
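
The bounding mechanism can be sketched with a plain `asyncio.Semaphore`. This illustrates the slot model only: the taskiq and Redis dispatch is not modeled, `run_bounded` is a hypothetical name, and the sleep stands in for a sandboxed run.

```python
import asyncio


async def run_bounded(prompts: list[str], limit: int) -> tuple[list[str], int]:
    """Run one simulated invoke per prompt with at most `limit` in flight."""
    semaphore = asyncio.Semaphore(limit)
    in_flight = 0
    peak = 0

    async def invoke(prompt: str) -> str:
        nonlocal in_flight, peak
        async with semaphore:  # one slot corresponds to one ephemeral sandbox
            in_flight += 1
            peak = max(peak, in_flight)
            await asyncio.sleep(0.01)  # stands in for the sandboxed run
            in_flight -= 1
            return f"done: {prompt}"

    # Invokes beyond the limit wait on the semaphore, i.e. they queue.
    results = await asyncio.gather(*(invoke(p) for p in prompts))
    return list(results), peak


results, peak = asyncio.run(run_bounded([f"p{i}" for i in range(8)], limit=3))
```

With eight invokes and a limit of three, `peak` never exceeds three, so the semaphore value is a direct cap on simultaneous sandboxes.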

## Running standalone via the SDK (later)

The harness and sandbox adapters are written to live in the SDK, so the backend service
and a standalone run share one implementation. Running locally is not special: the rivet
server is open source (Apache-2.0, a static binary), so a local run starts that server on
the same machine and the SDK wraps the rivet client. A standalone run fetches or loads a
config, then calls the SDK runner.

## What this does not change

No new endpoints. No change to `/invoke` or `/inspect` shapes. No tools, no jail, no
multi-turn, no client-side streaming. Each is its own follow-on.
89 changes: 89 additions & 0 deletions docs/design/agent-workflows/wp-8-rivet-acp-runtime/context.md
@@ -0,0 +1,89 @@
# Context: the code that exists today

Read this to orient on the current service before changing it. All paths are in this repo
(`/home/mahmoud/code/agenta`).

## The agent service (WP-2)

`services/oss/src/agent.py` is an Agenta app exposing `/invoke` and `/inspect`, like the
chat and completion services. The handler `_agent(...)`:

1. Resolves config with `_resolve_run_config(...)`: model, AGENTS.md (the system text),
and tools, from the request `parameters` or the file config.
2. Builds the latest user turn with `_latest_user_message(...)`.
3. Picks a harness adapter with `_build_harness()` and calls the `Harness` port
(`setup` / `invoke` / `shutdown`).
4. Returns `{"role": "assistant", "content": result.output}`.

Trace context is captured in `_trace_context()` and threaded into the harness so the
agent's spans nest under the `/invoke` span.
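
The handler flow can be sketched end to end with a stand-in adapter. This is a simplification, not the code in `agent.py`: `latest_user_message`, `handle_invoke`, and `EchoHarness` are hypothetical, config resolution and tracing are omitted, and calling `shutdown` in a `finally` is an assumption.

```python
def latest_user_message(messages: list[dict]) -> str:
    """Return the content of the most recent user message, or an empty string."""
    for message in reversed(messages):
        if message.get("role") == "user":
            return message.get("content", "")
    return ""


class EchoHarness:
    """Stand-in adapter exposing setup / invoke / shutdown."""

    def __init__(self) -> None:
        self.calls: list[str] = []

    def setup(self) -> None:
        self.calls.append("setup")

    def invoke(self, request: dict) -> dict:
        self.calls.append("invoke")
        return {"output": f"echo: {request['prompt']}"}

    def shutdown(self) -> None:
        self.calls.append("shutdown")


def handle_invoke(messages: list[dict], harness: EchoHarness) -> dict:
    """One user turn in, one assistant message out."""
    prompt = latest_user_message(messages)
    harness.setup()
    try:
        result = harness.invoke({"prompt": prompt})
    finally:
        harness.shutdown()
    return {"role": "assistant", "content": result["output"]}


harness = EchoHarness()
response = handle_invoke(
    [
        {"role": "user", "content": "first"},
        {"role": "assistant", "content": "ok"},
        {"role": "user", "content": "second"},
    ],
    harness,
)
```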

## The ports (the seam we keep)

`services/oss/src/agent_pi/ports.py`:

- `Harness` (ABC): `setup()`, `invoke(HarnessRequest) -> HarnessResult`, `shutdown()`.
- `HarnessRequest`: `agents_md`, `model`, `prompt`, `messages`, `tools`, `custom_tools`,
`tool_callback`, `trace`.
- `HarnessResult`: `output`, `session_id`, `model`.
- `TraceContext`: `traceparent`, `baggage`, `endpoint` (OTLP), `authorization`,
`capture_content`. Has `to_wire()` (camelCase).
- `Runtime` (ABC): the sandbox/environment seam for the legacy Pi path (`start`,
`shutdown`, `exec`). The rivet path does not use `Runtime.exec`; it selects a rivet
provider instead (see architecture).
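
For orientation, the port and its dataclasses might look roughly like the sketch below. Field names follow the list above, but the types, the defaults, the synchronous method signatures, and the exact keys produced by `to_wire()` are assumptions; read `ports.py` for the real definitions.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class TraceContext:
    traceparent: Optional[str] = None
    baggage: Optional[str] = None
    endpoint: Optional[str] = None  # OTLP endpoint
    authorization: Optional[str] = None
    capture_content: bool = False

    def to_wire(self) -> dict[str, Any]:
        """camelCase form for the TypeScript side (key names assumed)."""
        return {
            "traceparent": self.traceparent,
            "baggage": self.baggage,
            "endpoint": self.endpoint,
            "authorization": self.authorization,
            "captureContent": self.capture_content,
        }


@dataclass
class HarnessRequest:
    agents_md: str
    model: Optional[str]
    prompt: str
    messages: list[dict] = field(default_factory=list)
    tools: list[dict] = field(default_factory=list)
    custom_tools: list[dict] = field(default_factory=list)
    tool_callback: Optional[Any] = None
    trace: Optional[TraceContext] = None


@dataclass
class HarnessResult:
    output: str
    session_id: Optional[str] = None
    model: Optional[str] = None


class Harness(ABC):
    """The seam: adapters implement these three methods and nothing else."""

    @abstractmethod
    def setup(self) -> None: ...

    @abstractmethod
    def invoke(self, request: HarnessRequest) -> HarnessResult: ...

    @abstractmethod
    def shutdown(self) -> None: ...
```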

## The current Pi adapters (legacy, keep working)

- `services/oss/src/agent_pi/pi_harness.py` (`PiHarness`): spawns the TypeScript Pi
wrapper as a subprocess, one JSON object over stdio.
- `services/oss/src/agent_pi/pi_http_harness.py` (`PiHttpHarness`): POSTs the same JSON to
the wrapper running as an HTTP sidecar.
- Both send a Pi-shaped envelope (`{agentsMd, model, prompt, messages, tools, customTools,
toolCallback, trace}`).

## The TypeScript wrapper

`services/agent/` is a small Node service.

- `src/runPi.ts`: turns the envelope into direct Pi SDK calls (`createAgentSession`, ...).
- `src/agenta-otel.ts`: a Pi OTel helper. Today `runPi.ts` imports it in-process and emits
`invoke_agent` as a child of the incoming `traceparent`. Under rivet this logic must
become a Pi **extension** installed in the environment (see architecture, tracing).
- `src/server.ts` (HTTP `/run`) and `src/cli.ts` (stdio) are the two transports.

## The pattern we copy: how code evaluators run in Daytona

This is the shipped precedent for "ephemeral sandbox per execution", and the agent service
mirrors it.

- `sdks/python/agenta/sdk/engines/running/runners/` holds `base.py` (`CodeRunner`),
`local.py` (`LocalRunner`, in-process `exec`), `daytona.py` (`DaytonaRunner`, remote
sandbox), and `registry.py` (`get_runner()`).
- Selection: env `AGENTA_SERVICES_CODE_SANDBOX_RUNNER` (`local` default, `daytona` in
cloud).
- `DaytonaRunner.run()` creates an `ephemeral=True` sandbox from a snapshot
(`DAYTONA_SNAPSHOT`), runs, and deletes it in a `finally`. **One sandbox per execution.**
No warm pool, no shared instance. It injects `AGENTA_HOST`, `AGENTA_API_KEY`, and the
user's provider keys as the sandbox `env_vars`.
- Concurrency is bounded by the evaluation engine, not the runner: a shared
`asyncio.Semaphore(batch_size)` (default 10) in
`sdks/python/agenta/sdk/evaluations/runtime/processor.py`. So at most ~10 ephemeral
sandboxes exist at once.
- Daytona config lives in `api/oss/src/utils/env.py` (`DaytonaConfig`:
`DAYTONA_API_KEY`, `DAYTONA_API_URL`, `DAYTONA_SNAPSHOT`, `DAYTONA_TARGET`).

## What we change and what we keep

Change: the transport behind the `Harness` port becomes rivet over ACP, with harness and
sandbox as config values.

Keep: the `/invoke` and `/inspect` contract, the `Harness` port and its dataclasses, the
config resolution in `agent.py`, and the env-driven adapter selection in
`_build_harness()` (extended with a rivet branch). The legacy Pi adapters keep working so
nothing regresses.

## Conventions

- Standalone scripts run with `uv run` and inline `# /// script` dependencies.
- Python edits: `ruff format` then `ruff check --fix` before committing.
- Local-server parity is a first-class requirement carried from WP-2.