Skip to content

chore(agent): docker cleanups for the sandbox-agent sidecar#4762

Closed
mmabrouk wants to merge 1 commit into
feat/agent-harness-portfrom
chore/agent-docker-cleanups
Closed

chore(agent): docker cleanups for the sandbox-agent sidecar#4762
mmabrouk wants to merge 1 commit into
feat/agent-harness-portfrom
chore/agent-docker-cleanups

Conversation

@mmabrouk

@mmabrouk mmabrouk commented Jun 19, 2026

Copy link
Copy Markdown
Member

This PR is part of a stack. Review bottom-up.

Each PR's diff is only its own delta. Merge from the bottom. This PR's base is #4761 (merge that first).

Context

The agent runner sidecar (the sandbox-agent server runtime in services/agent/src/server.ts) had a dev Dockerfile but no production image and no written rules for what an image may contain. This PR adds the production image and the licensing posture that governs every agent image. It branches off feat/agent-harness-port as the infra-chore side slice in docs/design/agent-workflows/pr-stack.md.

What this changes

This PR adds three things and clarifies one comment:

  • A production Dockerfile for the sidecar. It bakes the TypeScript runner source, runs pnpm run build:extension, and serves on :8765 with tsx. No tsx watch and no bind mount, unlike the dev image.
  • A README.md that states the rule for agent images: we ship build recipes, never Claude-containing images, and never a baked credential.
  • A licensing note in the WP-8 Daytona snapshot builder explaining why the recipe-not-image model is compliant.
  • The dev compose comment now says the snapshot bakes Pi and certs, with Claude inherited from rivet's -full base, not baked by us.

No binary, snapshot, or Claude artifact is committed. The repo ships the builder script, so each operator builds their own image in their own account.

Key architectural decision to review

The decision to scrutinize is the licensing posture in services/agent/docker/README.md and the Dockerfile header. Claude Code is proprietary under Anthropic's Commercial Terms, which grant a usage license but no right to redistribute. Pi is MIT, so we bake it freely.

The tradeoff: we never produce an image that contains Claude Code. The production Dockerfile installs nothing from Anthropic at build time. Instead the sandbox-agent daemon installs Claude at runtime from Anthropic over HTTPS, which is why ca-certificates is installed (Dockerfile:25). That keeps Anthropic as the distributor. The Daytona snapshot is the one place this gets subtle: today the builder bases on rivet's -full image, which already bundles Claude. That stays compliant only because we ship the builder script, not the built snapshot, so each operator builds the Claude-containing image in their own account and we distribute nothing. The README records a cleaner-provenance follow-up: base on a daemon-only rivet image and install Claude from Anthropic at build, pending a live Daytona check that the daemon-only tag ships the ACP adapters.

Verify the rule holds end to end: nothing in this PR commits a Claude binary or a prebuilt snapshot, and no image baked here contains Claude or a credential.

How to review this PR

  1. Read services/agent/docker/README.md first. It is the contract the other files implement.
  2. Read services/agent/docker/Dockerfile. Confirm it bakes only Pi (via npm deps) and source, installs no Anthropic package, and bakes no key or login. Check that ca-certificates exists because the daemon fetches Claude over TLS at runtime.
  3. Skim the compose comment and the build_rivet_snapshot.py docstring. These are doc-only and restate the same posture.
  4. Skip the runner source and the snapshot builder logic. This PR only edits the docstring there, not behavior.

Tests / notes

Doc and Dockerfile only, no code paths change. The production image is not built in CI by this PR. The cleaner-provenance snapshot follow-up is parked because it needs a live Daytona build to confirm the daemon-only rivet tag ships the ACP adapters.

- Add a production, credential-free sidecar Dockerfile (services/agent/docker/Dockerfile):
  bakes Pi (MIT), never bakes Claude Code or any credential, runs src/server.ts without a
  watcher. Verified to build and serve /health.
- Add services/agent/docker/README.md documenting the image licensing posture (bake Pi,
  never bake or distribute Claude Code, install Claude from Anthropic at runtime) and the
  API-key vs self-host OAuth auth paths.
- Record the recipe-not-image posture on the Daytona snapshot builder docstring and the
  docker-compose comment (we ship the build recipe, not a Claude-containing image).
@dosubot dosubot Bot added the size:L This PR changes 100-499 lines, ignoring generated files. label Jun 19, 2026
@coderabbitai

coderabbitai Bot commented Jun 19, 2026

Copy link
Copy Markdown

Important

Review skipped

Auto reviews are disabled on base/target branches other than the default branch.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro Plus

Run ID: ca49a223-93ab-4ca6-b3c1-ff291bab066e

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Use the checkbox below for a quick retry:

  • ✅ Review completed - (🔄 Check again to review again)
✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch chore/agent-docker-cleanups

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@dosubot dosubot Bot added the devops label Jun 19, 2026
@mmabrouk

Copy link
Copy Markdown
Member Author

Reviewer guide: interesting code

Start with the README. It is the rule the other files implement.

  • services/agent/docker/README.md:14 — the one-line rule for every agent image: ship build recipes, never Claude-containing images, never a baked credential. Read this before anything else.
  • services/agent/docker/Dockerfile:9 — the header that says Claude Code is NEVER baked; confirm nothing below installs an Anthropic package or a key.
  • services/agent/docker/Dockerfile:25ca-certificates is installed because the daemon fetches Claude over TLS at runtime, which is the mechanism that keeps Anthropic as the distributor.
  • docs/design/agent-workflows/scratch/wp-8-rivet-acp-runtime/poc/build_rivet_snapshot.py:15 — the subtle case: the snapshot bases on rivet's Claude-bundling -full image, compliant only because we ship the builder, not the built snapshot.

# the daemon's `install-agent claude` fails TLS verification. git lets npm/installers
# fetch git deps.
RUN apt-get update \
&& apt-get install -y --no-install-recommends ca-certificates git \

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Load-bearing: ca-certificates is here because the daemon fetches Claude from Anthropic over TLS at runtime. node:*-slim omits the trust store, so without this install-agent claude fails verification. This is the mechanism that keeps Anthropic as the distributor and the image Claude-free.

# Install deps as a cached layer (manifest + lockfile only). The full dependency set is
# installed (not --prod): the runtime uses `tsx` and the extension build uses `esbuild`,
# both devDependencies.
COPY package.json pnpm-lock.yaml ./

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Full dependency set is installed (not --prod) on purpose: the runtime uses tsx and the extension build uses esbuild, both devDependencies. Worth confirming this stays intentional if anyone later tries to slim the image.

its own for internal use; self-hosters build their own. We never hand anyone a
Claude-containing image, so this is compliant even though the `-full` base bundles
Claude (Anthropic's Commercial Terms forbid us *distributing* Claude Code, not
building/using it).

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the subtle compliance case. The -full base bundles Claude, so the built snapshot contains Claude. It stays compliant only because we ship this script and each operator builds the snapshot in their own account; we distribute nothing. The cleaner-provenance follow-up (daemon-only base + install from Anthropic at build) is parked pending a live Daytona check that the daemon-only tag ships the ACP adapters.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1


ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro Plus

Run ID: 9aa9244b-be9b-4a05-beb3-8a797c78fe34

📥 Commits

Reviewing files that changed from the base of the PR and between 8b00633 and c34a392.

📒 Files selected for processing (4)
  • docs/design/agent-workflows/scratch/wp-8-rivet-acp-runtime/poc/build_rivet_snapshot.py
  • hosting/docker-compose/ee/docker-compose.dev.yml
  • services/agent/docker/Dockerfile
  • services/agent/docker/README.md

Comment on lines +16 to +55
FROM node:24-slim

WORKDIR /app

# CA certificates: the sandbox-agent daemon (Rust) downloads harness CLIs (e.g. Claude
# Code) over HTTPS using the system trust store, which node:*-slim omits — without this
# the daemon's `install-agent claude` fails TLS verification. git lets npm/installers
# fetch git deps.
RUN apt-get update \
&& apt-get install -y --no-install-recommends ca-certificates git \
&& rm -rf /var/lib/apt/lists/*

RUN corepack enable

# Install deps as a cached layer (manifest + lockfile only). The full dependency set is
# installed (not --prod): the runtime uses `tsx` and the extension build uses `esbuild`,
# both devDependencies.
COPY package.json pnpm-lock.yaml ./
RUN pnpm install --frozen-lockfile

# Bake the source (no bind mount in production).
COPY tsconfig.json ./
COPY scripts ./scripts
COPY src ./src
COPY config ./config
COPY skills ./skills

# Bundle the Agenta Pi extension (tracing + tools) into dist/. runSandboxAgent installs
# this baked copy into Pi's agent dir on every run. Rebuild the image after editing
# src/extensions/agenta.ts or the tracer.
RUN pnpm run build:extension

ENV NODE_ENV=production \
PORT=8765

EXPOSE 8765

# Call the local tsx binary directly to avoid pnpm/corepack HOME writes when the
# container runs as a non-root host uid.
CMD ["node_modules/.bin/tsx", "src/server.ts"]

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Run the final container as a non-root user.

The image still launches the HTTP server as root. For a network-facing agent runner, that widens the blast radius of any compromise.

🔧 Suggested fix
 RUN pnpm run build:extension
 
 ENV NODE_ENV=production \
     PORT=8765
 
 EXPOSE 8765
+USER node
 
 CMD ["node_modules/.bin/tsx", "src/server.ts"]

Source: Linters/SAST tools

@mmabrouk

Copy link
Copy Markdown
Member Author

Superseded. Replacing the path-based stack with PRs sliced by functional area showing final code only, so reviewers don't comment on intermediate scaffolding that a later PR rewrites. See the new set.

@mmabrouk mmabrouk closed this Jun 19, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

devops size:L This PR changes 100-499 lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant