Implement Perry container subsystem and orchestration engine#77
yumin-chen wants to merge 6 commits into
feat: implement production-ready container and workload orchestration
Finalize the OCI stack by implementing the `perry/container` and
`perry/container-compose` (workloads) subsystems. This moves the
implementation from initial stubs to a hardened, spec-compliant architecture.
Core Subsystems:
- Orchestration: Implemented `WorkloadGraphEngine` and `ComposeEngine`
using Kahn's algorithm for deterministic dependency resolution and
topological startup/shutdown/rollback.
- Backend Logic: Multi-layered auto-detection for 7+ runtimes (Apple, Podman,
Docker, Lima, etc.) with liveness probes and strict priority ordering.
- Security & Policy:
* Implemented `PolicySpec` enforcement (Isolated, Hardened, Untrusted).
* Added image verification via Sigstore/cosign (opt-in via environment).
* Hardened ephemeral runners with `cap_drop: ALL`, seccomp, and read-only
root support.
- FFI Bridge: Expanded `perry-stdlib` with async-safe, promise-based
handlers optimized for raw C-ABI passing of primitives.
Technical Details:
- Restructured `perry-container-compose` into a flat module layout.
- Standardized container naming to `{image_hash_8}-{random_hex8}` with
label-based orphan cleanup.
- Refactored `CliBackend` to be generic over `CliProtocol` for zero vtable
overhead.
- Modernized internal registries with `DashMap` for concurrent access.
- Integrated with Perry compiler (HIR registration and codegen dispatch).
Refinements & Fixes:
- Fixed SQLite linker conflicts by gating runtime stubs.
- Restored `Buffer` synonym and `process.argv` specialization in `lower.rs`.
- Implemented robust IP and label extraction for the `DockerProtocol`.
- Expanded `MockBackend` for high-fidelity orchestration testing.
Validation:
- Added 12 new tests covering orchestration states and policy enforcement.
- Verified 79/0 pass in `perry-container-compose`.
- Verified 33/0 pass in `perry-stdlib` container features and smoke tests.
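The dependency resolution described above can be sketched as follows. This is a minimal illustration of Kahn's algorithm, not the actual `ComposeEngine` code: the function name `startup_order` and the map-of-dependencies input are assumptions made for the example. Ready services are kept in a sorted set so that ties resolve alphabetically and the order is deterministic.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Hypothetical helper: `deps` maps each service to the services it depends on.
/// Returns the startup order, or Err with the services left stuck in a cycle.
fn startup_order(deps: &BTreeMap<&str, Vec<&str>>) -> Result<Vec<String>, Vec<String>> {
    // unmet[svc] = number of dependencies that have not started yet.
    let mut unmet: BTreeMap<&str, usize> = BTreeMap::new();
    // dependents[dep] = services waiting on `dep`.
    let mut dependents: BTreeMap<&str, Vec<&str>> = BTreeMap::new();
    for (&svc, needs) in deps {
        *unmet.entry(svc).or_insert(0) += needs.len();
        for &dep in needs {
            unmet.entry(dep).or_insert(0); // undeclared deps still become nodes
            dependents.entry(dep).or_default().push(svc);
        }
    }
    // A sorted set makes the choice among ready services alphabetical.
    let mut ready: BTreeSet<&str> = unmet
        .iter()
        .filter(|(_, n)| **n == 0)
        .map(|(&svc, _)| svc)
        .collect();
    let mut order = Vec::new();
    while let Some(svc) = ready.pop_first() {
        order.push(svc.to_string());
        if let Some(waiting) = dependents.get(svc) {
            for &next in waiting {
                let n = unmet.get_mut(next).expect("every dependent is registered");
                *n -= 1;
                if *n == 0 {
                    ready.insert(next);
                }
            }
        }
    }
    if order.len() == unmet.len() {
        Ok(order)
    } else {
        // Anything still waiting on a dependency is part of (or behind) a cycle.
        Err(unmet
            .iter()
            .filter(|(_, n)| **n > 0)
            .map(|(&svc, _)| svc.to_string())
            .collect())
    }
}
```

For a diamond graph (`web` needs `api` and `cache`, both need `db`) this yields `db`, `api`, `cache`, `web`; iterating the same vector in reverse gives a shutdown or rollback order.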
…rfaced by running it (v0.5.371)
Replaces example-code/forgejo-deployment with a production-quality
deployment using the real Forgejo image,
`data.forgejo.org/forgejo/forgejo:11`, from the official Forgejo OCI
registry (separate from codeberg.org's gated mirror), with no Gitea
fallback. Running the example end-to-end against live Docker surfaced
nine interlocking codegen + FFI + orchestration bugs, all fixed here,
that together blocked any non-trivial compose stack from running.
CODEGEN / FFI fixes (composeUp / down / handle round-trip):
1. composeUp({...}) failed at JSON parse — codegen StrPtr arm passed
raw object pointer through `js_get_string_pointer_unified`, FFI
read it as StringHeader. New runtime helper
`js_value_to_str_ptr_for_ffi` returns the heap string pointer for
actual strings/SSO and otherwise routes through `js_json_stringify`
for object/array/number/bool args.
2. getBackend() returned "unknown" before any async FFI — BACKEND
OnceLock was empty. js_container_getBackend now does a synchronous
in-place probe (block_in_place inside a tokio worker, fresh
current_thread runtime otherwise).
3. composeUp Promise resolved with f64=5e-324 (subnormal). Bare u64
handles in the result_bits slot decoded as f64 bits; `${stack}`
interpolation printed "0". `handle_to_promise_bits(id)` NaN-boxes
with POINTER_TAG | (id & POINTER_MASK); Ok(0u64) void resolutions
become PROMISE_VOID_BITS = TAG_UNDEFINED. Swept across 23 sites.
4. down(stack, opts) failed with "Invalid compose handle". Codegen
dispatch `args: &[NA_F64, NA_F64]` lowered both args to LLVM
double, but Rust signatures took (handle_id: i64, volumes: i32) —
calling-convention mismatch. Changed every compose handle-arg FFI
signature to (handle: f64, ...) and added handle_id_from_f64
helper.
5. exec(stack, 'svc', cmd) failed with "No such container".
service::service_container_name regenerated random suffix per call.
Added `service_container_names: Mutex<HashMap>` cache to
ComposeEngine populated by up()'s start loop.
6. ${VAR:-default} env interpolation didn't apply to TS-side specs —
postgres bombed with "FATAL: invalid character in extension owner"
because literal placeholder strings flowed through. Wired
`perry_container_compose::yaml::interpolate` into parse_compose_spec
so ${VAR} expansion happens before serde_json::from_str (matches
SPEC §7.8 / §7.9 — same engine, FFI boundary).
7. down(stack, { volumes: false }) silently REMOVED volumes.
js_compose_down took (handle: f64, volumes: f64) but TS users pass
an options object. The object NaN-boxed to a non-zero pointer →
`volumes != 0.0` → remove_volumes flipped to true. Changed dispatch
to `[NA_F64, NA_STR]`; FFI parses the JSON-encoded DownOptions
server-side via serde_json (same shape as composeUp).
8. ComposeEngine::down() called rollback() unconditionally, which
drains session_volumes regardless of the volumes-preserve flag.
Snapshot+restore around rollback when remove_volumes=false.
9. types/perry/compose/index.d.ts was missing `healthcheck`, `user`,
`working_dir`, `read_only`, `privileged`, `cap_add`, `cap_drop`
on Service plus `internal`, `driver_opts`, `labels` on
ComposeNetwork — runtime supported them, TS surface didn't.
Added a `Healthcheck` interface (compose-spec §service.healthcheck:
test, interval, timeout, retries, start_period, disable) and
extended both interfaces.
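Fix 3 is easier to follow with the bit patterns spelled out. The sketch below is illustrative only: the commit does not give the real `POINTER_TAG` / `POINTER_MASK` values, so the constants here are assumptions (a quiet-NaN prefix with a 48-bit payload), and this `handle_id_from_f64` is a hypothetical decoder rather than the helper added in fix 4.

```rust
// Assumed tag layout; Perry's actual constants may differ.
const POINTER_TAG: u64 = 0x7FFD_0000_0000_0000;
const POINTER_MASK: u64 = 0x0000_FFFF_FFFF_FFFF;

/// NaN-box a registry handle id so the JS side sees a tagged value, not a number.
fn handle_to_promise_bits(id: u64) -> u64 {
    POINTER_TAG | (id & POINTER_MASK)
}

/// Hypothetical inverse: recover the id from an f64 argument, rejecting plain numbers.
fn handle_id_from_f64(value: f64) -> Option<u64> {
    let bits = value.to_bits();
    if (bits & !POINTER_MASK) == POINTER_TAG {
        Some(bits & POINTER_MASK)
    } else {
        None
    }
}

/// What the pre-fix code effectively did: reinterpret the bare id as f64 bits.
fn bare_handle_as_f64(id: u64) -> f64 {
    f64::from_bits(id)
}
```

With this layout, `bare_handle_as_f64(1)` is the smallest subnormal (`5e-324`), which matches the symptom in fix 3, while `f64::from_bits(handle_to_promise_bits(1))` is a NaN whose payload decodes back to `1`.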
EXAMPLE structure (example-code/forgejo-deployment/main.ts):
- Two-service stack: postgres:16-alpine + data.forgejo.org/forgejo/
forgejo:11.
- depends_on: { db: { condition: 'service_healthy' } }.
- Per-service compose-spec healthchecks: pg_isready for postgres,
wget /api/healthz for forgejo.
- Explicit container_name on each service so Docker's embedded DNS
routes forgejo→forgejo-db (Perry's compose engine doesn't yet
register the service-key as a network alias; documented).
- Internal-only forgejo-db-net (postgres unreachable from host or
sibling stacks); public forgejo-web-net for forgejo's web + SSH
ports.
- Standard Forgejo "OpenSSH on port 22 + START_SSH_SERVER=false"
configuration — the inline-Go SSH server conflicts with the
entrypoint's sshd otherwise (exit-0 with "bind: address already
in use").
- Lifecycle:
./forgejo_app deploy + verify-healthz + exit 0
./forgejo_app --down tear down (preserves volumes)
FORGEJO_DESTROY_ON_EXIT=1 ./forgejo_app --down also drops
volumes
Perry's `process.on('SIGINT', ...)` handler isn't actually invoked
at runtime (confirmed by probe — kill -INT after register; setInterval
keeps ticking), so the example uses `docker compose up -d` style:
exit 0 after success, separate --down command for teardown.
- Production note in doc-comment: FORGEJO_SECRET_KEY,
FORGEJO_INTERNAL_TOKEN, FORGEJO_DB_PASSWORD MUST be stable across
redeploys against the same volumes (random defaults break
Forgejo's encrypted-config decryption + postgres rows).
- Local `tsconfig.json` with paths: { "perry/*": ["../../types/
perry/*"] } for IDE typechecking + `perry-globals.d.ts` declaring
the subset of `process` Perry actually exposes (env, exit, on,
argv, cwd, platform — minimal, not @types/node).
- Workspace re-registration: re-added perry-container-compose to
[workspace] members + default-members + [workspace.dependencies].
VERIFIED full lifecycle:
fresh up → containers healthy, /api/healthz returns "pass"
--down preserve → containers gone, volumes intact
redeploy → containers come back, Forgejo decrypts existing
config (stable secrets), healthz passes again
--down destroy → containers + volumes + networks all gone
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…container/ section
Seven new pages cover overview, single-container lifecycle (perry/container), compose orchestration (perry/compose), networking (incl. the container_name DNS workaround), volumes, security, and a Forgejo-deployment case study. New docs/examples/stdlib/container/snippets.ts with 11 ANCHOR blocks pulled into the markdown via {{#include}}. doc-tests --lint and --filter container both pass.
Audit-driven sweep covering every blocker (Tier 1) and important UX
issue (Tier 2) from the post-v0.5.371 production-readiness review.
Tier 1 (correctness blockers):
1. Project-namespace volumes + networks. ComposeEngine::up/down now
prefix declared volume + network names with the project name
(`<project>_<name>`) so two stacks declaring the same volume key
don't collide and corrupt each other's data. Honors `external: true`
(no prefix) and explicit `name:` overrides. Bind mounts stay literal.
2. Respect `external: true` on `down()`. Before the fix, `down()`
removed every network/volume in `spec.networks` regardless of the
external flag, silently deleting shared infra the user marked external.
3. ps/logs/exec/list/inspect/listImages TS contract honesty. The
FFIs returned an opaque registry-id handle but the TS .d.ts
declared `Promise<ContainerInfo[]>`/`Promise<ContainerLogs>`, so
`(await ps(stack)).every(...)` was broken. Switched 8 FFIs to
`spawn_for_promise_deferred` returning `serde_json::to_string`;
TS users call `JSON.parse(await ps(stack))` to recover the typed array.
.d.ts updated to `Promise<string>` with explicit JSDoc.
4. Rollback bookkeeping. The "exists but stopped → start" branch
wasn't pushing to `session_containers`, leaving a started
container running on a later service-startup failure.
5. Workspace registration stability. perry-stdlib's
perry-container-compose dep is now direct-path so the build
doesn't break if `[workspace.dependencies]` gets stripped. New
`tests/container_workspace_invariants.rs` fails fast with a
clear message if the crate is missing from `[workspace] members`.
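The naming rules from items 1 and 2 can be summarised in a few lines. This is a sketch under stated assumptions: the `Declared` struct and both function names are hypothetical, not the types used in `perry-container-compose`, and the precedence of an explicit `name:` over the project prefix is inferred from the description above.

```rust
/// Hypothetical subset of a compose volume or network declaration.
struct Declared<'a> {
    key: &'a str,          // the key under `volumes:` / `networks:`
    name: Option<&'a str>, // explicit `name:` override, if any
    external: bool,        // `external: true`
}

/// Name passed to the backend when creating or looking up the resource.
fn resolved_name(project: &str, d: &Declared) -> String {
    if let Some(explicit) = d.name {
        return explicit.to_string(); // explicit names are used verbatim
    }
    if d.external {
        return d.key.to_string(); // external resources are never prefixed
    }
    format!("{project}_{}", d.key) // default: <project>_<name>
}

/// External resources belong to someone else, so down() must leave them alone.
fn remove_on_down(d: &Declared) -> bool {
    !d.external
}
```

Two stacks named `blog` and `wiki` that both declare a `data` volume then resolve to `blog_data` and `wiki_data`, so neither overwrites the other.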
Tier 2 (UX + ergonomic):
6. Service-key network alias. `ContainerSpec` gains `network_aliases:
Option<Vec<String>>`; both DockerProtocol and AppleContainerProtocol
`run_args` emit `--network-alias <name>` per entry. ComposeEngine::up
populates with the service KEY plus any long-form `aliases:`. Sibling
containers can now resolve `db:5432` / `api:8080` via embedded DNS
without explicit `container_name`.
7. Default SIGINT/SIGTERM handler. `js_container_module_init` now
installs an OS-process-level signal handler (tokio
`signal::unix`/`ctrl_c`) that walks `COMPOSE_HANDLES` and calls
`down(volumes=false)` on each engine before exiting with the
signal-mapped status (130/143). Volumes preserved by default.
Opt-out: `PERRY_NO_DEFAULT_SIGINT_CLEANUP=1`. Idempotent.
8. `detectBackend()` typed surface. Added `BackendInfo` interface to
`types/perry/container/index.d.ts` documenting the JSON shape so
users know what to expect from `JSON.parse(await detectBackend())`.
9. Sync `perry/container` ComposeService + ComposeNetwork types with
the perry/compose copies — pre-fix the perry/container surface was
missing `healthcheck`, `user`, `working_dir`, `read_only`,
`privileged`, `cap_add`, `cap_drop` on Service and `internal`,
`driver_opts`, `labels` on ComposeNetwork.
10. Three-mode image verification. `PERRY_CONTAINER_VERIFY_IMAGES`
now accepts `off` (default — skip), `warn` (run cosign, log
stderr warning on fail, proceed), `enforce`/`1`/`on` (reject on
fail). Production should set `enforce`.
11. perry/workloads marked alpha. New `types/perry/workloads/
{index.d.ts,package.json}` declares the surface but carries a
prominent ALPHA — NOT PRODUCTION-READY note listing the
not-yet-shipped functionality (parallel strategies, edge
healthcheck waiting, microVM/WASM backends, no integration tests).
Recommends perry/compose for production.
12. Idempotency-on-spec-change detection. ComposeEngine::up now
stamps every container with a `perry.compose.spec_hash` label
(`md5(serde_json(svc))[..16]`); on subsequent up() calls reads
the live label and recreates on drift. Pre-fix, editing image
tag and re-running up() was a silent no-op.
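As an illustration of item 10, the three modes could be dispatched roughly as below. The enum and function names are assumptions for this sketch; treating unrecognised values as `off` and matching case-insensitively are also assumptions, since the commit only lists the accepted values.

```rust
#[derive(Debug, PartialEq, Clone, Copy)]
enum VerifyMode {
    Off,
    Warn,
    Enforce,
}

/// Map the raw PERRY_CONTAINER_VERIFY_IMAGES value (None = unset) to a mode.
fn parse_verify_mode(raw: Option<&str>) -> VerifyMode {
    match raw.map(|s| s.trim().to_ascii_lowercase()).as_deref() {
        Some("warn") => VerifyMode::Warn,
        Some("enforce") | Some("1") | Some("on") => VerifyMode::Enforce,
        _ => VerifyMode::Off, // unset, "off", or anything unrecognised (assumed)
    }
}

/// Ok(None): proceed. Ok(Some(msg)): proceed but log msg. Err(msg): reject the image.
/// `signature_ok` stands in for the result of the cosign check.
fn admit(mode: VerifyMode, image: &str, signature_ok: bool) -> Result<Option<String>, String> {
    match (mode, signature_ok) {
        (VerifyMode::Off, _) | (_, true) => Ok(None),
        (VerifyMode::Warn, false) => Ok(Some(format!("warning: {image} failed verification"))),
        (VerifyMode::Enforce, false) => Err(format!("{image} failed verification")),
    }
}
```

A caller would read the variable once, for example `parse_verify_mode(std::env::var("PERRY_CONTAINER_VERIFY_IMAGES").ok().as_deref())`, and skip the cosign invocation entirely when the mode is `Off`.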
Verified:
cargo test -p perry-container-compose 81/0 (+11 over v0.5.371)
cargo test -p perry-stdlib --features container --lib + --test container_* 70/0
cargo check --workspace (excl. cross-compile-only) clean
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…p API
Test infrastructure for the perry/container subsystem covering all seven
layers from TS user code → live OCI runtime, plus a new TS-side cleanup
API that drives the test layers away from manual stack-handle teardown.
== Phase A: hermetic functional + protocol args (20 new tests) ==
`tests/functional_orchestration.rs` (gated on `--features test-utils`)
exercises ComposeEngine::up/down with MockBackend across every Tier 1+2
invariant from v0.5.372: rollback removes session networks/containers,
project namespacing prefixes volumes/networks, `external: true`
respected on create AND down, container-name caching, spec-hash drift
recreate, idempotent skip on hash match, service-key network-alias
propagation, topological ordering. Plus 5 protocol-arg tests in
`backend.rs::tests` pinning seccomp emission, entrypoint array form,
`rm: None` doesn't emit `--rm`, minimal-spec doesn't emit spurious
flags, apple/container omits `--detach`.
== Phase B: FFI bug regressions (8 new tests) ==
`crates/perry-stdlib/tests/container_bug_regressions.rs` permanently
pins each bug surfaced during v0.5.370/371/372 rollout: A1 composeUp
object-arg auto-stringify, A3 handle round-trips through POINTER_TAG
NaN-boxing, A6 env interpolation at FFI boundary (set + unset cases),
A7 down options JSON parse, A9 extended Service fields
(healthcheck/user/working_dir/read_only/privileged/cap_*), spec-hash
determinism, null-pointer rejection at parse_compose_spec, VerifyMode
env-var dispatch matrix.
== Phase C: live-runtime integration (6 tests, env-gated) ==
`tests/live_runtime_tests.rs` (gated on `--features integration-tests`
AND `PERRY_INTEGRATION_TESTS=1`) exercises the full stack against real
docker/podman/apple-container: run + remove of one-shot alpine, full
compose lifecycle with healthcheck + alias, down preserves volumes by
default, external network survives down, cross-service DNS via
`--network-alias` resolves the service KEY, two stacks with same volume
key don't collide.
Each test owns a `ProjectCleanup` RAII struct that drains every
container labelled with the test's project name on Drop, even if
assertions panic, so flaky tests don't leak orphans.
== Phase D: e2e (Perry compile + run) ==
New `crates/perry-container-e2e` harness compiles each
`tests/e2e/*.e2e.ts` via the released `perry` CLI binary, runs the
resulting executable, and asserts exit 0 + `[e2e] PASS` on stdout. Two
.e2e.ts files: `redis-smoke.e2e.ts` (compose lifecycle, fast) and
`forgejo-stack.e2e.ts` (full TS → HIR → codegen → FFI → ComposeEngine →
Docker chain with healthcheck-gated depends_on, in-container exec,
idempotent redeploy). Both use the new `downByProject` API for cleanup,
with no manual `down(stack)` boilerplate.
== Phase E: fixtures + property tests + fuzz ==
5 golden YAML fixtures under `tests/fixtures/`: simple-two-service,
diamond-deps, cyclic-deps (must reject), external-network,
healthcheck-gated. 5 proptest properties: container-name format, MD5
prefix determinism, project-namespace disambiguation, spec-hash
determinism, topological sort respects edges. 3 libfuzzer targets in
`crates/perry-container-compose/fuzz/`: compose_yaml_parse,
env_interpolation, compose_spec_json_round_trip. These surface parser
DoS, panics, and round-trip drift.
== Cleanup API ==
Three new functions on `perry/container` that work WITHOUT a
ComposeHandle, recovering cleanly from crashes / different processes /
mid-test panics:
  downByProject(project, opts?)    → Promise<JSON CleanupReport>
  downAll(opts?)                   → Promise<JSON CleanupReport>
  removeIfExists(idOrName, force?) → Promise<"true" | "false">
Driven through the existing `ContainerBackend::list/stop/remove`
surface, with no new backend trait methods. FFI exports
`js_container_downByProject`/`_downAll`/`_removeIfExists`, codegen
dispatch entries, TS `.d.ts` declarations. The Phase C live tests use
this via a new `ProjectCleanup` RAII struct; the Phase D e2e tests use
it directly from TS via `downByProject(PROJECT)`.
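The RAII pattern behind `ProjectCleanup` can be sketched as follows. Only the struct name comes from the commit; its fields, the `Drained` log that stands in for a real backend, and the `failing_test` function are assumptions made so the example is self-contained.

```rust
use std::sync::{Arc, Mutex};

/// Stand-in for a container backend: records which projects were drained.
type Drained = Arc<Mutex<Vec<String>>>;

/// Guard that cleans up a project's containers when it goes out of scope.
struct ProjectCleanup {
    project: String,
    drained: Drained,
}

impl Drop for ProjectCleanup {
    fn drop(&mut self) {
        // A real guard would list containers carrying the project label and
        // stop + remove each one; this sketch only records the project name.
        // If the lock is poisoned, recover the data instead of panicking in Drop.
        let mut log = self.drained.lock().unwrap_or_else(|e| e.into_inner());
        log.push(self.project.clone());
    }
}

/// A test body that panics part-way through, after the guard is created.
fn failing_test(drained: Drained) {
    let _guard = ProjectCleanup {
        project: "perry-test-1".to_string(),
        drained,
    };
    panic!("assertion failed mid-test");
}
```

Because `Drop` runs during unwinding, the project is still drained when `failing_test` panics. This relies on the default `panic = "unwind"` setting; with `panic = "abort"` no destructors run.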
== CI workflow rewrite ==
`.github/workflows/container-tests.yml` updated to drive all five
layers:
  Layer A+B (hermetic)      every PR              macos-14 + ubuntu-24.04
  Layer C   (live runtime)  PR + main             apple/container + podman
  Layer D   (e2e)           main + tags + manual  docker
  Layer E   (fuzz)          nightly + manual      libfuzzer
Required-check gate (`container-tests-gate`) needs only Layer A+B; live
+ e2e + fuzz are informational so a slow registry / runtime hiccup
doesn't block PR merges.
== Final tally ==
  perry-container-compose default features:       96/0
  perry-container-compose --features test-utils:  111/0 (+15 functional)
  perry-stdlib container modules:                 78/0
  perry-container-e2e:                            2/0 (skip without env)
  Live-runtime (PERRY_INTEGRATION_TESTS=1):       6 (gated)
  Fuzz targets (nightly):                         3 (gated)
191 hermetic tests + 9 env-gated = full coverage of the v0.5.372 Tier 1
+ Tier 2 surface, with every previously-shipped bug having a permanent
regression test.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
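For readers unfamiliar with the `${VAR:-default}` behaviour that the A6 regression test and the `env_interpolation` fuzz target exercise, here is a minimal sketch. It is not `perry_container_compose::yaml::interpolate`; the signature and the lookup callback are assumptions, and it follows the usual shell convention that `:-` falls back when the variable is unset or empty. Nested placeholders are not handled.

```rust
/// Expand `${VAR}` and `${VAR:-default}` using `lookup` as the environment.
fn interpolate(input: &str, lookup: &dyn Fn(&str) -> Option<String>) -> String {
    let mut out = String::new();
    let mut rest = input;
    while let Some(start) = rest.find("${") {
        out.push_str(&rest[..start]);
        let after = &rest[start + 2..];
        match after.find('}') {
            Some(end) => {
                let body = &after[..end];
                // Split "NAME:-fallback" into its two halves, if a fallback exists.
                let (name, default) = match body.split_once(":-") {
                    Some((n, d)) => (n, Some(d)),
                    None => (body, None),
                };
                // Unset and empty are treated alike; no fallback means "".
                let value = lookup(name).filter(|v| !v.is_empty());
                out.push_str(&value.or(default.map(str::to_string)).unwrap_or_default());
                rest = &after[end + 1..];
            }
            None => {
                // Unterminated placeholder: keep the remaining text literally.
                out.push_str(&rest[start..]);
                rest = "";
            }
        }
    }
    out.push_str(rest);
    out
}
```

Passing the lookup as a callback keeps the function pure, which suits both the set/unset regression cases and a fuzz harness; production code could pass `&|k| std::env::var(k).ok()`.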
- Implement production-ready container naming ({md5}-{random}) in service.rs
- Ensure ComposeEngine::up idempotency via perry.compose.spec_hash labels
- Register perry/container, perry/compose, and perry/workloads in HIR/Codegen
- Implement missing FFI endpoints (inspectImage, inspectNetwork) in perry-stdlib
- Group iOS with macOS in backend detection priority
- Add perry-container-compose to workspace members for consistency
- Break dependency ties alphabetically in Kahn's algorithm implementation
Co-authored-by: yumin-chen <10954839+yumin-chen@users.noreply.github.com>
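The idempotency check via `perry.compose.spec_hash` can be pictured as below. This is a sketch, not the engine's code: the PR hashes with MD5, which is not in the Rust standard library, so a hand-written 64-bit FNV-1a hash is swapped in purely to keep the example dependency-free, and the `Action` enum and `plan` function are hypothetical names.

```rust
/// Stand-in for md5(serde_json(svc))[..16]: FNV-1a, rendered as 16 hex chars.
fn spec_hash(canonical_json: &str) -> String {
    let mut h: u64 = 0xcbf2_9ce4_8422_2325; // FNV-1a 64-bit offset basis
    for b in canonical_json.bytes() {
        h ^= b as u64;
        h = h.wrapping_mul(0x0000_0100_0000_01b3); // FNV-1a 64-bit prime
    }
    format!("{h:016x}")
}

#[derive(Debug, PartialEq)]
enum Action {
    Create,   // no container exists yet
    Skip,     // label matches the desired spec: up() is a no-op
    Recreate, // label differs: the spec drifted since the last up()
}

/// `live_label` is the label read from the existing container, if there is one.
fn plan(live_label: Option<&str>, desired_json: &str) -> Action {
    match live_label {
        None => Action::Create,
        Some(label) if label == spec_hash(desired_json) => Action::Skip,
        Some(_) => Action::Recreate,
    }
}
```

Editing the image tag changes the serialized service definition, so the stored label no longer matches and `plan` returns `Recreate` instead of silently skipping.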
a7e9d31 to dd181eb
This PR implements the Perry container subsystem according to the production-readiness specification.
Key changes:
- Updated `service_container_name` to hash the full JSON service definition, ensuring deterministic yet collision-resistant naming. Updated `ComposeEngine` to use `perry.compose.spec_hash` for idempotent `up()` operations.
- Registered the container modules in `perry-hir` (`expr_call.rs`) and `perry-codegen` (`lower_call.rs`). This includes mapping `perry/container`, `perry/compose`, and `perry/workloads` methods to their canonical FFI symbols.
- Added `js_container_inspectImage` and `js_container_inspectNetwork` to `perry-stdlib` to complete the container lifecycle contract.
- Grouped `ios` in the macOS-style backend detection path.
- Registered `perry-container-compose` in the root `Cargo.toml` to ensure it is built by default and passes workspace-wide invariant checks.
All tests in `perry-container-compose` and relevant `perry-stdlib` container tests passed.
PR created automatically by Jules for task 882198481856536362 started by @yumin-chen