
feat(sidecar): configurable logfmt logging + ext stall-watchdog tracing #1517

Closed
NathanFlurry wants to merge 1 commit into main from nathan/sidecar-observability-logfmt


@NathanFlurry

Member

What

Make the agentos-sidecar logging configurable (matching the ~/rivet setup) and add the tracing needed to see where an ext request — especially a tool-call turn — spends its time or hangs.

1. Logging setup (mirrors rivet's init_tracing)

crates/agentos-sidecar/src/main.rs previously hardcoded Level::ERROR with no EnvFilter, so RUST_LOG was silently ignored and the sidecar was effectively unobservable. Now:

  • Level: AGENTOS_LOG_LEVEL > LOG_LEVEL > RUST_LOG > info.
  • Format: logfmt by default (tracing-logfmt), RUST_LOG_FORMAT=text to opt out.
  • Sink: AGENTOS_LOG_FILE writes to a file; otherwise stderr. Never stdout (that carries the binary frame protocol).
  • Field toggles: RUST_LOG_{SPAN_NAME,SPAN_PATH,TARGET,LOCATION,MODULE_PATH,ANSI_COLOR}.
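
The precedence rules above can be sketched in plain Rust. This is a minimal illustration, not the PR's init_tracing: the names LogConfig, Format, Sink and resolve_log_config are hypothetical, the lookup closure stands in for std::env::var so the logic can be exercised without touching the process environment, and treating empty values as unset and matching "text" exactly are both assumptions.

```rust
use std::path::PathBuf;

/// Where log output goes. Stdout is deliberately not a variant,
/// because it carries the binary frame protocol.
#[derive(Debug, PartialEq)]
enum Sink {
    Stderr,
    File(PathBuf),
}

#[derive(Debug, PartialEq)]
enum Format {
    Logfmt,
    Text,
}

/// Hypothetical container for the resolved settings.
#[derive(Debug, PartialEq)]
struct LogConfig {
    filter: String,
    format: Format,
    sink: Sink,
}

/// Resolves the logging settings from an environment-like lookup.
fn resolve_log_config(lookup: impl Fn(&str) -> Option<String>) -> LogConfig {
    // Assumption: an empty or whitespace-only value counts as unset.
    let get = |key: &str| lookup(key).filter(|v| !v.trim().is_empty());

    // Level: AGENTOS_LOG_LEVEL > LOG_LEVEL > RUST_LOG > info.
    let filter = ["AGENTOS_LOG_LEVEL", "LOG_LEVEL", "RUST_LOG"]
        .iter()
        .find_map(|key| get(*key))
        .unwrap_or_else(|| "info".to_string());

    // Format: logfmt unless RUST_LOG_FORMAT=text.
    let format = match get("RUST_LOG_FORMAT").as_deref() {
        Some("text") => Format::Text,
        _ => Format::Logfmt,
    };

    // Sink: a file if AGENTOS_LOG_FILE is set, otherwise stderr.
    let sink = match get("AGENTOS_LOG_FILE") {
        Some(path) => Sink::File(PathBuf::from(path)),
        None => Sink::Stderr,
    };

    LogConfig { filter, format, sink }
}
```

In a real binary the closure would likely be `|k| std::env::var(k).ok()`, and the resolved filter string would be handed to whatever filter type the subscriber uses.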

2. ext-request tracing + stall watchdog

crates/agentos-sidecar/src/acp_extension.rs: wrap handle_payload in an ext.request span carrying kind and elapsed_ms, emit received/handled events, and (the key bit) add a stall watchdog that warn!s every 10s while a request is still in flight. A hung tool turn now leaves a breadcrumb long before the host's 120s frame timeout instead of timing out silently.
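
The stall-watchdog idea can be sketched with only the standard library. This is an illustration under assumptions, not the code in acp_extension.rs: run_with_stall_watchdog is a hypothetical helper, the on_stall callback stands in for the warn! call, and a plain thread with a channel timeout stands in for whatever runtime the sidecar actually uses.

```rust
use std::sync::mpsc;
use std::thread;
use std::time::{Duration, Instant};

/// Runs `work` while a watchdog thread calls `on_stall(elapsed)` once per
/// `interval` for as long as the work is still in flight.
/// Returns the work's result together with the total elapsed time.
fn run_with_stall_watchdog<T>(
    interval: Duration,
    on_stall: impl Fn(Duration) + Send + 'static,
    work: impl FnOnce() -> T,
) -> (T, Duration) {
    let started = Instant::now();
    let (done_tx, done_rx) = mpsc::channel::<()>();

    let watchdog = thread::spawn(move || loop {
        match done_rx.recv_timeout(interval) {
            // No completion signal within the interval: still in flight.
            Err(mpsc::RecvTimeoutError::Timeout) => on_stall(started.elapsed()),
            // Completion signal received, or the sender was dropped
            // (for example because `work` panicked): stop watching.
            _ => break,
        }
    });

    let result = work();
    // Wake the watchdog immediately instead of waiting out the interval.
    let _ = done_tx.send(());
    let _ = watchdog.join();
    (result, started.elapsed())
}
```

With a 10s interval and a callback that logs a warning with the request kind and elapsed time, a request that hangs for two minutes would produce a warning roughly every 10 seconds rather than nothing at all.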

Why

While debugging a Zid tool-call hang ("timed out waiting for sidecar protocol frame for ext"), we found the sidecar emitted nothing: it logged at ERROR only, ignored RUST_LOG, and had no per-request timing. This restores parity with rivet logging and makes the hang (and session-creation latency) diagnosable.

Notes

  • Default level is info; prod behavior is unchanged unless a level is set.
  • Pairs with a secure-exec PR adding binding.invoke tracing on the tool-resolution path (separate PR; the two compile independently).
  • cargo check -p agentos-sidecar passes.

🤖 Generated with Claude Code

Replace the hardcoded ERROR-level tracing subscriber in agentos-sidecar with a
rivet-style init_tracing: EnvFilter-gated level (AGENTOS_LOG_LEVEL > LOG_LEVEL >
RUST_LOG > info), logfmt format (RUST_LOG_FORMAT=text to opt out), and an
optional file sink (AGENTOS_LOG_FILE). Output stays on stderr/file, never stdout
(the frame channel).

Add an ext.request span + entry/exit timing in acp_extension, plus a stall
watchdog that warns every 10s while a request is in flight, so a hung tool turn
surfaces as a breadcrumb long before the host's 120s frame timeout instead of
timing out silently.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@railway-app railway-app Bot temporarily deployed to agentos / agentos-pr-1517 June 24, 2026 03:56 Destroyed
@railway-app railway-app Bot temporarily deployed to rivet-frontend / agentos-pr-1517 June 24, 2026 03:56 Destroyed
@railway-app

railway-app Bot commented Jun 24, 2026


🚅 Deployed to the agentos-pr-1517 environment in agentos

| Service | Status | Web | Updated (UTC) |
| --- | --- | --- | --- |
| agentos | 😴 Sleeping | Web | Jun 24, 2026 at 4:05 am |

🚅 Deployed to the agentos-pr-1517 environment in rivet-frontend

| Service | Status | Web | Updated (UTC) |
| --- | --- | --- | --- |
| agent-os | 😴 Sleeping | | Jun 24, 2026 at 4:01 am |

@NathanFlurry

Member Author

Superseded by the combined PR #1518 (docs + options-schema + sidecar/host-tools-zod together).
