Skip to content

Audit payload capture: add operator surface (env / forge.yaml) + redact pass #163

@initializ-mk

Description

@initializ-mk

Context

`AuditPayloadCapture` was introduced as FWS-8 / issue #91 and the
emit-side wiring on `tool_exec` / `llm_call` events is built. Two
gaps make it unusable in production today.

Gap 1 — No operator surface

The struct `coreruntime.AuditPayloadCapture`
(`forge-core/runtime/audit_payload_capture.go:27`) carries four
independent capture flags:

```go
type AuditPayloadCapture struct {
LLMMessages bool // capture chat messages on llm_call
LLMResponse bool // capture model completion on llm_call
ToolArgs bool // capture raw tool input on tool_exec
ToolResult bool // capture raw tool output on tool_exec
CapLLMMessagesBytes int
CapLLMResponseBytes int
CapToolArgsBytes int
CapToolResultBytes int
}
```

These are consumed at `forge-cli/runtime/runner.go:1933`
(`registerAuditHooks`) for the tool emit path and at the equivalent
LLM emit path. But nothing populates the struct from env vars or
forge.yaml
— a grep across the tree confirms `r.cfg.AuditPayloadCapture`
is only read, never written outside of programmatic configuration.

Result: today the only way to turn capture on in production is to
embed Forge as a Go library and set the field on `RunnerConfig`. A
container deployment can't flip the bit without a code change.
That's the same gap PRs #154 and #156 closed for OTel span content
and guardrail evidence respectively — payload capture missed the
consolidation.

Gap 2 — No redact pass

The capture path is truncate-only:

```go
// forge-cli/runtime/runner.go:1943
if capture.ToolArgs {
fields["args"] = coreruntime.TruncateForAudit(hctx.ToolInput,
coreruntime.CapOrDefault(capture.CapToolArgsBytes))
}
```

No vendor-secret regex scrub. If an LLM puts an API key into a
`cli_execute` command (a real failure mode — the model glues
secrets it remembered from the prompt into shell invocations), the
raw secret lands verbatim on the audit stream.

The same vendor-secret regex set has shipped three times:

All three should share one redact helper.

Proposal

Part 1 — Operator surface

Add env-var pickup mirroring the existing
`AuditExportConfigFromEnv` (`forge-core/runtime/audit_export_config.go:64`)
and `GuardrailAuditConfigFromEnv` (`forge-cli/runtime/guardrails_audit.go:64`)
patterns:

Env var Default Meaning
`FORGE_AUDIT_CAPTURE_TOOL_ARGS` `false` Capture raw tool input on `tool_exec phase=start`
`FORGE_AUDIT_CAPTURE_TOOL_RESULT` `false` Capture raw tool output on `tool_exec phase=end`
`FORGE_AUDIT_CAPTURE_LLM_MESSAGES` `false` Capture chat messages array on `llm_call`
`FORGE_AUDIT_CAPTURE_LLM_RESPONSE` `false` Capture model completion text on `llm_call`
`FORGE_AUDIT_CAPTURE_REDACT` `true` Run vendor-secret regex scrub before emission (see Part 2)
`FORGE_AUDIT_CAPTURE_MAX_BYTES` `16384` Per-field byte cap (overrides field-specific caps when set)

Per-field cap env vars only if needed; the single-knob default
`FORGE_AUDIT_CAPTURE_MAX_BYTES` should be sufficient for most
operators.

forge.yaml mirror:

```yaml
audit:
capture:
tool_args: true
tool_result: true
llm_messages: false
llm_response: false
redact: true # ON by default; only flip OFF if a downstream sink scrubs
max_bytes: 16384
```

Precedence: forge.yaml > env var > default. Same pattern the export
config + guardrail audit config use.

`AuditPayloadCaptureFromEnv()` constructor in
`forge-core/runtime/audit_payload_capture.go`; `RunnerConfig`
populated at runner startup before `registerAuditHooks` runs.

Part 2 — Consolidated redact pass

The vendor-secret regex set is identical across all three
content-capture paths. Lift it into `forge-core/runtime/content_redact.go`
(already exists from #130) and expose:

```go
// forge-core/runtime/content_redact.go
func RedactSecrets(s string) string // already exists from #130
func PrepareCapturedContent(s string, redact bool, maxBytes int) string // new — equivalent to PrepareSpanContent but named for audit
```

`PrepareCapturedContent` is just `redact-then-truncate`:

```go
func PrepareCapturedContent(s string, redact bool, maxBytes int) string {
if redact {
s = RedactSecrets(s)
}
if maxBytes <= 0 {
maxBytes = DefaultPayloadCaptureCapBytes
}
return TruncateForAudit(s, maxBytes)
}
```

`registerAuditHooks` in `runner.go:1925` replaces the four bare
`TruncateForAudit` calls with `PrepareCapturedContent` calls,
threading the `Redact` flag from the resolved config.

`forge-cli/runtime/guardrails_audit.go`'s `prepareEvidence` and
the OTel span path's `PrepareSpanContent` collapse into thin
adapters over `PrepareCapturedContent` so all three pipelines
share one redact + truncate implementation and one set of vendor
regexes.

Schema impact

`tool_exec` event grows two opt-in fields:

```json
{
"event": "tool_exec",
"fields": {
"tool": "cli_execute",
"phase": "start",
"args_size": 142,
"args": "curl -X POST https://api.example.com/v1/...\"
}
}
```

`llm_call` event grows two opt-in fields (`prompt_messages`,
`completion_text`) per the FWS-8 design that was already coded but
never operator-exposed.

Both fields use `omitempty` so deployments without capture flags
keep emitting the pre-capture JSON shape verbatim. The
`AuditSchemaVersion` is not bumped — additive optional fields
are schema-compatible per the documented policy.

Verbosity guidance (docs)

The new docs section should walk operators through the cost:

Posture Per `tool_exec` event
Default (metadata only) ~150-300 bytes
`tool_args + tool_result` on up to ~32 KB
Realistic average for tool-heavy agent 5-15 KB

For an agent doing 1000 invocations/day with 5 tool calls each:

  • Metadata-only: ~1 MB/day
  • Both captures on: 25-80 MB/day

Recommend operators turn capture on selectively:

  • Debug a misbehaving tool: `tool_args + tool_result` for that
    session only, then turn off
  • Compliance evidence: `tool_args` only (the inputs the agent
    produced); `tool_result` rarely needed
  • Long-running production: leave default off unless a specific
    audit need surfaces

Files

File Change
`forge-core/runtime/audit_payload_capture.go` Add `AuditPayloadCaptureFromEnv()`; add `Redact bool` field (default true)
`forge-core/runtime/content_redact.go` Add `PrepareCapturedContent(s, redact, maxBytes)`
`forge-core/types/config.go` Add `AuditConfig.Capture` block to forge.yaml schema
`forge-cli/runtime/runner.go` Populate `r.cfg.AuditPayloadCapture` from forge.yaml > env > default at startup; replace bare `TruncateForAudit` calls with `PrepareCapturedContent`
`forge-cli/runtime/guardrails_audit.go` Collapse `prepareEvidence` into a thin adapter over `PrepareCapturedContent`
OTel span path (from #130) Collapse `PrepareSpanContent` similarly
`docs/security/audit-logging.md` New "Raw payload capture" section with the verbosity table + env var reference
Tests Env-var precedence (forge.yaml > env > default); redact pass scrubs vendor tokens; truncation cap; `tool_exec` schema golden test asserting fields absent by default

Out of scope

  • Per-tool capture filtering — "capture only for `cli_execute`,
    not for `web_search`." Useful but adds config complexity;
    defer until operators ask. For now it's all-or-nothing per
    agent run.
  • Capture rotation / sampling — emit raw payloads on 1% of
    events for production sampling. Defer.
  • Encrypted-at-rest captured payloads — operators who need
    this should route the audit socket to an encrypted sink (the
    FWS-7 sidecar pattern already supports this). Not a Forge-side
    feature.

Verification

  1. Default: `forge run` with no env / yaml capture config. Confirm
    `tool_exec` events have `args_size` / `result_size` only, no
    `args` / `result` keys.
  2. `FORGE_AUDIT_CAPTURE_TOOL_RESULT=true forge run`. Confirm
    `tool_exec phase=end` rows now carry `result` with truncated +
    redacted content.
  3. Seed an `cli_execute` call with an Anthropic-shaped token in
    the args (`sk-ant-...`); confirm `fields.args` contains
    `[REDACTED]` in place of the token.
  4. `FORGE_AUDIT_CAPTURE_REDACT=false` + same scenario; confirm
    the token lands verbatim (escape hatch verified).
  5. Mix env + forge.yaml: env sets `tool_args=true`, forge.yaml
    sets `tool_args=false`; confirm forge.yaml wins.
  6. Verbosity cap: 100 KB tool output with default 16 KiB cap;
    confirm `result` ends with `…[truncated:102400]`.

Related

Metadata

Metadata

Assignees

No one assigned

    Labels

    enhancementNew feature or request

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions