feat(audit): emit guardrail_check on every mask/block/warn with opt-in evidence (closes #155) by initializ-mk · Pull Request #156 · initializ/forge

initializ-mk · 2026-06-14T04:47:17Z

Summary

The AuditGuardrail constant has been defined since FWS-7 but never emitted. LibraryGuardrailEngine logged redactions to the ops logger and that was it — operators tailing the audit socket / stderr NDJSON saw zero guardrail events. Docs claimed otherwise.
This PR wires the audit logger into the engine and emits guardrail_check at every mask / block / warn site (inbound, outbound, tool_output × decision).
It also adds an opt-in evidence-capture knob so operators who need the offending text (false-positive triage, compliance proof) can flip a flag without changing the default metadata-only posture. This mirrors the OTel content-capture posture in #130 (OTel: honor capture_content + redact on span attributes (reuse FWS-8 audit redactor)).

Wire diagram

A2A request
  ├─ ctx { correlation_id, task_id, seq, workflow_* }
  ↓
guardrails.CheckInbound(ctx, msg)
  ├─ library InputGate → Result{Decision, Violations, MaskedContent}
  ├─ on Mask/Block: engine.emitGuardrailEvent(ctx, ...)
  │    ├─ build fields {direction, decision, guardrail, category, violation_count, [tool], [evidence]}
  │    ├─ evidence = prepareEvidence(originalText, cfg)   ← only when CaptureEvidence=true
  │    │            (redactSecrets → TruncateForAudit)
  │    └─ auditLogger.EmitFromContext(ctx, AuditEvent{Event: AuditGuardrail, Fields: ...})
  ↓
stderr safety net + (FORGE_AUDIT_SOCKET / FORGE_AUDIT_HTTP_ENDPOINT)
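
The field-building step in the diagram can be sketched roughly as follows. This is an illustration only, not the engine's code: the `guardrailEvent` struct and `buildFields` helper are hypothetical names, and the real `emitGuardrailEvent` signature is not shown in this PR description. The point is that `tool` and `evidence` are optional keys that are left out entirely rather than emitted empty.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// guardrailEvent is a hypothetical stand-in for what an emit site knows.
type guardrailEvent struct {
	Direction      string // "inbound", "outbound", or "tool_output"
	Decision       string // "masked", "warned", or "blocked"
	Guardrail      string // violation type, e.g. "pii"
	Category       string // violation category, e.g. "ssn"
	ViolationCount int
	Tool           string // optional; empty when no tool is involved
	Evidence       string // optional; assumed already redacted and truncated
}

// buildFields assembles the fields map. Optional keys are omitted when unset,
// and evidence is only attached when capture is switched on.
func buildFields(e guardrailEvent, captureEvidence bool) map[string]any {
	fields := map[string]any{
		"direction":       e.Direction,
		"decision":        e.Decision,
		"guardrail":       e.Guardrail,
		"category":        e.Category,
		"violation_count": e.ViolationCount,
	}
	if e.Tool != "" {
		fields["tool"] = e.Tool
	}
	if captureEvidence && e.Evidence != "" {
		fields["evidence"] = e.Evidence
	}
	return fields
}

func main() {
	fields := buildFields(guardrailEvent{
		Direction:      "inbound",
		Decision:       "masked",
		Guardrail:      "pii",
		Category:       "ssn",
		ViolationCount: 1,
	}, false)
	out, err := json.Marshal(fields)
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out))
}
```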

Default event shape (metadata-only)

{
  "ts": "2026-06-14T10:00:00Z",
  "event": "guardrail_check",
  "schema_version": "1.0",
  "seq": 2,
  "correlation_id": "fd111edd27c20101",
  "task_id": "slack-...",
  "fields": {
    "direction": "inbound",
    "decision": "masked",
    "guardrail": "pii",
    "category": "ssn",
    "violation_count": 1
  }
}
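
For operators consuming the stream, a minimal sketch of picking `guardrail_check` events out of NDJSON might look like this. The `auditEvent` struct and `filterGuardrailEvents` function are hypothetical; only the JSON keys are taken from the event shape above.

```go
package main

import (
	"bufio"
	"encoding/json"
	"fmt"
	"strings"
)

// auditEvent mirrors the JSON keys shown in the event shape above.
type auditEvent struct {
	TS            string         `json:"ts"`
	Event         string         `json:"event"`
	SchemaVersion string         `json:"schema_version"`
	Seq           int            `json:"seq"`
	CorrelationID string         `json:"correlation_id"`
	TaskID        string         `json:"task_id"`
	Fields        map[string]any `json:"fields"`
}

// filterGuardrailEvents reads NDJSON (one JSON object per line) and keeps
// only the guardrail_check events.
func filterGuardrailEvents(ndjson string) ([]auditEvent, error) {
	var out []auditEvent
	sc := bufio.NewScanner(strings.NewReader(ndjson))
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		if line == "" {
			continue
		}
		var ev auditEvent
		if err := json.Unmarshal([]byte(line), &ev); err != nil {
			return nil, fmt.Errorf("decode audit line: %w", err)
		}
		if ev.Event == "guardrail_check" {
			out = append(out, ev)
		}
	}
	return out, sc.Err()
}

func main() {
	stream := `{"event":"llm_call","seq":1}
{"event":"guardrail_check","seq":2,"fields":{"direction":"inbound","decision":"masked","guardrail":"pii","category":"ssn","violation_count":1}}`
	events, err := filterGuardrailEvents(stream)
	if err != nil {
		panic(err)
	}
	for _, ev := range events {
		fmt.Println(ev.Seq, ev.Fields["decision"], ev.Fields["category"])
	}
}
```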

With evidence capture on (`FORGE_GUARDRAIL_CAPTURE_EVIDENCE=true`)

{
  "ts": "...",
  "event": "guardrail_check",
  "fields": {
    "direction": "inbound",
    "decision": "masked",
    "guardrail": "pii",
    "category": "ssn",
    "violation_count": 1,
    "evidence": "verify SSN 123-45-6789 with token [REDACTED] please"
  }
}

The [REDACTED] marker matches FWS-8's marker so audit consumers see one consistent token across both pipelines. The same regex set as the bundled secret rules is used for the defense-in-depth scrub.
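
A rough sketch of the redact-then-truncate pipeline is below. The two regexes are placeholder patterns chosen for illustration, not the bundled secret rules, and the lowercase helper names are hypothetical. The meaning of `N` in `…[truncated:N]` is assumed here to be the number of bytes dropped; the real `TruncateForAudit` may define it differently.

```go
package main

import (
	"fmt"
	"regexp"
	"unicode/utf8"
)

// Placeholder secret shapes for illustration only; not the bundled rule set.
var secretPatterns = []*regexp.Regexp{
	regexp.MustCompile(`sk-[A-Za-z0-9]{20,}`),
	regexp.MustCompile(`ghp_[A-Za-z0-9]{36}`),
}

const redactedMarker = "[REDACTED]"

// redactSecrets replaces every match of every pattern with the marker.
func redactSecrets(s string) string {
	for _, re := range secretPatterns {
		s = re.ReplaceAllString(s, redactedMarker)
	}
	return s
}

// truncateForAudit caps s at maxBytes, backing up to a rune boundary so the
// output stays valid UTF-8. N is assumed to be the count of dropped bytes.
func truncateForAudit(s string, maxBytes int) string {
	if maxBytes <= 0 || len(s) <= maxBytes {
		return s
	}
	cut := maxBytes
	for cut > 0 && !utf8.RuneStart(s[cut]) {
		cut--
	}
	return fmt.Sprintf("%s…[truncated:%d]", s[:cut], len(s)-cut)
}

// prepareEvidence redacts first, then truncates, so a secret cannot survive
// by being split across the truncation point.
func prepareEvidence(text string, redact bool, maxBytes int) string {
	if redact {
		text = redactSecrets(text)
	}
	return truncateForAudit(text, maxBytes)
}

func main() {
	in := "verify SSN 123-45-6789 with token sk-abcdefghijklmnopqrstuvwx please"
	fmt.Println(prepareEvidence(in, true, 4096))
}
```

Redacting before truncating is the safer ordering: truncating first could leave a secret prefix that no longer matches any pattern.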

Config knobs

| Env var | Default | Meaning |
| --- | --- | --- |
| `FORGE_GUARDRAIL_CAPTURE_EVIDENCE` | `false` | Include `fields.evidence` in the emitted event |
| `FORGE_GUARDRAIL_REDACT` | `true` | Run vendor-secret regex scrub before emission |
| `FORGE_GUARDRAIL_MAX_BYTES` | `4096` | Soft cap; overage truncated with `…[truncated:N]` |
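
As a sketch of how these knobs could resolve into `GuardrailAuditConfig`, assuming unset or unparseable values fall back to the defaults in the table (the PR description does not spell out the fallback behavior, and `configFromEnv` is a hypothetical helper name):

```go
package main

import (
	"fmt"
	"os"
	"strconv"
)

// GuardrailAuditConfig uses the field names mentioned in this PR; the
// struct layout itself is an assumption.
type GuardrailAuditConfig struct {
	CaptureEvidence bool
	Redact          bool
	MaxBytes        int
}

// configFromEnv starts from the documented defaults and overrides each knob
// only when the variable parses cleanly. getenv is injected for testability.
func configFromEnv(getenv func(string) string) GuardrailAuditConfig {
	cfg := GuardrailAuditConfig{CaptureEvidence: false, Redact: true, MaxBytes: 4096}
	if v, err := strconv.ParseBool(getenv("FORGE_GUARDRAIL_CAPTURE_EVIDENCE")); err == nil {
		cfg.CaptureEvidence = v
	}
	if v, err := strconv.ParseBool(getenv("FORGE_GUARDRAIL_REDACT")); err == nil {
		cfg.Redact = v
	}
	if n, err := strconv.Atoi(getenv("FORGE_GUARDRAIL_MAX_BYTES")); err == nil && n > 0 {
		cfg.MaxBytes = n
	}
	return cfg
}

func main() {
	fmt.Printf("%+v\n", configFromEnv(os.Getenv))
}
```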

Behavior matrix

| Direction | Decision | Audit emitted? | Result string |
| --- | --- | --- | --- |
| inbound | Mask | yes | `masked` |
| inbound | Block + warn mode | yes | `warned` |
| inbound | Block + enforce mode | yes (then error returned) | `blocked` |
| outbound | Mask | yes | `masked` |
| outbound | Block + warn mode | yes | `warned` |
| outbound | Block + enforce mode | yes (then error returned) | `blocked` |
| tool_output | Mask | yes | `masked` |
| tool_output | Block + warn mode | yes | `warned` |
| tool_output | Block + enforce mode | yes (then error returned) | `blocked` |
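
The matrix reduces to one mapping that is the same for all three directions. A minimal sketch, with a hypothetical `Decision` type and `auditDecision` helper, and assuming an allow decision emits nothing (the matrix only lists mask and block rows):

```go
package main

import "fmt"

// Decision is a hypothetical stand-in for the library's decision type.
type Decision int

const (
	DecisionAllow Decision = iota
	DecisionMask
	DecisionBlock
)

// auditDecision returns the result string for the audit event and whether an
// event is emitted at all. Direction does not affect the mapping.
func auditDecision(d Decision, enforce bool) (string, bool) {
	switch d {
	case DecisionMask:
		return "masked", true
	case DecisionBlock:
		if enforce {
			return "blocked", true // caller then returns an error
		}
		return "warned", true
	default:
		return "", false // assumption: allow is not audited
	}
}

func main() {
	for _, enforce := range []bool{false, true} {
		s, _ := auditDecision(DecisionBlock, enforce)
		fmt.Printf("block, enforce=%v: %s\n", enforce, s)
	}
}
```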

Files

- forge-core/runtime/guardrails.go: interface now takes context.Context
- forge-core/runtime/guardrails_test.go: updated to match
- forge-cli/runtime/guardrails_audit.go (new): GuardrailAuditConfig, redact pipeline, emitGuardrailEvent helper
- forge-cli/runtime/guardrails_engine.go: calls emit at all 7 sites, WithAuditLogger for wiring
- forge-cli/runtime/guardrails_loader.go: BuildGuardrailChecker now takes (auditLogger, auditCfg)
- forge-cli/runtime/guardrails_engine_test.go: 4 new tests: inbound-mask emits, evidence omitted by default, prepareEvidence (table-driven), truncation cap
- forge-cli/runtime/runner.go: reordered audit-logger / guardrails construction, ctx threaded into all 7 call sites
- docs/security/guardrails.md: new Audit Events section with full schema + opt-in evidence table
- docs/security/audit-logging.md: updated guardrail_check row
- .claude/skills/forge.md: updated audit-event reference row

Test plan

- `go test ./...` clean in forge-core and forge-cli
- `golangci-lint run ./...` → 0 issues in all three modules
- `gofmt -w` applied
- End-to-end smoke: start socket listener (`socat -u UNIX-LISTEN:/tmp/forge-audit.sock,fork -`), run agent with `FORGE_AUDIT_SOCKET=/tmp/forge-audit.sock`, send PII-bearing message, confirm guardrail_check event appears alongside session_*, llm_call, invocation_complete
- With `FORGE_GUARDRAIL_CAPTURE_EVIDENCE=true`, confirm fields.evidence is present and a secret-shaped substring in the prompt is replaced with [REDACTED]
- With `FORGE_GUARDRAIL_CAPTURE_EVIDENCE=false` (default), confirm fields.evidence is absent

Closes #155

…n evidence capture (closes #155)

The AuditGuardrail constant has been defined since FWS-7 but nothing ever emitted it: LibraryGuardrailEngine only logged redactions to the ops logger, so operators tailing the audit socket / stderr NDJSON saw no guardrail events at all. Docs claimed otherwise.

This commit:

- Extends GuardrailChecker (forge-core/runtime/guardrails.go) with a context.Context parameter so emissions can be routed through EmitFromContext and inherit correlation_id, task_id, sequence, and workflow-correlation tags from the request.
- Wires *coreruntime.AuditLogger + GuardrailAuditConfig into LibraryGuardrailEngine. BuildGuardrailChecker now takes both, and runner.Start() reorders construction so the audit logger exists before the guardrail engine is built.
- Emits AuditGuardrail at all 7 mask/block/warn sites (inbound/outbound/tool_output × {mask, warn, blocked-enforce}). Fields shape: direction, decision (masked/warned/blocked), guardrail (Violation.Type), category (Violation.Category), violation_count, optional tool, optional evidence.
- Adds GuardrailAuditConfig with CaptureEvidence (off by default; the metadata-only posture matches the existing FWS-8 audit payload-capture default and the issue #130 OTel content-capture default), Redact (on by default, scrubs vendor-secret token shapes with the same patterns as the #130 work and the FWS-8 [REDACTED] marker), and MaxBytes (4 KiB soft cap, truncated via the existing TruncateForAudit + …[truncated:N] marker).
- Env knobs: FORGE_GUARDRAIL_CAPTURE_EVIDENCE, FORGE_GUARDRAIL_REDACT, FORGE_GUARDRAIL_MAX_BYTES.
- Updates all callers (3× runner.go A2A handlers, AfterToolExec hook, NoopGuardrailChecker, tests).
- Docs: docs/security/guardrails.md grows a full Audit Events section with the event field table and opt-in evidence knobs; docs/security/audit-logging.md row + the forge skill audit table row are updated to match.

Previous behavior stamped the raw pre-mask text into fields.evidence even when the library had just masked PII out of the prompt — so an inbound SSN ended up plain-text in the audit stream even though the LLM downstream only saw the masked form. CheckInbound / CheckOutbound / CheckToolOutput now pass result.MaskedContent (the post-library-mask payload) for DecisionMask. Block / warn decisions still emit the original content because the library never produces a masked variant in those paths and the operator wants to see what was rejected. Docs and the EmitsAuditOnInboundMask test are updated to assert the raw PII MUST NOT appear in evidence on a mask decision.

Symmetric to the guardrail_check audit emission shipped in #156 / #160: every Check* method on LibraryGuardrailEngine opens a guardrail.<gate> child span and stamps the same gate / decision / violation metadata operators see on the audit event.

Span names map to the library's gate vocabulary:

- guardrail.input (CheckInbound → InputGate)
- guardrail.context (CheckContext → ContextGate)
- guardrail.tool_call (CheckToolCall → ToolCallGate)
- guardrail.output (CheckOutbound + CheckToolOutput → OutputGate)
- guardrail.stream (CheckStream → StreamGate; not auto-wired)

The CheckOutbound case splits per text part: one guardrail.output span per OutputGate call so the trace tree mirrors the part-level iteration cleanly.

Attribute keys (new constants in forge-core/observability/attrs.go):

- forge.guardrail.gate (Result.Gate)
- forge.guardrail.decision (Result.Decision: allow/mask/block/warn)
- forge.guardrail.type (first violation's Type)
- forge.guardrail.category (first violation's Category)
- forge.guardrail.violation_count (len(Result.Violations))
- forge.guardrail.evidence (gated by CaptureContent + Redact)
- forge.tool.name (reused from #130; set on tool_call + tool-output paths)

Block decisions stamp OTel Error status with the violation summary as the status description, so operators see red bars in the trace UI without custom attribute queries.

forge.guardrail.evidence follows the #130 + #156 content rule exactly: default off; with CaptureContent on, the mask path emits post-mask content (matches what the LLM actually saw) and the block/warn paths emit original content. PrepareSpanContent runs the same redact-then-truncate pipeline used for gen_ai.input.messages and forge.tool.args, so the four content streams share one consistent shape.

Wiring:

- LibraryGuardrailEngine grows a tracingCfg field + WithTracing setter; BuildGuardrailChecker gains a TracingConfig parameter and calls WithTracing on every constructed engine.
- runner.Start resolves TracingConfig early (it's a pure config resolution, no I/O) so the guardrail engine sees it before NewTracerProvider runs; the later tracing block reuses the resolved value.
- When tracing is disabled, the noop tracer short-circuits; spans are not produced at all. CaptureContent only controls the evidence attribute; the span itself is always opened (it's cheap when tracing is off).

Tests (forge-cli/runtime/guardrails_tracing_test.go):

- guardrail.input span lands with gate/decision/violation_count attributes; evidence ABSENT when CaptureContent=false
- evidence PRESENT but raw PII absent when CaptureContent=true (post-mask rule)
- guardrail.tool_call carries forge.tool.name
- guardrail.output for CheckOutbound has NO tool attribute (distinguishes "model reply to user" from tool-result OutputGate fires)
- guardrail.context + guardrail.stream spans land
- noop-tracer path: no spans recorded

Docs: docs/core-concepts/observability-tracing.md gains a "Guardrail spans" section under "Span content capture" listing span names, nesting, attribute reference, and the content-capture parity note.
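
The span-name mapping and the always-on attributes can be sketched without any tracing dependency. This is a plain-Go illustration: `spanName`, `spanAttrs`, and the `violation` struct are hypothetical, the gate value `"input"` is an assumed example of `Result.Gate`, and the real code stamps attributes on an OTel span rather than returning a map.

```go
package main

import "fmt"

// violation is a hypothetical stand-in for the library's violation type.
type violation struct {
	Type     string
	Category string
}

// spanName maps a Check* method to its span name, per the list above.
func spanName(method string) (string, bool) {
	names := map[string]string{
		"CheckInbound":    "guardrail.input",
		"CheckContext":    "guardrail.context",
		"CheckToolCall":   "guardrail.tool_call",
		"CheckOutbound":   "guardrail.output",
		"CheckToolOutput": "guardrail.output",
		"CheckStream":     "guardrail.stream",
	}
	n, ok := names[method]
	return n, ok
}

// spanAttrs builds the metadata attributes. Type and category come from the
// first violation; forge.tool.name is set only when a tool is involved.
func spanAttrs(gate, decision string, vs []violation, tool string) map[string]any {
	attrs := map[string]any{
		"forge.guardrail.gate":            gate,
		"forge.guardrail.decision":        decision,
		"forge.guardrail.violation_count": len(vs),
	}
	if len(vs) > 0 {
		attrs["forge.guardrail.type"] = vs[0].Type
		attrs["forge.guardrail.category"] = vs[0].Category
	}
	if tool != "" {
		attrs["forge.tool.name"] = tool
	}
	return attrs
}

func main() {
	name, _ := spanName("CheckInbound")
	attrs := spanAttrs("input", "mask", []violation{{Type: "pii", Category: "ssn"}}, "")
	fmt.Println(name, attrs["forge.guardrail.type"], attrs["forge.guardrail.violation_count"])
}
```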

initializ-mk added 2 commits June 14, 2026 00:46

initializ-mk merged commit 3779e40 into main Jun 14, 2026
10 checks passed

initializ-mk merged 2 commits into main from feat/issue-155-guardrail-audit-emit
