feat(audit): emit guardrail_check on every mask/block/warn with opt-in evidence (closes #155)#156

Merged
initializ-mk merged 2 commits into main from feat/issue-155-guardrail-audit-emit
Jun 14, 2026

@initializ-mk

Contributor

Summary

  • The AuditGuardrail constant has been defined since FWS-7 but never emitted. LibraryGuardrailEngine logged redactions to the ops logger and that was it — operators tailing the audit socket / stderr NDJSON saw zero guardrail events. Docs claimed otherwise.
  • This PR wires the audit logger into the engine and emits guardrail_check at every mask / block / warn site (inbound, outbound, tool_output × decision).
  • Adds an opt-in evidence-capture knob so operators who need the offending text (false-positive triage, compliance proof) can flip a flag without changing the default metadata-only posture. This mirrors the OTel content-capture posture in #130 (OTel: honor capture_content + redact on span attributes (reuse FWS-8 audit redactor)).

Wire diagram

A2A request
  ├─ ctx { correlation_id, task_id, seq, workflow_* }
  ↓
guardrails.CheckInbound(ctx, msg)
  ├─ library InputGate → Result{Decision, Violations, MaskedContent}
  ├─ on Mask/Block: engine.emitGuardrailEvent(ctx, ...)
  │    ├─ build fields {direction, decision, guardrail, category, violation_count, [tool], [evidence]}
  │    ├─ evidence = prepareEvidence(originalText, cfg)   ← only when CaptureEvidence=true
  │    │            (redactSecrets → TruncateForAudit)
  │    └─ auditLogger.EmitFromContext(ctx, AuditEvent{Event: AuditGuardrail, Fields: ...})
  ↓
stderr safety net + (FORGE_AUDIT_SOCKET / FORGE_AUDIT_HTTP_ENDPOINT)
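
The redact-then-truncate step in the diagram can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: the struct layout, the lowercase helper names, and the single `ghp_` token pattern are all assumptions standing in for the bundled secret rules and the existing TruncateForAudit helper, and the meaning of N in the truncation marker (dropped bytes) is a guess.

```go
package main

import (
	"fmt"
	"regexp"
	"unicode/utf8"
)

// auditConfig carries the three knobs named in this PR. The field names come
// from the PR text; the struct itself is an assumption.
type auditConfig struct {
	CaptureEvidence bool
	Redact          bool
	MaxBytes        int
}

// secretPattern is a hypothetical stand-in for the bundled secret rules.
// It matches "ghp_" followed by at least 20 alphanumeric characters.
var secretPattern = regexp.MustCompile(`ghp_[A-Za-z0-9]{20,}`)

// redactSecrets replaces every secret-shaped substring with the shared marker.
func redactSecrets(s string) string {
	return secretPattern.ReplaceAllString(s, "[REDACTED]")
}

// truncateForAudit caps s at max bytes without splitting a UTF-8 rune and
// appends a marker. Assumption: N in the marker is the number of dropped bytes.
func truncateForAudit(s string, max int) string {
	if max <= 0 || len(s) <= max {
		return s
	}
	cut := max
	for cut > 0 && !utf8.RuneStart(s[cut]) {
		cut--
	}
	return fmt.Sprintf("%s…[truncated:%d]", s[:cut], len(s)-cut)
}

// prepareEvidence returns the evidence string and whether it should be
// attached to the event at all.
func prepareEvidence(text string, cfg auditConfig) (string, bool) {
	if !cfg.CaptureEvidence {
		return "", false // metadata-only default: no evidence field
	}
	if cfg.Redact {
		text = redactSecrets(text)
	}
	return truncateForAudit(text, cfg.MaxBytes), true
}

func main() {
	cfg := auditConfig{CaptureEvidence: true, Redact: true, MaxBytes: 4096}
	ev, ok := prepareEvidence("token ghp_abcdefghijklmnopqrstuvwx please", cfg)
	fmt.Println(ev, ok)
}
```

Redacting before truncating means a secret that straddles the byte cap is still replaced whole rather than being cut in half and left partially visible.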

Default event shape (metadata-only)

{
  "ts": "2026-06-14T10:00:00Z",
  "event": "guardrail_check",
  "schema_version": "1.0",
  "seq": 2,
  "correlation_id": "fd111edd27c20101",
  "task_id": "slack-...",
  "fields": {
    "direction": "inbound",
    "decision": "masked",
    "guardrail": "pii",
    "category": "ssn",
    "violation_count": 1
  }
}

With evidence capture on (FORGE_GUARDRAIL_CAPTURE_EVIDENCE=true)

{
  "ts": "...",
  "event": "guardrail_check",
  "fields": {
    "direction": "inbound",
    "decision": "masked",
    "guardrail": "pii",
    "category": "ssn",
    "violation_count": 1,
    "evidence": "verify SSN 123-45-6789 with token [REDACTED] please"
  }
}

The [REDACTED] marker matches FWS-8's marker so audit consumers see one consistent token across both pipelines. The same regex set as the bundled secret rules is used for the defense-in-depth scrub.
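
For audit consumers, a decoder for the two event shapes above might look like the sketch below. The Go type and function names are hypothetical; only the JSON keys are taken from the examples in this description. Using a pointer for `evidence` lets a consumer tell "field absent" (the default) apart from an empty string.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// guardrailFields mirrors the "fields" object shown in the examples above.
type guardrailFields struct {
	Direction      string  `json:"direction"`
	Decision       string  `json:"decision"`
	Guardrail      string  `json:"guardrail"`
	Category       string  `json:"category"`
	ViolationCount int     `json:"violation_count"`
	Tool           string  `json:"tool,omitempty"`
	Evidence       *string `json:"evidence,omitempty"` // nil when capture is off
}

// auditEvent mirrors the envelope keys shown in the default event shape.
type auditEvent struct {
	TS            string          `json:"ts"`
	Event         string          `json:"event"`
	SchemaVersion string          `json:"schema_version"`
	Seq           int             `json:"seq"`
	CorrelationID string          `json:"correlation_id"`
	TaskID        string          `json:"task_id"`
	Fields        guardrailFields `json:"fields"`
}

// parseGuardrailEvent decodes one NDJSON line and rejects other event types.
func parseGuardrailEvent(line []byte) (auditEvent, error) {
	var ev auditEvent
	if err := json.Unmarshal(line, &ev); err != nil {
		return auditEvent{}, err
	}
	if ev.Event != "guardrail_check" {
		return auditEvent{}, fmt.Errorf("unexpected event %q", ev.Event)
	}
	return ev, nil
}

func main() {
	line := []byte(`{"ts":"2026-06-14T10:00:00Z","event":"guardrail_check","schema_version":"1.0","seq":2,"correlation_id":"fd111edd27c20101","fields":{"direction":"inbound","decision":"masked","guardrail":"pii","category":"ssn","violation_count":1}}`)
	ev, err := parseGuardrailEvent(line)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("%s %s/%s evidence_present=%t\n",
		ev.Fields.Decision, ev.Fields.Guardrail, ev.Fields.Category, ev.Fields.Evidence != nil)
}
```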

Config knobs

| Env var | Default | Meaning |
| --- | --- | --- |
| FORGE_GUARDRAIL_CAPTURE_EVIDENCE | false | Include fields.evidence in the emitted event |
| FORGE_GUARDRAIL_REDACT | true | Run vendor-secret regex scrub before emission |
| FORGE_GUARDRAIL_MAX_BYTES | 4096 | Soft cap; overage truncated with …[truncated:N] |
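
As an illustration of how these three knobs and their defaults could be resolved, here is a small sketch. The function name `configFromEnv` and the fall-back-to-default handling of unparsable values are assumptions; the PR's own constructor is newGuardrailAuditConfig, whose exact behavior is not shown here.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
)

// guardrailAuditConfig holds the three documented knobs.
type guardrailAuditConfig struct {
	CaptureEvidence bool
	Redact          bool
	MaxBytes        int
}

// configFromEnv starts from the documented defaults and overrides each knob
// only when the variable is set to a parsable value. Assumption: unset or
// invalid values silently keep the default.
func configFromEnv(getenv func(string) string) guardrailAuditConfig {
	cfg := guardrailAuditConfig{CaptureEvidence: false, Redact: true, MaxBytes: 4096}
	if v, err := strconv.ParseBool(getenv("FORGE_GUARDRAIL_CAPTURE_EVIDENCE")); err == nil {
		cfg.CaptureEvidence = v
	}
	if v, err := strconv.ParseBool(getenv("FORGE_GUARDRAIL_REDACT")); err == nil {
		cfg.Redact = v
	}
	if n, err := strconv.Atoi(getenv("FORGE_GUARDRAIL_MAX_BYTES")); err == nil && n > 0 {
		cfg.MaxBytes = n
	}
	return cfg
}

func main() {
	// Reads the real process environment; prints the resolved config.
	fmt.Printf("%+v\n", configFromEnv(os.Getenv))
}
```

Taking the lookup function as a parameter keeps the resolver testable without mutating the process environment.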

Behavior matrix

| Direction | Decision | Audit emitted? | Result string |
| --- | --- | --- | --- |
| inbound | Mask | yes | masked |
| inbound | Block + warn mode | yes | warned |
| inbound | Block + enforce mode | yes (then error returned) | blocked |
| outbound | Mask | yes | masked |
| outbound | Block + warn mode | yes | warned |
| outbound | Block + enforce mode | yes (then error returned) | blocked |
| tool_output | Mask | yes | masked |
| tool_output | Block + warn mode | yes | warned |
| tool_output | Block + enforce mode | yes (then error returned) | blocked |
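
The matrix is the same for all three directions, so the decision-to-result mapping can be sketched as one small function. The type and function names below are hypothetical, and the "allow emits nothing" branch is inferred from the PR title rather than stated in the table.

```go
package main

import "fmt"

// decision is an illustrative stand-in for the library's decision type.
type decision int

const (
	decisionAllow decision = iota
	decisionMask
	decisionBlock
)

// auditResult maps a gate decision plus the enforcement mode to the
// "decision" string stamped on the event. The boolean reports whether an
// event is emitted at all.
func auditResult(d decision, enforce bool) (string, bool) {
	switch d {
	case decisionMask:
		return "masked", true
	case decisionBlock:
		if enforce {
			return "blocked", true // the caller then returns an error
		}
		return "warned", true
	default:
		return "", false // assumption: allow decisions emit no event
	}
}

func main() {
	for _, direction := range []string{"inbound", "outbound", "tool_output"} {
		s, _ := auditResult(decisionBlock, false)
		fmt.Println(direction, s)
	}
}
```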

Files

  • forge-core/runtime/guardrails.go — interface now takes context.Context
  • forge-core/runtime/guardrails_test.go — updated to match
  • forge-cli/runtime/guardrails_audit.go — newGuardrailAuditConfig, redact pipeline, emitGuardrailEvent helper
  • forge-cli/runtime/guardrails_engine.go — calls emit at all 7 sites, WithAuditLogger for wiring
  • forge-cli/runtime/guardrails_loader.go — BuildGuardrailChecker now takes (auditLogger, auditCfg)
  • forge-cli/runtime/guardrails_engine_test.go — 4 new tests: inbound-mask emits, evidence omitted by default, prepareEvidence (table-driven), truncation cap
  • forge-cli/runtime/runner.go — reordered audit-logger / guardrails construction, ctx threaded into all 7 call sites
  • docs/security/guardrails.md — new Audit Events section with full schema + opt-in evidence table
  • docs/security/audit-logging.md — updated guardrail_check row
  • .claude/skills/forge.md — updated audit-event reference row

Test plan

  • go test ./... clean in forge-core and forge-cli
  • golangci-lint run ./... → 0 issues in all three modules
  • gofmt -w applied
  • End-to-end smoke: start socket listener (socat -u UNIX-LISTEN:/tmp/forge-audit.sock,fork -), run agent with FORGE_AUDIT_SOCKET=/tmp/forge-audit.sock, send PII-bearing message, confirm guardrail_check event appears alongside session_*, llm_call, invocation_complete
  • With FORGE_GUARDRAIL_CAPTURE_EVIDENCE=true, confirm fields.evidence is present and a secret-shaped substring in the prompt is replaced with [REDACTED]
  • With FORGE_GUARDRAIL_CAPTURE_EVIDENCE=false (default), confirm fields.evidence is absent

Closes #155

…n evidence capture (closes #155)

The AuditGuardrail constant has been defined since FWS-7 but
nothing ever emitted it — LibraryGuardrailEngine only logged
redactions to the ops logger, so operators tailing the audit
socket / stderr NDJSON saw no guardrail events at all. Docs
claimed otherwise.

This commit:

- Extends GuardrailChecker (forge-core/runtime/guardrails.go)
  with a context.Context parameter so emissions can be routed
  through EmitFromContext and inherit correlation_id, task_id,
  sequence, and workflow-correlation tags from the request.
- Wires *coreruntime.AuditLogger + GuardrailAuditConfig into
  LibraryGuardrailEngine. BuildGuardrailChecker now takes both,
  and runner.Start() reorders construction so the audit logger
  exists before the guardrail engine is built.
- Emits AuditGuardrail at all 7 mask/block/warn sites
  (inbound/outbound/tool_output × {mask, warn, blocked-enforce}).
  Fields shape: direction, decision (masked/warned/blocked),
  guardrail (Violation.Type), category (Violation.Category),
  violation_count, optional tool, optional evidence.
- Adds GuardrailAuditConfig with CaptureEvidence (off by
  default — metadata-only posture matches the existing FWS-8
  audit payload-capture default and the issue #130 OTel
  content-capture default), Redact (on by default, scrubs
  vendor-secret token shapes with the same patterns as the
  #130 work and the FWS-8 [REDACTED] marker), and MaxBytes
  (4 KiB soft cap, truncated via the existing TruncateForAudit
  + …[truncated:N] marker).
- Env knobs: FORGE_GUARDRAIL_CAPTURE_EVIDENCE,
  FORGE_GUARDRAIL_REDACT, FORGE_GUARDRAIL_MAX_BYTES.
- Updates all callers (3× runner.go A2A handlers, AfterToolExec
  hook, NoopGuardrailChecker, tests).
- Docs: docs/security/guardrails.md grows a full Audit Events
  section with the event field table and opt-in evidence
  knobs; docs/security/audit-logging.md row + the forge skill
  audit table row are updated to match.

Previous behavior stamped the raw pre-mask text into
fields.evidence even when the library had just masked PII out of
the prompt — so an inbound SSN ended up plain-text in the audit
stream even though the LLM downstream only saw the masked form.

CheckInbound / CheckOutbound / CheckToolOutput now pass
result.MaskedContent (the post-library-mask payload) for
DecisionMask. Block / warn decisions still emit the original
content because the library never produces a masked variant in
those paths and the operator wants to see what was rejected.

Docs and the EmitsAuditOnInboundMask test are updated to assert
the raw PII MUST NOT appear in evidence on a mask decision.
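
The evidence-source rule from this commit can be sketched as a single selection step. The type and function names are hypothetical, and the `[MASKED]` token in the example is purely illustrative; the library's actual mask format is not shown in this PR.

```go
package main

import (
	"fmt"
	"strings"
)

// gateResult is an illustrative subset of the library result: only the two
// fields this rule needs.
type gateResult struct {
	Decision      string // "mask", "block", ...
	MaskedContent string // post-library-mask payload, set on mask decisions
}

// evidenceSource picks the text that is allowed to reach fields.evidence.
// On a mask decision it must be the masked payload so raw PII never lands in
// the audit stream; on block/warn there is no masked variant, so the
// original content is used.
func evidenceSource(original string, r gateResult) string {
	if r.Decision == "mask" {
		return r.MaskedContent
	}
	return original
}

func main() {
	original := "verify SSN 123-45-6789 please"
	masked := gateResult{Decision: "mask", MaskedContent: "verify SSN [MASKED] please"}
	ev := evidenceSource(original, masked)
	fmt.Println(ev, strings.Contains(ev, "123-45-6789"))
}
```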
initializ-mk merged commit 3779e40 into main on Jun 14, 2026
10 checks passed
initializ-mk added a commit that referenced this pull request Jun 15, 2026
Symmetric to the guardrail_check audit emission shipped in
#156 / #160 — every Check* method on LibraryGuardrailEngine opens
a guardrail.<gate> child span and stamps the same gate / decision
/ violation metadata operators see on the audit event.

Span names map to the library's gate vocabulary:
  - guardrail.input        (CheckInbound → InputGate)
  - guardrail.context      (CheckContext → ContextGate)
  - guardrail.tool_call    (CheckToolCall → ToolCallGate)
  - guardrail.output       (CheckOutbound + CheckToolOutput → OutputGate)
  - guardrail.stream       (CheckStream → StreamGate; not auto-wired)

The CheckOutbound case splits per text part — one
guardrail.output span per OutputGate call so the trace tree
mirrors the part-level iteration cleanly.
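
The method-to-span mapping and the per-part split can be pictured with the sketch below. The lookup table and the `outboundSpans` helper are hypothetical; they only restate the names listed above and do not use the real tracer.

```go
package main

import "fmt"

// spanNames restates the Check* method to span-name mapping listed above.
var spanNames = map[string]string{
	"CheckInbound":    "guardrail.input",
	"CheckContext":    "guardrail.context",
	"CheckToolCall":   "guardrail.tool_call",
	"CheckOutbound":   "guardrail.output",
	"CheckToolOutput": "guardrail.output",
	"CheckStream":     "guardrail.stream",
}

// outboundSpans returns one span name per text part, mirroring the rule that
// CheckOutbound opens a separate guardrail.output span for each OutputGate call.
func outboundSpans(parts []string) []string {
	names := make([]string, 0, len(parts))
	for range parts {
		names = append(names, spanNames["CheckOutbound"])
	}
	return names
}

func main() {
	fmt.Println(outboundSpans([]string{"part one", "part two"}))
}
```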

Attribute keys (new constants in forge-core/observability/attrs.go):
  - forge.guardrail.gate              (Result.Gate)
  - forge.guardrail.decision          (Result.Decision: allow/mask/block/warn)
  - forge.guardrail.type              (first violation's Type)
  - forge.guardrail.category          (first violation's Category)
  - forge.guardrail.violation_count   (len(Result.Violations))
  - forge.guardrail.evidence          (gated by CaptureContent + Redact)
  - forge.tool.name                   (reused from #130; set on tool_call + tool-output paths)

Block decisions stamp OTel Error status with the violation summary
as the status description — operators see red bars in the trace UI
without custom attribute queries.
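
A dependency-free sketch of how the attribute set and the error status could be derived from a gate result follows. It deliberately uses a plain map instead of the OTel attribute API, the type names are hypothetical, and the status-description format is an assumption, since the commit only says it carries "the violation summary".

```go
package main

import "fmt"

// violation and gateResult are illustrative stand-ins for the library types.
type violation struct{ Type, Category string }

type gateResult struct {
	Gate       string
	Decision   string // allow / mask / block / warn
	Violations []violation
}

// spanSketch is a plain-data picture of what ends up on the span.
type spanSketch struct {
	Attrs             map[string]any
	IsError           bool
	StatusDescription string
}

// describeSpan builds the attribute set listed above. Type and category come
// from the first violation only; forge.tool.name is set only when a tool
// name is supplied.
func describeSpan(r gateResult, toolName string) spanSketch {
	s := spanSketch{Attrs: map[string]any{
		"forge.guardrail.gate":            r.Gate,
		"forge.guardrail.decision":        r.Decision,
		"forge.guardrail.violation_count": len(r.Violations),
	}}
	if len(r.Violations) > 0 {
		s.Attrs["forge.guardrail.type"] = r.Violations[0].Type
		s.Attrs["forge.guardrail.category"] = r.Violations[0].Category
	}
	if toolName != "" {
		s.Attrs["forge.tool.name"] = toolName
	}
	if r.Decision == "block" {
		s.IsError = true
		// Hypothetical summary format; the real description is not shown here.
		s.StatusDescription = fmt.Sprintf("%d violation(s) at %s gate", len(r.Violations), r.Gate)
	}
	return s
}

func main() {
	r := gateResult{Gate: "input", Decision: "block", Violations: []violation{{"pii", "ssn"}}}
	s := describeSpan(r, "")
	fmt.Println(s.IsError, s.StatusDescription, len(s.Attrs))
}
```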

forge.guardrail.evidence follows the #130 + #156 content rule
exactly: default off; with CaptureContent on, the mask path
emits post-mask content (matches what the LLM actually saw) and
the block/warn paths emit original content. PrepareSpanContent
runs the same redact-then-truncate pipeline used for
gen_ai.input.messages and forge.tool.args, so the four content
streams share one consistent shape.

Wiring:
  - LibraryGuardrailEngine grows a tracingCfg field + WithTracing
    setter; BuildGuardrailChecker gains a TracingConfig parameter
    and calls WithTracing on every constructed engine.
  - runner.Start resolves TracingConfig early (it's a pure config
    resolution — no I/O) so the guardrail engine sees it before
    NewTracerProvider runs; the later tracing block reuses the
    resolved value.
  - When tracing is disabled, the noop tracer short-circuits and
    no spans are recorded. CaptureContent only controls the
    evidence attribute; the span-start call itself is always made
    (it's cheap when tracing is off).

Tests (forge-cli/runtime/guardrails_tracing_test.go):
  - guardrail.input span lands with gate/decision/violation_count
    attributes; evidence ABSENT when CaptureContent=false
  - evidence PRESENT but raw PII absent when CaptureContent=true
    (post-mask rule)
  - guardrail.tool_call carries forge.tool.name
  - guardrail.output for CheckOutbound has NO tool attribute
    (distinguishes "model reply to user" from tool-result OutputGate
    fires)
  - guardrail.context + guardrail.stream spans land
  - noop-tracer path: no spans recorded

Docs: docs/core-concepts/observability-tracing.md gains a
"Guardrail spans" section under "Span content capture" listing
span names, nesting, attribute reference, and the content-capture
parity note.

Development

Successfully merging this pull request may close these issues.

Guardrail mask/block events never reach audit pipeline (guardrail_check is dead code)
