feat(audit): emit guardrail_check on every mask/block/warn with opt-in evidence (closes #155) #156
Merged
…n evidence capture (closes #155)

The AuditGuardrail constant has been defined since FWS-7 but nothing ever emitted it. LibraryGuardrailEngine only logged redactions to the ops logger, so operators tailing the audit socket / stderr NDJSON saw no guardrail events at all. Docs claimed otherwise.

This commit:

- Extends GuardrailChecker (forge-core/runtime/guardrails.go) with a context.Context parameter so emissions can be routed through EmitFromContext and inherit correlation_id, task_id, sequence, and workflow-correlation tags from the request.
- Wires *coreruntime.AuditLogger + GuardrailAuditConfig into LibraryGuardrailEngine. BuildGuardrailChecker now takes both, and runner.Start() reorders construction so the audit logger exists before the guardrail engine is built.
- Emits AuditGuardrail at all 7 mask/block/warn sites (inbound/outbound/tool_output × {mask, warn, blocked-enforce}). Fields shape: direction, decision (masked/warned/blocked), guardrail (Violation.Type), category (Violation.Category), violation_count, optional tool, optional evidence.
- Adds GuardrailAuditConfig with three settings:
  - CaptureEvidence: off by default. The metadata-only posture matches the existing FWS-8 audit payload-capture default and the issue #130 OTel content-capture default.
  - Redact: on by default. Scrubs vendor-secret token shapes with the same patterns as the #130 work and the FWS-8 [REDACTED] marker.
  - MaxBytes: 4 KiB soft cap, truncated via the existing TruncateForAudit + …[truncated:N] marker.
- Env knobs: FORGE_GUARDRAIL_CAPTURE_EVIDENCE, FORGE_GUARDRAIL_REDACT, FORGE_GUARDRAIL_MAX_BYTES.
- Updates all callers (3× runner.go A2A handlers, AfterToolExec hook, NoopGuardrailChecker, tests).
- Docs: docs/security/guardrails.md grows a full Audit Events section with the event field table and opt-in evidence knobs; docs/security/audit-logging.md row + the forge skill audit table row are updated to match.
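As a rough illustration of how the three env knobs could resolve into GuardrailAuditConfig, here is a minimal sketch. The field names and defaults come from the commit message; the field types, the loader name loadGuardrailAuditConfig, the injected getenv parameter, and the fall-back-to-default behavior on unparsable values are all assumptions, not the actual forge-cli code.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
)

// GuardrailAuditConfig mirrors the three settings named in the commit message.
// The field types are assumptions made for this sketch.
type GuardrailAuditConfig struct {
	CaptureEvidence bool // off by default: events stay metadata-only
	Redact          bool // on by default: scrub secret-shaped tokens
	MaxBytes        int  // soft cap on evidence size, 4 KiB by default
}

// loadGuardrailAuditConfig is a hypothetical loader. getenv is injected so the
// function can be exercised without touching the process environment.
// Unset or unparsable values keep the documented defaults (an assumption).
func loadGuardrailAuditConfig(getenv func(string) string) GuardrailAuditConfig {
	cfg := GuardrailAuditConfig{CaptureEvidence: false, Redact: true, MaxBytes: 4096}
	if v, err := strconv.ParseBool(getenv("FORGE_GUARDRAIL_CAPTURE_EVIDENCE")); err == nil {
		cfg.CaptureEvidence = v
	}
	if v, err := strconv.ParseBool(getenv("FORGE_GUARDRAIL_REDACT")); err == nil {
		cfg.Redact = v
	}
	if n, err := strconv.Atoi(getenv("FORGE_GUARDRAIL_MAX_BYTES")); err == nil && n > 0 {
		cfg.MaxBytes = n
	}
	return cfg
}

func main() {
	// Prints the config resolved from the current process environment.
	fmt.Printf("%+v\n", loadGuardrailAuditConfig(os.Getenv))
}
```

Injecting getenv keeps the defaults testable: passing a function that always returns an empty string should yield the metadata-only posture described above.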
Previous behavior stamped the raw pre-mask text into fields.evidence even when the library had just masked PII out of the prompt — so an inbound SSN ended up plain-text in the audit stream even though the LLM downstream only saw the masked form. CheckInbound / CheckOutbound / CheckToolOutput now pass result.MaskedContent (the post-library-mask payload) for DecisionMask. Block / warn decisions still emit the original content because the library never produces a masked variant in those paths and the operator wants to see what was rejected. Docs and the EmitsAuditOnInboundMask test are updated to assert the raw PII MUST NOT appear in evidence on a mask decision.
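The evidence rule described above can be sketched as a single selection function. This is an illustration only: checkResult and evidenceFor are hypothetical names, and the Decision string values are assumed from the audit event's decision field rather than taken from the library.

```go
package main

import "fmt"

// Decision values are assumed to match the audit event's decision field.
type Decision string

const (
	DecisionMask  Decision = "masked"
	DecisionWarn  Decision = "warned"
	DecisionBlock Decision = "blocked"
)

// checkResult is a hypothetical stand-in for the library's result type.
type checkResult struct {
	Decision      Decision
	MaskedContent string // only populated on the mask path
}

// evidenceFor picks the payload that may be written to fields.evidence.
// Mask decisions use the post-mask content so raw PII never reaches the
// audit stream; warn and block decisions keep the original content because
// no masked variant exists on those paths.
func evidenceFor(original string, res checkResult) string {
	if res.Decision == DecisionMask {
		return res.MaskedContent
	}
	return original
}

func main() {
	original := "verify SSN 123-45-6789 please"
	// The masked placeholder text below is invented for the demo.
	masked := checkResult{Decision: DecisionMask, MaskedContent: "verify SSN <masked> please"}
	blocked := checkResult{Decision: DecisionBlock}
	fmt.Println(evidenceFor(original, masked))
	fmt.Println(evidenceFor(original, blocked))
}
```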
initializ-mk added a commit that referenced this pull request Jun 15, 2026
Symmetric to the guardrail_check audit emission shipped in #156 / #160: every Check* method on LibraryGuardrailEngine opens a guardrail.<gate> child span and stamps the same gate / decision / violation metadata operators see on the audit event.

Span names map to the library's gate vocabulary:

- guardrail.input (CheckInbound → InputGate)
- guardrail.context (CheckContext → ContextGate)
- guardrail.tool_call (CheckToolCall → ToolCallGate)
- guardrail.output (CheckOutbound + CheckToolOutput → OutputGate)
- guardrail.stream (CheckStream → StreamGate; not auto-wired)

The CheckOutbound case splits per text part: one guardrail.output span per OutputGate call, so the trace tree mirrors the part-level iteration cleanly.

Attribute keys (new constants in forge-core/observability/attrs.go):

- forge.guardrail.gate (Result.Gate)
- forge.guardrail.decision (Result.Decision: allow/mask/block/warn)
- forge.guardrail.type (first violation's Type)
- forge.guardrail.category (first violation's Category)
- forge.guardrail.violation_count (len(Result.Violations))
- forge.guardrail.evidence (gated by CaptureContent + Redact)
- forge.tool.name (reused from #130; set on tool_call + tool-output paths)

Block decisions stamp OTel Error status with the violation summary as the status description, so operators see red bars in the trace UI without custom attribute queries.

forge.guardrail.evidence follows the #130 + #156 content rule exactly: default off; with CaptureContent on, the mask path emits post-mask content (matches what the LLM actually saw) and the block/warn paths emit original content. PrepareSpanContent runs the same redact-then-truncate pipeline used for gen_ai.input.messages and forge.tool.args, so the four content streams share one consistent shape.

Wiring:

- LibraryGuardrailEngine grows a tracingCfg field + WithTracing setter; BuildGuardrailChecker gains a TracingConfig parameter and calls WithTracing on every constructed engine.
- runner.Start resolves TracingConfig early (it's a pure config resolution, no I/O) so the guardrail engine sees it before NewTracerProvider runs; the later tracing block reuses the resolved value.
- When tracing is disabled, the noop tracer short-circuits; spans are not produced at all. CaptureContent only controls the evidence attribute; the span itself is always opened (it's cheap when tracing is off).

Tests (forge-cli/runtime/guardrails_tracing_test.go):

- guardrail.input span lands with gate/decision/violation_count attributes; evidence ABSENT when CaptureContent=false
- evidence PRESENT but raw PII absent when CaptureContent=true (post-mask rule)
- guardrail.tool_call carries forge.tool.name
- guardrail.output for CheckOutbound has NO tool attribute (distinguishes "model reply to user" from tool-result OutputGate fires)
- guardrail.context + guardrail.stream spans land
- noop-tracer path: no spans recorded

Docs: docs/core-concepts/observability-tracing.md gains a "Guardrail spans" section under "Span content capture" listing span names, nesting, attribute reference, and the content-capture parity note.
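The span naming and attribute rules in this commit message can be sketched without the OpenTelemetry SDK by building the attribute set as a plain map. The span names and attribute keys below are the ones listed above; the types gateResult and violation, the helpers spanNameFor and guardrailAttrs, and the sample Gate value are hypothetical. The evidence argument is assumed to have already passed through the redact-then-truncate step.

```go
package main

import "fmt"

// violation and gateResult are hypothetical stand-ins for the library types.
type violation struct{ Type, Category string }

type gateResult struct {
	Gate       string
	Decision   string // allow / mask / block / warn
	Violations []violation
}

// spanNames maps each Check* method to the span name listed in the commit.
var spanNames = map[string]string{
	"CheckInbound":    "guardrail.input",
	"CheckContext":    "guardrail.context",
	"CheckToolCall":   "guardrail.tool_call",
	"CheckOutbound":   "guardrail.output",
	"CheckToolOutput": "guardrail.output",
	"CheckStream":     "guardrail.stream",
}

func spanNameFor(method string) (string, bool) {
	name, ok := spanNames[method]
	return name, ok
}

// guardrailAttrs builds the attribute set for one guardrail span.
// Evidence is attached only when captureContent is on; the tool name is
// attached only when the caller supplies one (tool_call and tool-output paths).
func guardrailAttrs(res gateResult, tool, evidence string, captureContent bool) map[string]any {
	attrs := map[string]any{
		"forge.guardrail.gate":            res.Gate,
		"forge.guardrail.decision":        res.Decision,
		"forge.guardrail.violation_count": len(res.Violations),
	}
	if len(res.Violations) > 0 {
		// Only the first violation is stamped, as described above.
		attrs["forge.guardrail.type"] = res.Violations[0].Type
		attrs["forge.guardrail.category"] = res.Violations[0].Category
	}
	if tool != "" {
		attrs["forge.tool.name"] = tool
	}
	if captureContent && evidence != "" {
		attrs["forge.guardrail.evidence"] = evidence
	}
	return attrs
}

func main() {
	// The Gate value "input" is an assumed example, not a confirmed constant.
	res := gateResult{Gate: "input", Decision: "mask", Violations: []violation{{Type: "pii", Category: "ssn"}}}
	name, _ := spanNameFor("CheckInbound")
	fmt.Println(name, guardrailAttrs(res, "", "", false))
}
```

Keeping the attribute construction separate from span creation makes the content-capture gate easy to test in isolation, which is roughly what the listed tracing tests assert.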
Summary
- AuditGuardrail constant has been defined since FWS-7 but never emitted. LibraryGuardrailEngine logged redactions to the ops logger and that was it: operators tailing the audit socket / stderr NDJSON saw zero guardrail events. Docs claimed otherwise.
- Emits guardrail_check at every mask / block / warn site (inbound, outbound, tool_output × decision).

Wire diagram
Default event shape (metadata-only)

    {
      "ts": "2026-06-14T10:00:00Z",
      "event": "guardrail_check",
      "schema_version": "1.0",
      "seq": 2,
      "correlation_id": "fd111edd27c20101",
      "task_id": "slack-...",
      "fields": {
        "direction": "inbound",
        "decision": "masked",
        "guardrail": "pii",
        "category": "ssn",
        "violation_count": 1
      }
    }

With evidence capture on (FORGE_GUARDRAIL_CAPTURE_EVIDENCE=true)

    {
      "ts": "...",
      "event": "guardrail_check",
      "fields": {
        "direction": "inbound",
        "decision": "masked",
        "guardrail": "pii",
        "category": "ssn",
        "violation_count": 1,
        "evidence": "verify SSN 123-45-6789 with token [REDACTED] please"
      }
    }

The [REDACTED] marker matches FWS-8's marker so audit consumers see one consistent token across both pipelines. The same regex set as the bundled secret rules is used for the defense-in-depth scrub.

Config knobs
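| Variable | Default | Effect |
| --- | --- | --- |
| FORGE_GUARDRAIL_CAPTURE_EVIDENCE | false | Includes fields.evidence in the emitted event |
| FORGE_GUARDRAIL_REDACT | true | Scrubs vendor-secret token shapes from evidence, replacing them with [REDACTED] |
| FORGE_GUARDRAIL_MAX_BYTES | 4096 | Soft cap on evidence size; longer evidence is truncated with the …[truncated:N] marker |

Behavior matrix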
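| Direction | Decisions |
| --- | --- |
| inbound | masked, warned, blocked |
| outbound | masked, warned, blocked |
| tool_output | masked, warned, blocked |

Files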
- forge-core/runtime/guardrails.go: interface now takes context.Context
- forge-core/runtime/guardrails_test.go: updated to match
- forge-cli/runtime/guardrails_audit.go: new; GuardrailAuditConfig, redact pipeline, emitGuardrailEvent helper
- forge-cli/runtime/guardrails_engine.go: calls emit at all 7 sites, WithAuditLogger for wiring
- forge-cli/runtime/guardrails_loader.go: BuildGuardrailChecker now takes (auditLogger, auditCfg)
- forge-cli/runtime/guardrails_engine_test.go: 4 new tests: inbound-mask emits, evidence omitted by default, prepareEvidence (table-driven), truncation cap
- forge-cli/runtime/runner.go: reordered audit-logger / guardrails construction, ctx threaded into all 7 call sites
- docs/security/guardrails.md: new Audit Events section with full schema + opt-in evidence table
- docs/security/audit-logging.md: updated guardrail_check row
- .claude/skills/forge.md: updated audit-event reference row

Test plan
- go test ./... clean in forge-core and forge-cli
- golangci-lint run ./... → 0 issues in all three modules
- gofmt -w applied
- Start an audit socket listener (socat -u UNIX-LISTEN:/tmp/forge-audit.sock,fork -), run agent with FORGE_AUDIT_SOCKET=/tmp/forge-audit.sock, send PII-bearing message, confirm guardrail_check event appears alongside session_*, llm_call, invocation_complete
- With FORGE_GUARDRAIL_CAPTURE_EVIDENCE=true, confirm fields.evidence is present and a secret-shaped substring in the prompt is replaced with [REDACTED]
- With FORGE_GUARDRAIL_CAPTURE_EVIDENCE=false (default), confirm fields.evidence is absent

Closes #155
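For the manual socket checks in the test plan, a small consumer can count guardrail_check events in a captured NDJSON stream and report how many carry fields.evidence. This is a sketch under assumptions: auditEvent, summarize, and summarizeNDJSON are hypothetical names, and only the event and fields keys shown in the event shapes above are relied on.

```go
package main

import (
	"bufio"
	"encoding/json"
	"fmt"
	"io"
	"strings"
)

// auditEvent decodes only the parts of an audit line this check needs.
type auditEvent struct {
	Event  string         `json:"event"`
	Fields map[string]any `json:"fields"`
}

// summarize reads NDJSON audit lines and returns the number of
// guardrail_check events and how many of them include fields.evidence.
func summarize(r io.Reader) (int, int, error) {
	checks, withEvidence := 0, 0
	sc := bufio.NewScanner(r)
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		if line == "" {
			continue // tolerate blank lines between events
		}
		var ev auditEvent
		if err := json.Unmarshal([]byte(line), &ev); err != nil {
			return checks, withEvidence, err
		}
		if ev.Event != "guardrail_check" {
			continue
		}
		checks++
		if _, ok := ev.Fields["evidence"]; ok {
			withEvidence++
		}
	}
	return checks, withEvidence, sc.Err()
}

// summarizeNDJSON is a convenience wrapper for in-memory captures.
func summarizeNDJSON(s string) (int, int, error) {
	return summarize(strings.NewReader(s))
}

func main() {
	capture := `{"event":"llm_call","fields":{}}
{"event":"guardrail_check","fields":{"direction":"inbound","decision":"masked","guardrail":"pii","category":"ssn","violation_count":1}}`
	checks, withEvidence, err := summarizeNDJSON(capture)
	fmt.Println(checks, withEvidence, err)
}
```

With the default configuration the second count should stay at zero; with FORGE_GUARDRAIL_CAPTURE_EVIDENCE=true it should match the number of guardrail_check events that had content to report.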