feat(guardrails): wire all 5 library gates + drop direction in favor of gate (closes #159) #160
Merged
…ate (closes #159)

Forge previously invoked only InputGate and OutputGate of the five gates the guardrails library supports. The other three (ToolCallGate, ContextGate, StreamGate) were defined in the library and even advertised via GateConfig.ToolCallGate=true in the default StructuredGuardrails, but the agent runtime never called them: silent no-ops. This commit wires all five and unifies the audit-event shape on the library's own gate vocabulary.

Interface (forge-core/runtime/guardrails.go):
- CheckInbound(ctx, msg) error → InputGate
- CheckOutbound(ctx, msg) error → OutputGate
- CheckToolCall(ctx, toolName, args) (str, err) → ToolCallGate (new)
- CheckToolOutput(ctx, toolName, text) (str, err) → OutputGate
- CheckContext(ctx, content) (str, err) → ContextGate (new)
- CheckStream(ctx, chunk) (str, err) → StreamGate (new)

LibraryGuardrailEngine implements all 5 by calling the library's matching gate. Each emit pulls the gate type from Result.Gate directly (single source of truth).

Wiring (forge-cli/runtime/runner.go registerGuardrailHooks):
- BeforeLLMCall hook → CheckContext over every system-role message in HookContext.Messages. Closest thing Forge has to "retrieved context" today; future memory / RAG work can call CheckContext directly from the recall path for a finer-grained seam.
- BeforeToolExec hook → CheckToolCall over hctx.ToolInput. Masks args in-place; blocks abort the tool exec the same way the AfterToolExec gate does.
- AfterToolExec hook → CheckToolOutput (existing).
- CheckInbound / CheckOutbound continue to be called directly from the A2A handlers (outside the agent loop's hook surface because the loop only sees ChatMessages, not A2A envelopes).
- CheckStream is NOT auto-wired: Forge's ExecuteStream is a buffered wrapper around non-streaming Execute. The method is exposed for direct callers of llm.Client.ChatStream and for future loop work that adds a real per-chunk seam.

Event-shape change:
- The fields.direction field (added in #155, unused by any consumer per #159 conversation) is REPLACED by fields.gate, sourced from Result.Gate.
- gate values are exactly the five library constants: input / context / tool_call / output / stream.
- fields.tool is set on tool_call AND on output events for tool return text, so consumers can distinguish OutputGate fires on tool results from OutputGate fires on the model's reply to the user without a synthetic direction field.

Pre-#159 agents emitted direction-only. Consumers that need to support both vintages map old direction values to gate via the table in docs/security/guardrails.md (inbound→input, outbound→output, tool_output→output+tool).

Tests:
- NoopGuardrailChecker pass-through for all 5 gates.
- Mask emit pins gate=input and asserts direction MUST NOT appear in the JSON.
- New TestLibraryGuardrailEngine_EmitsAuditOnToolCallMask drives the ToolCallGate path.
- New empty-input short-circuit test for the three new methods.

Docs:
- docs/security/guardrails.md: gate field reference, the five-gate call-site table, the pre-#159 migration block.
- docs/security/audit-logging.md: guardrail_check row updated.
- .claude/skills/forge.md: AuditGuardrail entry updated.
initializ-mk added a commit that referenced this pull request on Jun 15, 2026
Symmetric to the guardrail_check audit emission shipped in #156 / #160: every Check* method on LibraryGuardrailEngine opens a guardrail.<gate> child span and stamps the same gate / decision / violation metadata operators see on the audit event.

Span names map to the library's gate vocabulary:
- guardrail.input (CheckInbound → InputGate)
- guardrail.context (CheckContext → ContextGate)
- guardrail.tool_call (CheckToolCall → ToolCallGate)
- guardrail.output (CheckOutbound + CheckToolOutput → OutputGate)
- guardrail.stream (CheckStream → StreamGate; not auto-wired)

The CheckOutbound case splits per text part: one guardrail.output span per OutputGate call, so the trace tree mirrors the part-level iteration cleanly.

Attribute keys (new constants in forge-core/observability/attrs.go):
- forge.guardrail.gate (Result.Gate)
- forge.guardrail.decision (Result.Decision: allow/mask/block/warn)
- forge.guardrail.type (first violation's Type)
- forge.guardrail.category (first violation's Category)
- forge.guardrail.violation_count (len(Result.Violations))
- forge.guardrail.evidence (gated by CaptureContent + Redact)
- forge.tool.name (reused from #130; set on tool_call + tool-output paths)

Block decisions stamp OTel Error status with the violation summary as the status description, so operators see red bars in the trace UI without custom attribute queries.

forge.guardrail.evidence follows the #130 + #156 content rule exactly: default off; with CaptureContent on, the mask path emits post-mask content (matches what the LLM actually saw) and the block/warn paths emit original content. PrepareSpanContent runs the same redact-then-truncate pipeline used for gen_ai.input.messages and forge.tool.args, so the four content streams share one consistent shape.

Wiring:
- LibraryGuardrailEngine grows a tracingCfg field + WithTracing setter; BuildGuardrailChecker gains a TracingConfig parameter and calls WithTracing on every constructed engine.
- runner.Start resolves TracingConfig early (it's a pure config resolution, no I/O) so the guardrail engine sees it before NewTracerProvider runs; the later tracing block reuses the resolved value.
- When tracing is disabled, the noop tracer short-circuits; spans are not produced at all. CaptureContent only controls the evidence attribute; the span itself is always opened (it's cheap when tracing is off).

Tests (forge-cli/runtime/guardrails_tracing_test.go):
- guardrail.input span lands with gate/decision/violation_count attributes; evidence ABSENT when CaptureContent=false
- evidence PRESENT but raw PII absent when CaptureContent=true (post-mask rule)
- guardrail.tool_call carries forge.tool.name
- guardrail.output for CheckOutbound has NO tool attribute (distinguishes "model reply to user" from tool-result OutputGate fires)
- guardrail.context + guardrail.stream spans land
- noop-tracer path: no spans recorded

Docs: docs/core-concepts/observability-tracing.md gains a "Guardrail spans" section under "Span content capture" listing span names, nesting, attribute reference, and the content-capture parity note.
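The span naming and the evidence rule described in the commit message above can be sketched in plain Go, without pulling in OpenTelemetry. This is only an illustration: `spanName`, `evidenceFor`, and the `captureContent` parameter are hypothetical names, not the code in `LibraryGuardrailEngine`, and the redact-then-truncate step (`PrepareSpanContent`) is left out. Returning no evidence for `allow` is an assumption; the commit message only specifies the mask, block, and warn paths.

```go
package main

// spanName maps a library gate value (input, context, tool_call, output,
// stream) to the span name used for that gate.
func spanName(gate string) string {
	return "guardrail." + gate
}

// evidenceFor returns the text to stamp on forge.guardrail.evidence and
// whether the attribute should be set at all.
// Hypothetical helper: the real engine may structure this differently.
func evidenceFor(captureContent bool, decision, original, masked string) (string, bool) {
	if !captureContent {
		// Default off: no evidence attribute.
		return "", false
	}
	switch decision {
	case "mask":
		// Post-mask content, matching what the LLM actually saw.
		return masked, true
	case "block", "warn":
		// Original content that triggered the decision.
		return original, true
	default:
		// Assumption: an allow decision carries no evidence.
		return "", false
	}
}
```

Keeping the selection in one function makes the "mask emits post-mask, block/warn emit original" rule easy to unit-test independently of the tracer.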
Summary
Forge previously invoked only `InputGate` and `OutputGate` of the five gates the guardrails library supports. `ToolCallGate`, `ContextGate`, and `StreamGate` were defined in the library — `GateConfig.ToolCallGate: true` was even advertised in `DefaultStructuredGuardrails` — but the agent runtime never invoked them. Silent no-ops.
This PR wires all five and unifies the audit-event shape on the library's gate vocabulary.
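Interface (`forge-core/runtime/guardrails.go`)

| Method | Library gate |
| --- | --- |
| `CheckInbound(ctx, msg)` | `InputGate` |
| `CheckOutbound(ctx, msg)` | `OutputGate` |
| `CheckToolCall(ctx, toolName, args)` | `ToolCallGate` (new) |
| `CheckToolOutput(ctx, toolName, text)` | `OutputGate` (with tool metadata) |
| `CheckContext(ctx, content)` | `ContextGate` (new) |
| `CheckStream(ctx, chunk)` | `StreamGate` (new) |

`LibraryGuardrailEngine` implements all five by calling the matching library method. `NoopGuardrailChecker` pass-through covers all five.

Wiring (`registerGuardrailHooks` in `forge-cli/runtime/runner.go`)

| Hook | Method | Notes |
| --- | --- | --- |
| `BeforeLLMCall` | `CheckContext` | Runs over every `system`-role message in `HookContext.Messages` |
| `BeforeToolExec` | `CheckToolCall` | Runs over `hctx.ToolInput`; masks args in place; a block aborts the tool exec the same way the `AfterToolExec` gate does |
| `AfterToolExec` | `CheckToolOutput` | Existing |

`CheckInbound` / `CheckOutbound` continue to be called directly from the A2A handlers; they sit outside the agent loop's hook surface because the loop only sees `ChatMessages`, not A2A envelopes.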
`CheckStream` is not auto-wired — Forge's `ExecuteStream` (`forge-core/runtime/loop.go:837`) is a buffered wrapper around non-streaming `Execute`. The method is exposed for direct callers of `llm.Client.ChatStream` and for future loop work that adds a real per-chunk seam.
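For orientation, the six-method surface could be rendered in Go roughly as follows. This is a sketch under stated assumptions: the interface name `GuardrailChecker` is inferred from `NoopGuardrailChecker`, and plain `string` stands in for the real message and argument types, which this description does not spell out.

```go
package main

import "context"

// GuardrailChecker is a hypothetical rendering of the interface described
// above. Parameter types are simplified to string for illustration.
type GuardrailChecker interface {
	CheckInbound(ctx context.Context, msg string) error                         // InputGate
	CheckOutbound(ctx context.Context, msg string) error                        // OutputGate
	CheckToolCall(ctx context.Context, toolName, args string) (string, error)   // ToolCallGate
	CheckToolOutput(ctx context.Context, toolName, text string) (string, error) // OutputGate
	CheckContext(ctx context.Context, content string) (string, error)           // ContextGate
	CheckStream(ctx context.Context, chunk string) (string, error)              // StreamGate
}

// NoopGuardrailChecker passes every input through unchanged.
type NoopGuardrailChecker struct{}

func (NoopGuardrailChecker) CheckInbound(context.Context, string) error  { return nil }
func (NoopGuardrailChecker) CheckOutbound(context.Context, string) error { return nil }

func (NoopGuardrailChecker) CheckToolCall(_ context.Context, _, args string) (string, error) {
	return args, nil
}

func (NoopGuardrailChecker) CheckToolOutput(_ context.Context, _, text string) (string, error) {
	return text, nil
}

func (NoopGuardrailChecker) CheckContext(_ context.Context, content string) (string, error) {
	return content, nil
}

func (NoopGuardrailChecker) CheckStream(_ context.Context, chunk string) (string, error) {
	return chunk, nil
}

// Compile-time check that the no-op type satisfies the interface.
var _ GuardrailChecker = NoopGuardrailChecker{}
```

The methods that return a string hand back the (possibly masked) text, so callers can substitute it for the original without a second lookup.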
Why ContextGate at `BeforeLLMCall`
Forge has no separate "retrieved-context-being-injected" interception point today (memory recall / RAG result merging happens at message-assembly time). Scanning `system`-role messages at `BeforeLLMCall` is the closest defensible seam: dynamic system content (RAG output, templated context) usually lives in those messages, and re-scanning per iteration is cheap when no rule matches. Future memory work can call `CheckContext` directly from the recall path for a finer-grained seam.
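A minimal sketch of that seam, assuming a simplified `ChatMessage` type and a one-method `contextChecker` interface (both hypothetical stand-ins for the real Forge types): the hook walks the messages, sends only `system`-role content through `CheckContext`, writes the returned text back in place, and treats an error as a block. `maskSecret` is a toy checker included only to exercise the loop.

```go
package main

import (
	"context"
	"fmt"
	"strings"
)

// ChatMessage is a simplified stand-in for Forge's chat message type.
type ChatMessage struct {
	Role    string
	Content string
}

// contextChecker is the single method this hook needs from the engine.
type contextChecker interface {
	CheckContext(ctx context.Context, content string) (string, error)
}

// scanSystemMessages runs CheckContext over every system-role message and
// replaces its content with the returned (possibly masked) text.
// A non-nil error is treated as a block and stops the scan.
func scanSystemMessages(ctx context.Context, g contextChecker, msgs []ChatMessage) error {
	for i := range msgs {
		if msgs[i].Role != "system" {
			continue // user/assistant/tool messages are not ContextGate input here
		}
		out, err := g.CheckContext(ctx, msgs[i].Content)
		if err != nil {
			return fmt.Errorf("context gate blocked message %d: %w", i, err)
		}
		msgs[i].Content = out
	}
	return nil
}

// maskSecret is a toy checker that masks one literal token.
type maskSecret struct{}

func (maskSecret) CheckContext(_ context.Context, content string) (string, error) {
	return strings.ReplaceAll(content, "s3cret", "[MASKED]"), nil
}
```

Because the scan runs on every loop iteration, the per-message cost matters; a checker that returns quickly when no rule matches keeps this cheap.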
Event-shape change
The `fields.direction` key (introduced in #155, not consumed by any reader per the discussion that led to #159) is replaced by `fields.gate`, sourced from `Result.Gate` — the library's own classification.
New event shape
```json
{
  "event": "guardrail_check",
  "fields": {
    "gate": "tool_call",
    "decision": "masked",
    "guardrail": "pii",
    "category": "email",
    "violation_count": 1,
    "tool": "send_email"
  }
}
```
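On the consumer side, an event of this shape could be decoded along these lines. The struct and function names are hypothetical; only the JSON keys come from the example above. `isToolResult` illustrates how the `tool` key separates an `output` event on tool return text from an `output` event on the model's reply.

```go
package main

import "encoding/json"

// guardrailEvent mirrors the JSON keys of a guardrail_check audit event.
type guardrailEvent struct {
	Event  string `json:"event"`
	Fields struct {
		Gate           string `json:"gate"`
		Decision       string `json:"decision"`
		Guardrail      string `json:"guardrail"`
		Category       string `json:"category"`
		ViolationCount int    `json:"violation_count"`
		Tool           string `json:"tool,omitempty"`
	} `json:"fields"`
}

// parseGuardrailEvent decodes a single audit event.
func parseGuardrailEvent(line []byte) (guardrailEvent, error) {
	var ev guardrailEvent
	err := json.Unmarshal(line, &ev)
	return ev, err
}

// isToolResult reports whether an output-gate event fired on tool return
// text (tool is set) rather than on the model's reply (tool is empty).
func isToolResult(ev guardrailEvent) bool {
	return ev.Fields.Gate == "output" && ev.Fields.Tool != ""
}
```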
Migration from pre-#159 agents
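Consumers that need to read both vintages map old `direction` values to `gate`:

| `direction` (pre-#159) | `gate` |
| --- | --- |
| `inbound` | `input` |
| `outbound` | `output` |
| `tool_output` | `output` (with `fields.tool` set) |
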
Documented in `docs/security/guardrails.md#audit-events`.
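A consumer-side fallback for mixed-vintage events might look like the sketch below. It assumes the mapping inbound→input, outbound→output, tool_output→output+tool; `resolveGate` and the `fromTool` flag are hypothetical, and the real consumer code is not part of this PR.

```go
package main

// deriveFromDirection maps a pre-#159 direction value to the library gate
// vocabulary. fromTool is true when the old value implied tool return text.
// ok is false for values outside the documented mapping.
func deriveFromDirection(direction string) (gate string, fromTool, ok bool) {
	switch direction {
	case "inbound":
		return "input", false, true
	case "outbound":
		return "output", false, true
	case "tool_output":
		return "output", true, true
	default:
		return "", false, false
	}
}

// resolveGate prefers fields.gate when present and falls back to the
// legacy direction field for historical events.
func resolveGate(gate, direction string) (string, bool) {
	if gate != "" {
		return gate, true
	}
	g, _, ok := deriveFromDirection(direction)
	return g, ok
}
```

Returning `ok` instead of a default gate lets the consumer decide how to handle events that carry neither a `gate` nor a recognized `direction`.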
Files
Test plan
Schema impact
`fields.gate` is additive. `fields.direction` is dropped. Per the discussion that led to #159, no consumer reads `direction` today — the pre-existing PR that introduced it (#155 / #156) shipped the field without any downstream wiring. The unification PR on the consumer side will implement the `gate ?? deriveFromDirection(direction)` fallback for historical events emitted from pre-#159 agents.
Closes #159