feat(guardrails): wire all 5 library gates + drop direction in favor of gate (closes #159) by initializ-mk · Pull Request #160 · initializ/forge

initializ-mk · 2026-06-14T16:26:49Z

Summary

Forge previously invoked only `InputGate` and `OutputGate` of the five gates the guardrails library supports. `ToolCallGate`, `ContextGate`, and `StreamGate` were defined in the library — `GateConfig.ToolCallGate: true` was even advertised in `DefaultStructuredGuardrails` — but the agent runtime never invoked them. Silent no-ops.

This PR wires all five and unifies the audit-event shape on the library's gate vocabulary.

Interface (`forge-core/runtime/guardrails.go`)

| Method | Library gate |
| --- | --- |
| `CheckInbound(ctx, msg)` | `InputGate` |
| `CheckOutbound(ctx, msg)` | `OutputGate` |
| `CheckToolCall(ctx, toolName, args)` | `ToolCallGate` (new) |
| `CheckToolOutput(ctx, toolName, text)` | `OutputGate` (with tool metadata) |
| `CheckContext(ctx, content)` | `ContextGate` (new) |
| `CheckStream(ctx, chunk)` | `StreamGate` (new) |

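`LibraryGuardrailEngine` implements every method by calling the matching library gate, which covers all five gates (`OutputGate` backs both `CheckOutbound` and `CheckToolOutput`). The `NoopGuardrailChecker` pass-through covers all five as well.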

Wiring (`registerGuardrailHooks` in `forge-cli/runtime/runner.go`)

| Hook | New gate call |
| --- | --- |
| `BeforeLLMCall` | `CheckContext` over every `system`-role message in `HookContext.Messages` |
| `BeforeToolExec` | `CheckToolCall` over `hctx.ToolInput`; masks in-place; blocks abort the tool exec |
| `AfterToolExec` | `CheckToolOutput` (unchanged) |

`CheckInbound` / `CheckOutbound` continue to be called directly from the A2A handlers — they sit outside the agent loop's hook surface because the loop only sees `ChatMessages`, not A2A envelopes.

`CheckStream` is not auto-wired — Forge's `ExecuteStream` (`forge-core/runtime/loop.go:837`) is a buffered wrapper around non-streaming `Execute`. The method is exposed for direct callers of `llm.Client.ChatStream` and for future loop work that adds a real per-chunk seam.

Why ContextGate at `BeforeLLMCall`

Forge has no separate "retrieved-context-being-injected" interception point today (memory recall / RAG result merging happens at message-assembly time). Scanning `system`-role messages at `BeforeLLMCall` is the closest defensible seam: dynamic system content (RAG output, templated context) usually lives in those messages, and re-scanning per iteration is cheap when no rule matches. Future memory work can call `CheckContext` directly from the recall path for a finer-grained seam.

Event-shape change

The `fields.direction` key (introduced in #155, not consumed by any reader per the discussion that led to #159) is replaced by `fields.gate`, sourced from `Result.Gate` — the library's own classification.

| `gate` | Path |
| --- | --- |
| `input` | InputGate fire on user message |
| `context` | ContextGate fire on system-role message |
| `tool_call` | ToolCallGate fire on tool args |
| `output` | OutputGate fire; `fields.tool` set when the fire was on a tool's return text vs the model's reply to the user |
| `stream` | StreamGate fire (no auto-wire today) |

New event shape

```json
{
  "event": "guardrail_check",
  "fields": {
    "gate": "tool_call",
    "decision": "masked",
    "guardrail": "pii",
    "category": "email",
    "violation_count": 1,
    "tool": "send_email"
  }
}
```
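One way to model this shape in Go is shown below. The struct and function names are hypothetical, and the field set is limited to the keys visible in the example above. Using `omitempty` on `tool` is an assumption that matches the "no `tool`" case for model replies to the user.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// guardrailFields mirrors the keys in the example event. There is no
// Direction field: the gate key replaces it.
type guardrailFields struct {
	Gate           string `json:"gate"`
	Decision       string `json:"decision"`
	Guardrail      string `json:"guardrail"`
	Category       string `json:"category"`
	ViolationCount int    `json:"violation_count"`
	Tool           string `json:"tool,omitempty"` // omitted when the fire is not tool-related
}

// auditEvent is the hypothetical envelope around the fields.
type auditEvent struct {
	Event  string          `json:"event"`
	Fields guardrailFields `json:"fields"`
}

// encode serializes an event to compact JSON.
func encode(e auditEvent) (string, error) {
	b, err := json.Marshal(e)
	if err != nil {
		return "", err
	}
	return string(b), nil
}

func main() {
	ev := auditEvent{
		Event: "guardrail_check",
		Fields: guardrailFields{
			Gate:           "tool_call",
			Decision:       "masked",
			Guardrail:      "pii",
			Category:       "email",
			ViolationCount: 1,
			Tool:           "send_email",
		},
	}
	s, err := encode(ev)
	if err != nil {
		fmt.Println("encode failed:", err)
		return
	}
	fmt.Println(s)
}
```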

Migration from pre-#159 agents

Consumers that need to read both vintages map old direction values to gate:

| Old `direction` | Derived `gate` |
| --- | --- |
| `inbound` | `input` |
| `outbound` | `output` (no `tool`) |
| `tool_output` | `output` (with `tool` set) |

Documented in `docs/security/guardrails.md#audit-events`.
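On the consumer side, the mapping table could be applied with something like the sketch below. The function names are illustrative (the PR mentions a `gate ?? deriveFromDirection(direction)` fallback, but its actual implementation lives in a separate consumer-side change), and unknown `direction` values are treated as unmappable by assumption.

```go
package main

import "fmt"

// deriveFromDirection maps a pre-#159 direction value to the gate vocabulary.
// ok is false for values the migration table does not list.
func deriveFromDirection(direction string) (gate string, ok bool) {
	switch direction {
	case "inbound":
		return "input", true
	case "outbound", "tool_output":
		// Both map to the output gate; per the table, tool_output events
		// are the ones that also have the tool field set.
		return "output", true
	default:
		return "", false
	}
}

// effectiveGate prefers the new gate key and falls back to the old
// direction key for events emitted by pre-#159 agents.
func effectiveGate(gate, direction string) (string, bool) {
	if gate != "" {
		return gate, true
	}
	return deriveFromDirection(direction)
}

func main() {
	for _, ev := range []struct{ gate, direction string }{
		{gate: "tool_call"},
		{direction: "inbound"},
		{direction: "tool_output"},
	} {
		g, ok := effectiveGate(ev.gate, ev.direction)
		fmt.Println(g, ok)
	}
}
```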

Files

| File | Change |
| --- | --- |
| `forge-core/runtime/guardrails.go` | Interface gains `CheckToolCall` / `CheckContext` / `CheckStream`; doc-comment updated to explain the gate vocabulary |
| `forge-core/runtime/guardrails_test.go` | Noop pass-through asserts all 5 |
| `forge-cli/runtime/guardrails_engine.go` | LibraryGuardrailEngine implements the three new methods; existing five emit sites no longer pass a `direction` string |
| `forge-cli/runtime/guardrails_audit.go` | `emitGuardrailEvent` signature drops `direction`; stamps `fields.gate` from `res.Gate` |
| `forge-cli/runtime/runner.go` | `registerGuardrailHooks` now registers BeforeLLMCall + BeforeToolExec alongside AfterToolExec |
| `forge-cli/runtime/guardrails_engine_test.go` | Existing mask-emit test asserts `gate=input` and explicit absence of `direction`; new tests for ToolCallGate emit and empty-input short-circuit on all three new methods |
| `docs/security/guardrails.md` | Updated field-reference + gate-call-site tables + migration block |
| `docs/security/audit-logging.md` | `guardrail_check` row updated |
| `.claude/skills/forge.md` | AuditGuardrail entry updated |

Test plan

- `go test ./...` clean in forge-core and forge-cli
- `golangci-lint run ./...` → 0 issues
- `gofmt -w` applied
- End-to-end smoke: deploy with `FORGE_AUDIT_SOCKET` set, run a task whose tool args carry PII (e.g. an email-sending skill with a user-supplied recipient), and confirm a `guardrail_check` row with `gate=tool_call` and `tool=` appears on the socket before the tool runs.
- Confirm a tool with PII in its output still emits `gate=output` + `tool=` (unchanged shape from existing test coverage).
- Confirm an inbound user message with PII emits `gate=input` (no `direction` key).

Schema impact

`fields.gate` is additive. `fields.direction` is dropped. Per the discussion that led to #159, no consumer reads `direction` today — the pre-existing PR that introduced it (#155 / #156) shipped the field without any downstream wiring. The unification PR on the consumer side will implement the `gate ?? deriveFromDirection(direction)` fallback for historical events emitted from pre-#159 agents.

Closes #159

…ate (closes #159)

Forge previously invoked only InputGate and OutputGate of the five gates the guardrails library supports. The other three (ToolCallGate, ContextGate, StreamGate) were defined in the library and even advertised via GateConfig.ToolCallGate=true in the default StructuredGuardrails, but the agent runtime never called them — silent no-ops.

This commit wires all five and unifies the audit-event shape on the library's own gate vocabulary.

Interface (forge-core/runtime/guardrails.go):
- CheckInbound(ctx, msg) error — InputGate
- CheckOutbound(ctx, msg) error — OutputGate
- CheckToolCall(ctx, toolName, args) (str, err) — ToolCallGate (new)
- CheckToolOutput(ctx, toolName, text) (str, err) — OutputGate
- CheckContext(ctx, content) (str, err) — ContextGate (new)
- CheckStream(ctx, chunk) (str, err) — StreamGate (new)

LibraryGuardrailEngine implements all 5 by calling the library's matching gate. Each emit pulls the gate type from Result.Gate directly — single source of truth.

Wiring (forge-cli/runtime/runner.go registerGuardrailHooks):
- BeforeLLMCall hook → CheckContext over every system-role message in HookContext.Messages. Closest thing Forge has to "retrieved context" today; future memory / RAG work can call CheckContext directly from the recall path for a finer-grained seam.
- BeforeToolExec hook → CheckToolCall over hctx.ToolInput. Masks args in-place; blocks abort the tool exec the same way the AfterToolExec gate does.
- AfterToolExec hook → CheckToolOutput (existing).
- CheckInbound / CheckOutbound continue to be called directly from the A2A handlers (outside the agent loop's hook surface because the loop only sees ChatMessages, not A2A envelopes).
- CheckStream is NOT auto-wired: Forge's ExecuteStream is a buffered wrapper around non-streaming Execute. The method is exposed for direct callers of llm.Client.ChatStream and for future loop work that adds a real per-chunk seam.

Event-shape change:
- The fields.direction field (added in #155, unused by any consumer per #159 conversation) is REPLACED by fields.gate, sourced from Result.Gate.
- gate values are exactly the five library constants: input / context / tool_call / output / stream.
- fields.tool is set on tool_call AND on output events for tool return text — so consumers can distinguish OutputGate fires on tool results from OutputGate fires on the model's reply to the user without a synthetic direction field.

Pre-#159 agents emitted direction-only. Consumers that need to support both vintages map old direction values to gate via the table in docs/security/guardrails.md (inbound→input, outbound→output, tool_output→output+tool).

Tests:
- NoopGuardrailChecker pass-through for all 5 gates.
- Mask emit pins gate=input and asserts direction MUST NOT appear in the JSON.
- New TestLibraryGuardrailEngine_EmitsAuditOnToolCallMask drives the ToolCallGate path.
- New empty-input short-circuit test for the three new methods.

Docs:
- docs/security/guardrails.md — gate field reference, the five-gate call-site table, the pre-#159 migration block.
- docs/security/audit-logging.md — guardrail_check row updated.
- .claude/skills/forge.md — AuditGuardrail entry updated.

Symmetric to the guardrail_check audit emission shipped in #156 / #160 — every Check* method on LibraryGuardrailEngine opens a guardrail.<gate> child span and stamps the same gate / decision / violation metadata operators see on the audit event.

Span names map to the library's gate vocabulary:
- guardrail.input (CheckInbound → InputGate)
- guardrail.context (CheckContext → ContextGate)
- guardrail.tool_call (CheckToolCall → ToolCallGate)
- guardrail.output (CheckOutbound + CheckToolOutput → OutputGate)
- guardrail.stream (CheckStream → StreamGate; not auto-wired)

The CheckOutbound case splits per text part — one guardrail.output span per OutputGate call so the trace tree mirrors the part-level iteration cleanly.

Attribute keys (new constants in forge-core/observability/attrs.go):
- forge.guardrail.gate (Result.Gate)
- forge.guardrail.decision (Result.Decision: allow/mask/block/warn)
- forge.guardrail.type (first violation's Type)
- forge.guardrail.category (first violation's Category)
- forge.guardrail.violation_count (len(Result.Violations))
- forge.guardrail.evidence (gated by CaptureContent + Redact)
- forge.tool.name (reused from #130; set on tool_call + tool-output paths)

Block decisions stamp OTel Error status with the violation summary as the status description — operators see red bars in the trace UI without custom attribute queries.

forge.guardrail.evidence follows the #130 + #156 content rule exactly: default off; with CaptureContent on, the mask path emits post-mask content (matches what the LLM actually saw) and the block/warn paths emit original content. PrepareSpanContent runs the same redact-then-truncate pipeline used for gen_ai.input.messages and forge.tool.args, so the four content streams share one consistent shape.

Wiring:
- LibraryGuardrailEngine grows a tracingCfg field + WithTracing setter; BuildGuardrailChecker gains a TracingConfig parameter and calls WithTracing on every constructed engine.
- runner.Start resolves TracingConfig early (it's a pure config resolution — no I/O) so the guardrail engine sees it before NewTracerProvider runs; the later tracing block reuses the resolved value.
- When tracing is disabled, the noop tracer short-circuits; spans are not produced at all. CaptureContent only controls the evidence attribute — the span itself is always opened (it's cheap when tracing is off).

Tests (forge-cli/runtime/guardrails_tracing_test.go):
- guardrail.input span lands with gate/decision/violation_count attributes; evidence ABSENT when CaptureContent=false
- evidence PRESENT but raw PII absent when CaptureContent=true (post-mask rule)
- guardrail.tool_call carries forge.tool.name
- guardrail.output for CheckOutbound has NO tool attribute (distinguishes "model reply to user" from tool-result OutputGate fires)
- guardrail.context + guardrail.stream spans land
- noop-tracer path: no spans recorded

Docs: docs/core-concepts/observability-tracing.md gains a "Guardrail spans" section under "Span content capture" listing span names, nesting, attribute reference, and the content-capture parity note.
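The evidence rule in the commit message above (off by default; post-mask content on the mask path; original content on the block/warn paths) can be expressed as a small pure function. This is only a sketch of the selection logic: the function name and signature are hypothetical, the redact-then-truncate step is not modelled, and the handling of `allow` is an assumption since the message does not describe it.

```go
package main

import "fmt"

// selectEvidence returns the text that would be attached as span evidence
// and whether any evidence should be attached at all.
func selectEvidence(decision string, captureContent bool, original, masked string) (string, bool) {
	if !captureContent {
		return "", false // default: no content on the span
	}
	switch decision {
	case "mask":
		return masked, true // what the LLM actually saw
	case "block", "warn":
		return original, true
	default:
		// Assumption: decisions such as "allow" carry no evidence.
		return "", false
	}
}

func main() {
	original := "mail alice@example.com"
	masked := "mail [EMAIL]"
	for _, d := range []string{"mask", "block", "warn", "allow"} {
		ev, ok := selectEvidence(d, true, original, masked)
		fmt.Printf("%s: %q %v\n", d, ev, ok)
	}
}
```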

initializ-mk mentioned this pull request Jun 14, 2026

Guardrail span parity: emit OTel spans + attributes alongside the audit events #161

Closed

initializ-mk merged commit 0690f47 into main Jun 14, 2026
10 checks passed

initializ-mk mentioned this pull request Jun 15, 2026

feat(observability): OTel spans for every guardrail gate (closes #161) #167

Merged

7 tasks

initializ-mk merged 1 commit into main from feat/issue-159-all-gates