
feat(guardrails): wire all 5 library gates + drop direction in favor of gate (closes #159) #160

Merged: initializ-mk merged 1 commit into `main` from `feat/issue-159-all-gates` on Jun 14, 2026
initializ-mk (Contributor) commented on Jun 14, 2026

Summary

Forge previously invoked only `InputGate` and `OutputGate` of the five gates the guardrails library supports. `ToolCallGate`, `ContextGate`, and `StreamGate` were defined in the library — `GateConfig.ToolCallGate: true` was even advertised in `DefaultStructuredGuardrails` — but the agent runtime never invoked them. Silent no-ops.

This PR wires all five and unifies the audit-event shape on the library's gate vocabulary.

Interface (forge-core/runtime/guardrails.go)

| Method | Library gate |
| --- | --- |
| `CheckInbound(ctx, msg)` | InputGate |
| `CheckOutbound(ctx, msg)` | OutputGate |
| `CheckToolCall(ctx, toolName, args)` | ToolCallGate (new) |
| `CheckToolOutput(ctx, toolName, text)` | OutputGate (with tool metadata) |
| `CheckContext(ctx, content)` | ContextGate (new) |
| `CheckStream(ctx, chunk)` | StreamGate (new) |

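`LibraryGuardrailEngine` covers all five gates by calling the matching library method from each of the six interface methods (`CheckOutbound` and `CheckToolOutput` both map to OutputGate). The `NoopGuardrailChecker` pass-through covers all five gates as well.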
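
As a rough illustration of the method set listed above, the following sketch declares a six-method interface and a pass-through implementation. The parameter types are assumptions (the table gives only parameter names, so `string` is used throughout), and `noopChecker` and `passThroughOK` are hypothetical stand-ins, not the real `NoopGuardrailChecker` or its test.

```go
package main

import (
	"context"
	"fmt"
)

// GuardrailChecker mirrors the six methods from the interface table.
// Assumption: every payload is a plain string; the real types may differ.
type GuardrailChecker interface {
	CheckInbound(ctx context.Context, msg string) error
	CheckOutbound(ctx context.Context, msg string) error
	CheckToolCall(ctx context.Context, toolName, args string) (string, error)
	CheckToolOutput(ctx context.Context, toolName, text string) (string, error)
	CheckContext(ctx context.Context, content string) (string, error)
	CheckStream(ctx context.Context, chunk string) (string, error)
}

// noopChecker is a hypothetical pass-through: it never blocks and
// returns every payload unchanged.
type noopChecker struct{}

func (noopChecker) CheckInbound(context.Context, string) error  { return nil }
func (noopChecker) CheckOutbound(context.Context, string) error { return nil }
func (noopChecker) CheckToolCall(_ context.Context, _, args string) (string, error) {
	return args, nil
}
func (noopChecker) CheckToolOutput(_ context.Context, _, text string) (string, error) {
	return text, nil
}
func (noopChecker) CheckContext(_ context.Context, content string) (string, error) {
	return content, nil
}
func (noopChecker) CheckStream(_ context.Context, chunk string) (string, error) {
	return chunk, nil
}

// Compile-time check that noopChecker satisfies the interface.
var _ GuardrailChecker = noopChecker{}

// passThroughOK reports whether every method leaves the input untouched
// and returns no error.
func passThroughOK(in string) bool {
	var c GuardrailChecker = noopChecker{}
	ctx := context.Background()
	if c.CheckInbound(ctx, in) != nil || c.CheckOutbound(ctx, in) != nil {
		return false
	}
	checks := []func() (string, error){
		func() (string, error) { return c.CheckToolCall(ctx, "demo_tool", in) },
		func() (string, error) { return c.CheckToolOutput(ctx, "demo_tool", in) },
		func() (string, error) { return c.CheckContext(ctx, in) },
		func() (string, error) { return c.CheckStream(ctx, in) },
	}
	for _, check := range checks {
		if out, err := check(); err != nil || out != in {
			return false
		}
	}
	return true
}

func main() {
	fmt.Println(passThroughOK("hello"))
}
```

The compile-time assertion is the useful part of the pattern: adding a method to the interface breaks the build until every checker, including the pass-through, implements it.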

Wiring (registerGuardrailHooks in forge-cli/runtime/runner.go)

| Hook | New gate call |
| --- | --- |
| `BeforeLLMCall` | `CheckContext` over every `system`-role message in `HookContext.Messages` |
| `BeforeToolExec` | `CheckToolCall` over `hctx.ToolInput`; masks in-place; blocks abort the tool exec |
| `AfterToolExec` | `CheckToolOutput` (unchanged) |
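
The `BeforeToolExec` row can be pictured roughly as follows. This is a minimal sketch, not the code in `registerGuardrailHooks`: `hookContext`, `demoChecker`, `beforeToolExec`, and `runHook` are hypothetical, the hook signature is assumed, and the demo checker's rules (mask one hard-coded address, block one hard-coded phrase) exist only to exercise both paths.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"strings"
)

// hookContext is a hypothetical stand-in for Forge's HookContext.
type hookContext struct {
	ToolName  string
	ToolInput string
}

// toolCallChecker is the one method of the guardrail interface this hook needs.
type toolCallChecker interface {
	CheckToolCall(ctx context.Context, toolName, args string) (string, error)
}

var errBlocked = errors.New("guardrail blocked tool call")

// demoChecker is illustrative only: it masks one fixed email address
// and blocks any args containing "DROP TABLE".
type demoChecker struct{}

func (demoChecker) CheckToolCall(_ context.Context, _ string, args string) (string, error) {
	if strings.Contains(args, "DROP TABLE") {
		return "", errBlocked
	}
	return strings.ReplaceAll(args, "alice@example.com", "[EMAIL]"), nil
}

// beforeToolExec builds a hook that rewrites the tool input in place on a
// mask and returns an error on a block, which the caller treats as
// "do not run the tool".
func beforeToolExec(c toolCallChecker) func(context.Context, *hookContext) error {
	return func(ctx context.Context, hctx *hookContext) error {
		masked, err := c.CheckToolCall(ctx, hctx.ToolName, hctx.ToolInput)
		if err != nil {
			return fmt.Errorf("tool %q: %w", hctx.ToolName, err)
		}
		hctx.ToolInput = masked
		return nil
	}
}

// runHook applies the hook once and returns the resulting tool input.
func runHook(tool, input string) (string, error) {
	hctx := &hookContext{ToolName: tool, ToolInput: input}
	err := beforeToolExec(demoChecker{})(context.Background(), hctx)
	return hctx.ToolInput, err
}

func main() {
	out, err := runHook("send_email", `{"to":"alice@example.com"}`)
	fmt.Println(out, err)
}
```

Because the hook mutates the input before the tool runs, the tool only ever sees masked args; on a block the original input is left untouched and the error carries the reason.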

`CheckInbound` / `CheckOutbound` continue to be called directly from the A2A handlers — they sit outside the agent loop's hook surface because the loop only sees `ChatMessages`, not A2A envelopes.

`CheckStream` is not auto-wired — Forge's `ExecuteStream` (`forge-core/runtime/loop.go:837`) is a buffered wrapper around non-streaming `Execute`. The method is exposed for direct callers of `llm.Client.ChatStream` and for future loop work that adds a real per-chunk seam.

Why ContextGate at `BeforeLLMCall`

Forge has no separate "retrieved-context-being-injected" interception point today (memory recall / RAG result merging happens at message-assembly time). Scanning `system`-role messages at `BeforeLLMCall` is the closest defensible seam: dynamic system content (RAG output, templated context) usually lives in those messages, and re-scanning per iteration is cheap when no rule matches. Future memory work can call `CheckContext` directly from the recall path for a finer-grained seam.
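
A minimal sketch of that scan, assuming the hook writes masked content back into the message slice and stops at the first block. `chatMessage`, `redactSecret`, `scanSystemMessages`, and `demoScan` are hypothetical names, and the demo checker redacts a single made-up token.

```go
package main

import (
	"context"
	"fmt"
	"strings"
)

// chatMessage is a hypothetical stand-in for the loop's chat message type.
type chatMessage struct {
	Role    string
	Content string
}

// contextChecker is the one guardrail method this scan needs.
type contextChecker interface {
	CheckContext(ctx context.Context, content string) (string, error)
}

// redactSecret is illustrative only: it masks one made-up token.
type redactSecret struct{}

func (redactSecret) CheckContext(_ context.Context, content string) (string, error) {
	return strings.ReplaceAll(content, "sk-demo-123", "[SECRET]"), nil
}

// scanSystemMessages runs CheckContext over system-role messages only,
// replacing each message's content with the checker's output.
// Other roles are left untouched.
func scanSystemMessages(ctx context.Context, c contextChecker, msgs []chatMessage) error {
	for i := range msgs {
		if msgs[i].Role != "system" {
			continue
		}
		out, err := c.CheckContext(ctx, msgs[i].Content)
		if err != nil {
			return fmt.Errorf("system message %d: %w", i, err)
		}
		msgs[i].Content = out
	}
	return nil
}

// demoScan runs the scan over one system and one user message.
func demoScan() ([]chatMessage, error) {
	msgs := []chatMessage{
		{Role: "system", Content: "api key: sk-demo-123"},
		{Role: "user", Content: "my key is sk-demo-123"},
	}
	err := scanSystemMessages(context.Background(), redactSecret{}, msgs)
	return msgs, err
}

func main() {
	msgs, err := demoScan()
	fmt.Println(msgs, err)
}
```

The user message is deliberately skipped here: per the gate table, user input belongs to InputGate, which is invoked from the A2A handlers rather than from this hook.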

Event-shape change

The `fields.direction` key (introduced in #155, not consumed by any reader per the discussion that led to #159) is replaced by `fields.gate`, sourced from `Result.Gate` — the library's own classification.

| `gate` | Path |
| --- | --- |
| `input` | InputGate fire on user message |
| `context` | ContextGate fire on system-role message |
| `tool_call` | ToolCallGate fire on tool args |
| `output` | OutputGate fire; `fields.tool` set when the fire was on a tool's return text vs the model's reply to the user |
| `stream` | StreamGate fire (no auto-wire today) |

New event shape

```json
{
  "event": "guardrail_check",
  "fields": {
    "gate": "tool_call",
    "decision": "masked",
    "guardrail": "pii",
    "category": "email",
    "violation_count": 1,
    "tool": "send_email"
  }
}
```
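
On the consumer side, a reader of this shape might look something like the sketch below. The struct and function names (`auditEvent`, `guardrailFields`, `describe`) are hypothetical; only the JSON keys come from the event above. It shows how `gate` plus the presence of `tool` separates OutputGate fires on tool results from fires on the model's reply.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// guardrailFields maps the keys shown in the event above.
type guardrailFields struct {
	Gate           string `json:"gate"`
	Decision       string `json:"decision"`
	Guardrail      string `json:"guardrail"`
	Category       string `json:"category"`
	ViolationCount int    `json:"violation_count"`
	Tool           string `json:"tool,omitempty"`
}

// auditEvent is the outer envelope with the event name and its fields.
type auditEvent struct {
	Event  string          `json:"event"`
	Fields guardrailFields `json:"fields"`
}

// describe decodes one event and summarizes which path fired.
func describe(raw []byte) (string, error) {
	var ev auditEvent
	if err := json.Unmarshal(raw, &ev); err != nil {
		return "", err
	}
	if ev.Event != "guardrail_check" {
		return "", fmt.Errorf("unexpected event %q", ev.Event)
	}
	f := ev.Fields
	switch {
	case f.Gate == "output" && f.Tool != "":
		return "output gate on tool result from " + f.Tool, nil
	case f.Gate == "output":
		return "output gate on model reply", nil
	case f.Tool != "":
		return f.Gate + " gate on tool " + f.Tool, nil
	default:
		return f.Gate + " gate", nil
	}
}

func main() {
	raw := []byte(`{"event":"guardrail_check","fields":{"gate":"tool_call","decision":"masked","guardrail":"pii","category":"email","violation_count":1,"tool":"send_email"}}`)
	fmt.Println(describe(raw))
}
```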

Migration from pre-#159 agents

Consumers that need to read both vintages map old direction values to gate:

| Old `direction` | Derived `gate` |
| --- | --- |
| `inbound` | `input` |
| `outbound` | `output` (no `tool`) |
| `tool_output` | `output` (with `tool` set) |

Documented in `docs/security/guardrails.md#audit-events`.
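
For consumers that read both vintages, the mapping above amounts to a small fallback. The sketch below is one possible shape, not the consumer-side implementation: `auditFields` and `resolveGate` are hypothetical names, and it assumes old `tool_output` events already carry their own `tool` field, so only `gate` needs deriving.

```go
package main

import "fmt"

// auditFields holds the keys a mixed-vintage consumer cares about.
type auditFields struct {
	Gate      string // set by agents that include this change
	Direction string // set by older agents
	Tool      string // assumed to be present on old tool_output events
}

// deriveFromDirection applies the migration table. The second return
// value is false for direction values the table does not list.
func deriveFromDirection(direction string) (string, bool) {
	switch direction {
	case "inbound":
		return "input", true
	case "outbound", "tool_output":
		return "output", true
	}
	return "", false
}

// resolveGate prefers the explicit gate and falls back to the old
// direction value only when gate is absent.
func resolveGate(f auditFields) (string, bool) {
	if f.Gate != "" {
		return f.Gate, true
	}
	return deriveFromDirection(f.Direction)
}

func main() {
	gate, ok := resolveGate(auditFields{Direction: "tool_output", Tool: "send_email"})
	fmt.Println(gate, ok)
}
```

Old events can never resolve to `context`, `tool_call`, or `stream`, since those gates were not invoked before this change.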

Files

| File | Change |
| --- | --- |
| `forge-core/runtime/guardrails.go` | Interface gains `CheckToolCall` / `CheckContext` / `CheckStream`; doc-comment updated to explain the gate vocabulary |
| `forge-core/runtime/guardrails_test.go` | Noop pass-through asserts all 5 |
| `forge-cli/runtime/guardrails_engine.go` | LibraryGuardrailEngine implements the three new methods; existing five emit sites no longer pass a `direction` string |
| `forge-cli/runtime/guardrails_audit.go` | `emitGuardrailEvent` signature drops `direction`; stamps `fields.gate` from `res.Gate` |
| `forge-cli/runtime/runner.go` | `registerGuardrailHooks` now registers BeforeLLMCall + BeforeToolExec alongside AfterToolExec |
| `forge-cli/runtime/guardrails_engine_test.go` | Existing mask-emit test asserts `gate=input` and explicit absence of `direction`; new tests for ToolCallGate emit and empty-input short-circuit on all three new methods |
| `docs/security/guardrails.md` | Updated field-reference + gate-call-site tables + migration block |
| `docs/security/audit-logging.md` | `guardrail_check` row updated |
| `.claude/skills/forge.md` | AuditGuardrail entry updated |

Test plan

  • `go test ./...` clean in forge-core and forge-cli
  • `golangci-lint run ./...` → 0 issues
  • `gofmt -w` applied
  • End-to-end smoke: deploy with `FORGE_AUDIT_SOCKET` set, run a task whose tool args carry PII (e.g. an email-sending skill with a user-supplied recipient), and confirm that a `guardrail_check` row with `gate=tool_call` and `tool=` appears on the socket before the tool runs.
  • Confirm a tool with PII in its output still emits `gate=output` + `tool=` (unchanged shape from existing test coverage).
  • Confirm an inbound user message with PII emits `gate=input` (no `direction` key).

Schema impact

`fields.gate` is additive. `fields.direction` is dropped. Per the discussion that led to #159, no consumer reads `direction` today — the pre-existing PR that introduced it (#155 / #156) shipped the field without any downstream wiring. The unification PR on the consumer side will implement the `gate ?? deriveFromDirection(direction)` fallback for historical events emitted from pre-#159 agents.

Closes #159

…ate (closes #159)

Forge previously invoked only InputGate and OutputGate of the
five gates the guardrails library supports. The other three
(ToolCallGate, ContextGate, StreamGate) were defined in the
library and even advertised via GateConfig.ToolCallGate=true in
the default StructuredGuardrails, but the agent runtime never
called them — silent no-ops.

This commit wires all five and unifies the audit-event shape on
the library's own gate vocabulary.

Interface (forge-core/runtime/guardrails.go):
- CheckInbound(ctx, msg) error                  — InputGate
- CheckOutbound(ctx, msg) error                 — OutputGate
- CheckToolCall(ctx, toolName, args) (str, err) — ToolCallGate (new)
- CheckToolOutput(ctx, toolName, text) (str, err) — OutputGate
- CheckContext(ctx, content) (str, err)         — ContextGate (new)
- CheckStream(ctx, chunk) (str, err)            — StreamGate (new)

LibraryGuardrailEngine implements all 5 by calling the library's
matching gate. Each emit pulls the gate type from Result.Gate
directly — single source of truth.

Wiring (forge-cli/runtime/runner.go registerGuardrailHooks):
- BeforeLLMCall hook → CheckContext over every system-role
  message in HookContext.Messages. Closest thing Forge has to
  "retrieved context" today; future memory / RAG work can call
  CheckContext directly from the recall path for a finer-grained
  seam.
- BeforeToolExec hook → CheckToolCall over hctx.ToolInput. Masks
  args in-place; blocks abort the tool exec the same way the
  AfterToolExec gate does.
- AfterToolExec hook → CheckToolOutput (existing).
- CheckInbound / CheckOutbound continue to be called directly
  from the A2A handlers (outside the agent loop's hook surface
  because the loop only sees ChatMessages, not A2A envelopes).
- CheckStream is NOT auto-wired: Forge's ExecuteStream is a
  buffered wrapper around non-streaming Execute. The method is
  exposed for direct callers of llm.Client.ChatStream and for
  future loop work that adds a real per-chunk seam.

Event-shape change:
- The fields.direction field (added in #155, unused by any
  consumer per #159 conversation) is REPLACED by fields.gate,
  sourced from Result.Gate.
- gate values are exactly the five library constants:
  input / context / tool_call / output / stream.
- fields.tool is set on tool_call AND on output events for tool
  return text — so consumers can distinguish OutputGate fires on
  tool results from OutputGate fires on the model's reply to the
  user without a synthetic direction field.

Pre-#159 agents emitted direction-only. Consumers that need to
support both vintages map old direction values to gate via the
table in docs/security/guardrails.md (inbound→input,
outbound→output, tool_output→output+tool).

Tests:
- NoopGuardrailChecker pass-through for all 5 gates.
- Mask emit pins gate=input and asserts direction MUST NOT
  appear in the JSON.
- New TestLibraryGuardrailEngine_EmitsAuditOnToolCallMask drives
  the ToolCallGate path.
- New empty-input short-circuit test for the three new methods.

Docs:
- docs/security/guardrails.md — gate field reference, the five-
  gate call-site table, the pre-#159 migration block.
- docs/security/audit-logging.md — guardrail_check row updated.
- .claude/skills/forge.md — AuditGuardrail entry updated.
initializ-mk merged commit 0690f47 into main on Jun 14, 2026
10 checks passed
initializ-mk added a commit that referenced this pull request Jun 15, 2026
Symmetric to the guardrail_check audit emission shipped in
#156 / #160 — every Check* method on LibraryGuardrailEngine opens
a guardrail.<gate> child span and stamps the same gate / decision
/ violation metadata operators see on the audit event.

Span names map to the library's gate vocabulary:
  - guardrail.input        (CheckInbound → InputGate)
  - guardrail.context      (CheckContext → ContextGate)
  - guardrail.tool_call    (CheckToolCall → ToolCallGate)
  - guardrail.output       (CheckOutbound + CheckToolOutput → OutputGate)
  - guardrail.stream       (CheckStream → StreamGate; not auto-wired)

The CheckOutbound case splits per text part — one
guardrail.output span per OutputGate call so the trace tree
mirrors the part-level iteration cleanly.

Attribute keys (new constants in forge-core/observability/attrs.go):
  - forge.guardrail.gate              (Result.Gate)
  - forge.guardrail.decision          (Result.Decision: allow/mask/block/warn)
  - forge.guardrail.type              (first violation's Type)
  - forge.guardrail.category          (first violation's Category)
  - forge.guardrail.violation_count   (len(Result.Violations))
  - forge.guardrail.evidence          (gated by CaptureContent + Redact)
  - forge.tool.name                   (reused from #130; set on tool_call + tool-output paths)

Block decisions stamp OTel Error status with the violation summary
as the status description — operators see red bars in the trace UI
without custom attribute queries.

forge.guardrail.evidence follows the #130 + #156 content rule
exactly: default off; with CaptureContent on, the mask path
emits post-mask content (matches what the LLM actually saw) and
the block/warn paths emit original content. PrepareSpanContent
runs the same redact-then-truncate pipeline used for
gen_ai.input.messages and forge.tool.args, so the four content
streams share one consistent shape.
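
The evidence rule above can be sketched as a small decision function. This is an illustration under assumptions, not the code behind PrepareSpanContent: `evidenceFor` is a hypothetical name, the redact-then-truncate step is left out, and returning no evidence for an allow decision is an assumption, since the text only covers mask, block, and warn.

```go
package main

import "fmt"

// evidenceFor picks the content to stamp as the evidence attribute.
// The bool is false when no evidence attribute should be set.
func evidenceFor(captureContent bool, decision, original, masked string) (string, bool) {
	if !captureContent {
		return "", false // default off
	}
	switch decision {
	case "mask":
		return masked, true // post-mask content: what the LLM actually saw
	case "block", "warn":
		return original, true
	}
	return "", false // assumption: nothing to attach on allow
}

func main() {
	ev, ok := evidenceFor(true, "mask", "mail alice@example.com", "mail [EMAIL]")
	fmt.Println(ev, ok)
}
```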

Wiring:
  - LibraryGuardrailEngine grows a tracingCfg field + WithTracing
    setter; BuildGuardrailChecker gains a TracingConfig parameter
    and calls WithTracing on every constructed engine.
  - runner.Start resolves TracingConfig early (it's a pure config
    resolution — no I/O) so the guardrail engine sees it before
    NewTracerProvider runs; the later tracing block reuses the
    resolved value.
  - When tracing is disabled, the noop tracer short-circuits and
    no spans are recorded. CaptureContent only controls the
    evidence attribute: the span-start call is always made, and
    it is cheap when tracing is off.

Tests (forge-cli/runtime/guardrails_tracing_test.go):
  - guardrail.input span lands with gate/decision/violation_count
    attributes; evidence ABSENT when CaptureContent=false
  - evidence PRESENT but raw PII absent when CaptureContent=true
    (post-mask rule)
  - guardrail.tool_call carries forge.tool.name
  - guardrail.output for CheckOutbound has NO tool attribute
    (distinguishes "model reply to user" from tool-result OutputGate
    fires)
  - guardrail.context + guardrail.stream spans land
  - noop-tracer path: no spans recorded

Docs: docs/core-concepts/observability-tracing.md gains a
"Guardrail spans" section under "Span content capture" listing
span names, nesting, attribute reference, and the content-capture
parity note.
Linked issue closed by this pull request: Guardrail gates: emit gate explicitly + wire ToolCallGate / ContextGate (and StreamGate)