Context
The `guardrail_check` audit event currently carries `fields.direction`
(`inbound` / `outbound` / `tool_output`), Forge's direction model
layered on top of the library's 5-gate model. Three of those library
gates are never invoked by the runtime, and nothing reports that:
| Library gate | Status | Right call site |
|---|---|---|
| `input` | ✅ wired (`CheckInbound`) | A2A handler |
| `output` | ✅ wired (`CheckOutbound`, `CheckToolOutput`) | A2A handler + `AfterToolExec` hook |
| `tool_call` | ❌ never invoked | `BeforeToolExec` hook (args before the tool runs) |
| `context` | ❌ never invoked | `BeforeLLMCall` hook (retrieved knowledge / RAG before prompt assembly) |
| `stream` | ❌ never invoked | Per-chunk inside the LLM stream loop |

`DefaultStructuredGuardrails().GateConfig` advertises `ToolCallGate: true` and
`ContextGate: false` to the library, but the library can't act on
`ToolCallGate` because the agent runtime never calls it. Operators
flipping the bit in `guardrails.json` get silent no-ops.
Scope
Step 1 — emit `gate` explicitly (small, immediate)
`emitGuardrailEvent` already holds the `*guardrails.Result`; the
library populates `Result.Gate` at the call site. One line in
`forge-cli/runtime/guardrails_audit.go`:
```go
fields["gate"] = string(res.Gate)
```
This gives consumers a primary `gate` key alongside the existing
`direction`. The consumer-side unification (a "`gate ?? direction`"
fallback) then only has to cover events emitted before this line
lands. Add an event-shape doc note in `docs/security/guardrails.md`.
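
For consumers, the `gate ?? direction` fallback could look roughly like the sketch below. The function name `EffectiveGate` and the plain `map[string]any` event shape are assumptions made for illustration, not an existing Forge API. The legacy mapping follows the direction/gate pairs listed in the audit event reference.

```go
package main

// legacyGateByDirection maps pre-existing `direction` values to the library
// gate that produced them. Only directions that exist before Step 1 lands
// need an entry; newer events carry `gate` explicitly.
var legacyGateByDirection = map[string]string{
	"inbound":     "input",
	"outbound":    "output",
	"tool_output": "output",
}

// EffectiveGate (hypothetical helper) prefers the explicit `gate` field and
// falls back to deriving it from `direction` for older events. It returns ""
// when neither field identifies a gate.
func EffectiveGate(fields map[string]any) string {
	if g, ok := fields["gate"].(string); ok && g != "" {
		return g
	}
	if d, ok := fields["direction"].(string); ok {
		return legacyGateByDirection[d]
	}
	return ""
}
```

A consumer would call `EffectiveGate(event.Fields)` once per event and key its dashboards on the result, so old and new events land in the same bucket.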
Step 2 — wire ToolCallGate
Add `CheckToolCall(ctx, toolName, args string) error` to the
`coreruntime.GuardrailChecker` interface. Implementation in
`LibraryGuardrailEngine`:
- Call `manager.ToolCallGate(ctx, {Content: args, EntityID, OrgID,
EntityType, StructuredGuardrails, ConfigVersion, Metadata:
{tool_name: toolName}})`.
- Mask / block / warn following the same shape as `CheckOutbound`.
- Emit `guardrail_check` with `gate="tool_call"`, `direction="tool_call"`,
`tool=toolName`.
Wire it at the `BeforeToolExec` hook in `registerGuardrailHooks`, with the
same shape as the existing `AfterToolExec` hook that already calls
`CheckToolOutput`. When the gate blocks, the hook returns an error, which
aborts the tool execution the same way enforce mode does today.
Step 3 — wire ContextGate
Add `CheckContext(ctx, content string) error` to the interface and
implementation. The natural call site is wherever long-term-memory
recall lands retrieved context into the prompt, likely a
`BeforeLLMCall`-adjacent hook fed by the memory subsystem. This needs
a small additional helper on the runtime side because today there is
no single "retrieved-context-being-injected" interception point.
Step 4 (optional, larger) — wire StreamGate
Per-chunk filtering inside the LLM streaming loop in
`forge-core/runtime/loop.go`. Trickier because it has to run
synchronously per token block without breaking the streaming
contract. Defer until there's an operator ask — the per-chunk gate
is most useful for moderation/jailbreak detection on already-
streaming responses, which is a less common need than per-request
input/output gating.
Backwards compatibility
- The existing `direction` field stays in place (Step 1 adds `gate` alongside it), so consumers shipped against #155 / #156 continue to work.
- Step 2 adds `ToolCallGate` as a new event class (`direction="tool_call"`); it doesn't replace the existing `tool_output` event.

Audit event reference after the work
| direction | gate | When |
|---|---|---|
| `inbound` | `input` | user msg → InputGate |
| `tool_call` | `tool_call` | agent's about-to-call-tool args → ToolCallGate (new) |
| `tool_output` | `output` | tool result text → OutputGate |
| `outbound` | `output` | model response → OutputGate |
| `context` | `context` | retrieved RAG content → ContextGate (new) |

Why split this from #155
#155 fixed the immediate "events not emitted at all" gap and
delivered the metadata-only / opt-in-evidence posture. Wiring two
more gates is a separable runtime-side expansion that needs a hook
contract change (`BeforeToolExec` for tool-call, a new
context-injection hook for ContextGate). Splitting it out keeps each
review smaller and more focused.
Related