Guardrail gates: emit gate explicitly + wire ToolCallGate / ContextGate (and StreamGate) #159

@initializ-mk

Context

The `guardrail_check` audit event currently carries `fields.direction`
(`inbound` / `outbound` / `tool_output`) — Forge's 2-direction model
on top of the library's 5-gate model. Three of those library gates
are wired silently to "never fires":

| Library gate | Status | Right call site |
|---|---|---|
| `input` | ✅ wired (`CheckInbound`) | A2A handler |
| `output` | ✅ wired (`CheckOutbound`, `CheckToolOutput`) | A2A handler + AfterToolExec hook |
| `tool_call` | ❌ never invoked | BeforeToolExec hook (args before the tool runs) |
| `context` | ❌ never invoked | BeforeLLMCall hook (retrieved knowledge / RAG before prompt assembly) |
| `stream` | ❌ never invoked | Per-chunk inside the LLM stream loop |

`DefaultStructuredGuardrails().GateConfig` advertises `ToolCallGate: true` and
`ContextGate: false` to the library, but the library can't act on
`ToolCallGate` because the agent runtime never calls it. Operators
flipping the bit in `guardrails.json` get silent no-ops.

Scope

Step 1 — emit `gate` explicitly (small, immediate)

`emitGuardrailEvent` already holds the `*guardrails.Result`; the
library populates `Result.Gate` at the call site. One line in
`forge-cli/runtime/guardrails_audit.go`:

```go
fields["gate"] = string(res.Gate)
```

This gives consumers a primary `gate` key alongside the existing
`direction`. The downstream consumer-side unification (a "`gate ?? direction`"
fallback) then only has to cover events emitted before this line
lands. Add an event-shape doc note in `docs/security/guardrails.md`.
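
For reference, the consumer-side fallback could look roughly like the sketch below. The field names `gate` and `direction` come from this issue; the helper `gateOrDirection` and its boolean "legacy" result are hypothetical and only illustrate the idea.

```go
package main

import "fmt"

// gateOrDirection returns fields["gate"] when the event carries it, and
// falls back to fields["direction"] otherwise. The second result reports
// whether the fallback was used, i.e. the event predates the explicit key.
// Hypothetical helper: not part of Forge or the guardrails library.
func gateOrDirection(fields map[string]any) (string, bool) {
	if g, ok := fields["gate"].(string); ok && g != "" {
		return g, false
	}
	d, _ := fields["direction"].(string)
	return d, true
}

func main() {
	legacy := map[string]any{"direction": "inbound"}
	current := map[string]any{"direction": "inbound", "gate": "input"}

	v1, fellBack1 := gateOrDirection(legacy)
	v2, fellBack2 := gateOrDirection(current)
	fmt.Println(v1, fellBack1)
	fmt.Println(v2, fellBack2)
}
```

Note that a plain fallback returns the raw `direction` value for legacy events, so consumers that want gate names for old events still need a direction-to-gate translation.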

Step 2 — wire ToolCallGate

Add `CheckToolCall(ctx, toolName, args string) error` to the
`coreruntime.GuardrailChecker` interface. Implementation in
`LibraryGuardrailEngine`:

  • Call `manager.ToolCallGate(ctx, {Content: args, EntityID, OrgID,
    EntityType, StructuredGuardrails, ConfigVersion, Metadata:
    {tool_name: toolName}})`.
  • Mask / block / warn following the same shape as `CheckOutbound`.
  • Emit `guardrail_check` with `gate="tool_call"`, `direction="tool_call"`,
    `tool=toolName`.

Wire it at the `BeforeToolExec` hook in `registerGuardrailHooks`,
mirroring the existing `AfterToolExec` hook that already calls
`CheckToolOutput`. When a call is blocked, the hook returns an error,
which aborts the tool exec the same way enforce-mode does today.

Step 3 — wire ContextGate

Add `CheckContext(ctx, content string) error` to the interface and
implementation. The natural call site is wherever long-term-memory
recall lands retrieved context into the prompt — likely a
`BeforeLLMCall`-adjacent hook fed by the memory subsystem. Needs
a small additional helper on the runtime side because today there's
no single "retrieved-context-being-injected" interception point.

Step 4 (optional, larger) — wire StreamGate

Per-chunk filtering inside the LLM streaming loop in
`forge-core/runtime/loop.go`. Trickier because it has to run
synchronously per token block without breaking the streaming
contract. Defer until there's an operator ask — the per-chunk gate
is most useful for moderation/jailbreak detection on already-
streaming responses, which is a less common need than per-request
input/output gating.

Backwards compatibility

Audit event reference after the work

| direction | gate | When |
|---|---|---|
| `inbound` | `input` | user msg → InputGate |
| `tool_call` | `tool_call` | agent's about-to-call-tool args → ToolCallGate (new) |
| `tool_output` | `output` | tool result text → OutputGate |
| `outbound` | `output` | model response → OutputGate |
| `context` | `context` | retrieved RAG content → ContextGate (new) |
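
The reference table can also be encoded as a lookup, for example to back a table-driven test on the emitter or a consumer-side shape check. The pairs are taken from the table above; the names `gateForDirection` and `checkEventShape` are hypothetical.

```go
package main

import "fmt"

// gateForDirection encodes the direction -> gate pairs from the audit event
// reference. Hypothetical name; values mirror the table in this issue.
var gateForDirection = map[string]string{
	"inbound":     "input",
	"tool_call":   "tool_call",
	"tool_output": "output",
	"outbound":    "output",
	"context":     "context",
}

// checkEventShape reports an error when an event's gate does not match the
// gate expected for its direction, or when the direction is unknown.
func checkEventShape(fields map[string]string) error {
	dir := fields["direction"]
	want, ok := gateForDirection[dir]
	if !ok {
		return fmt.Errorf("unknown direction %q", dir)
	}
	if got := fields["gate"]; got != want {
		return fmt.Errorf("direction %q: gate %q, want %q", dir, got, want)
	}
	return nil
}

func main() {
	fmt.Println(checkEventShape(map[string]string{"direction": "tool_output", "gate": "output"}))
	fmt.Println(checkEventShape(map[string]string{"direction": "inbound", "gate": "output"}))
}
```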

Why split this from #155

#155 fixed the immediate "events not emitted at all" gap and
delivered the metadata-only / opt-in-evidence posture. Wiring two
more gates is a separable runtime-side expansion that needs a hook
contract change (`BeforeToolExec` for tool-call, a new context-
injection hook for ContextGate). Smaller, focused review.

Related
