feat(audit): stamp entity_id + entity_type on every event (closes #164) #165
Merged
Adds top-level `entity_id` and `entity_type` fields to every Forge audit event, sourced from FORGE_AGENT_ID / forge.yaml `agent_id` with `entity_type` hardcoded to "agent".

Field names + values are taken straight from the guardrails library's BasePayload vocabulary (EntityID, EntityType; "agent" / "workflow" / "assistant" constants) so consumers reading both the Forge NDJSON stream and the library's MongoDB GuardrailAuditEvent collection can join on `(entity_id, entity_type)` 1:1 without a translation table.

Two-layer precedence (no per-request override layer; entity identity is fixed at process startup):

1. Explicit EntityID/EntityType on the event
2. AuditLogger.WithEntity static stamp (from env / forge.yaml)

Both fields use omitempty. Deployments not setting agent_id keep emitting the pre-#164 JSON shape verbatim. No schema bump.

Wiring (forge-cli/runtime/runner.go) mirrors BuildGuardrailChecker's existing AgentID resolution (guardrails_loader.go:46-50): env wins over forge.yaml. Called right after the existing WithTenancy stamp so all four tenancy/entity fields land together on every event, including startup banners (agent_card_published, policy_loaded, audit_export_status).

Tests pin:

- static stamp on plain Emit
- no stamp omits both keys
- EmitFromContext per-invocation events carry the stamp alongside correlation_id
- explicit event value beats the static stamp
- partial WithEntity ("", id) installs only EntityID

Docs:

- docs/security/audit-logging.md gains an Entity stamping section with the 1:1 library-join note.
- docs/security/tenancy.md splits the precedence table into Tenancy fields + Entity fields subsections; documents the no-header-layer choice.

Future-proofs for non-agent entities: when Forge adds workflow or assistant runtimes, the field name doesn't change, only the stamped value. Additive value change, not a schema change.
Summary
Adds top-level `entity_id` and `entity_type` fields to every Forge audit event, sourced from `FORGE_AGENT_ID` (or forge.yaml `agent_id`) with `entity_type` hardcoded to `"agent"`. Field names + values match the guardrails library's vocabulary 1:1 so consumers can join the Forge NDJSON stream against the library's MongoDB `GuardrailAuditEvent` collection on `(entity_id, entity_type)` without translation.
Why entity_id + entity_type, not agent_id
Original #164 proposal said `agent_id`. Renamed to `entity_id` + `entity_type` for two reasons:
1:1 column compatibility — The guardrails library's `BasePayload` carries `EntityID` + `EntityType` (constants `agent` / `workflow` / `assistant`). When `FORGE_GUARDRAILS_DB` is set, the library writes `GuardrailAuditEvent` records into MongoDB with these exact column names. Forge using `agent_id` would force every consumer reading both streams to maintain a translation table forever.
Future-proof for non-agent entities — Forge runs agents today, but the library already supports workflows and assistants. Encoding the entity type as a value (not a field name) means adding a second entity type later is an additive value change, not a schema change.
Precedence
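Two layers, highest priority first. There is no per-request header layer like #157 has, because entity identity is fixed at process startup:

1. Explicit `EntityID` / `EntityType` set on the event
2. `AuditLogger.WithEntity` static stamp (from env / forge.yaml)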
If a deployment needs per-request entity routing, the tenancy layer (`X-Forge-Org-ID` / `X-Forge-Workspace-ID`) from #157 already covers that — agent identity is the process, by definition.
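As an illustration only, the two-layer precedence could be sketched as below. `Event`, `Logger`, and `stamp` are hypothetical stand-ins, not the actual Forge types, and resolving each field independently is an assumption based on the "partial WithEntity" test described in the commit message.

```go
package main

import "fmt"

// Event is a hypothetical, trimmed-down audit event; only the entity fields matter here.
type Event struct {
	Name       string
	EntityID   string
	EntityType string
}

// Logger is a hypothetical stand-in for AuditLogger holding the static stamp.
type Logger struct {
	entityType string
	entityID   string
}

// WithEntity installs the static stamp once at startup.
// The (type, id) argument order follows the wiring snippet in this PR.
func (l *Logger) WithEntity(entityType, entityID string) {
	l.entityType = entityType
	l.entityID = entityID
}

// stamp applies the precedence: an explicit value on the event wins,
// otherwise the static stamp fills the field (assumed to be per field).
func (l *Logger) stamp(ev Event) Event {
	if ev.EntityID == "" {
		ev.EntityID = l.entityID
	}
	if ev.EntityType == "" {
		ev.EntityType = l.entityType
	}
	return ev
}

func main() {
	l := &Logger{}
	l.WithEntity("agent", "my-agent")

	// Layer 2: no explicit values, so the static stamp is used.
	fmt.Printf("%+v\n", l.stamp(Event{Name: "session_start"}))

	// Layer 1: explicit values on the event beat the static stamp.
	fmt.Printf("%+v\n", l.stamp(Event{Name: "custom", EntityID: "wf-1", EntityType: "workflow"}))
}
```

Because there is no request-scoped layer, nothing in this sketch reads from a context or header; the stamp is set once and reused for every event.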
Wiring (forge-cli/runtime/runner.go)
Mirrors `BuildGuardrailChecker`'s existing AgentID resolution at `guardrails_loader.go:46-50` — env wins over forge.yaml. Called right after the existing `WithTenancy` stamp so all four tenancy/entity fields land together on every emit:
```go
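// Resolve the agent ID: the environment variable wins over forge.yaml.
agentID := os.Getenv("FORGE_AGENT_ID")
if agentID == "" && r.cfg.Config != nil {
	agentID = r.cfg.Config.AgentID
}
// Stamp the entity type "agent" and the resolved ID on the logger.
auditLogger.WithEntity("agent", agentID)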
```
Event shape
Before:
```json
{"ts":"...","event":"session_start","correlation_id":"...","task_id":"...","org_id":"org_x","workspace_id":"ws_y"}
```
After (no behavior change unless `FORGE_AGENT_ID` or forge.yaml `agent_id` is set):
```json
{"ts":"...","event":"session_start","correlation_id":"...","task_id":"...","org_id":"org_x","workspace_id":"ws_y","entity_id":"my-agent","entity_type":"agent"}
```
Files
Test plan
Schema impact
Additive only. Both keys use `omitempty`. Deployments setting neither env nor forge.yaml `agent_id` keep emitting the pre-#164 JSON shape verbatim — no `AuditSchemaVersion` bump.
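A minimal sketch of why `omitempty` keeps the change additive. `auditEvent` is a hypothetical subset of the real event struct: the JSON keys are taken from the event-shape examples above, while the Go field names and the tags on the non-entity fields are assumptions.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// auditEvent is a hypothetical subset of the audit event.
type auditEvent struct {
	Event       string `json:"event"`
	OrgID       string `json:"org_id,omitempty"`
	WorkspaceID string `json:"workspace_id,omitempty"`
	EntityID    string `json:"entity_id,omitempty"`
	EntityType  string `json:"entity_type,omitempty"`
}

// encode marshals one event to a single JSON line.
func encode(ev auditEvent) string {
	b, err := json.Marshal(ev)
	if err != nil {
		panic(err)
	}
	return string(b)
}

func main() {
	// No entity stamp: both keys are dropped, so the output matches the old shape.
	fmt.Println(encode(auditEvent{Event: "session_start", OrgID: "org_x", WorkspaceID: "ws_y"}))

	// With a stamp: the two new keys are appended after the existing ones.
	fmt.Println(encode(auditEvent{
		Event: "session_start", OrgID: "org_x", WorkspaceID: "ws_y",
		EntityID: "my-agent", EntityType: "agent",
	}))
}
```

With `omitempty`, an empty string is indistinguishable from "not set", which is what lets unstamped deployments keep their existing output byte for byte.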
Stamping stack now complete
After this merges, every audit event carries the full deploy identifier set:
SIEM filter: `org_id=X AND workspace_id=Y AND entity_id=Z` uniquely identifies a Forge deploy across the export stream.
Closes #164