fix(agent): propagate messages session ids to runner traces#4767
fix(agent): propagate messages session ids to runner traces#4767mmabrouk wants to merge 1 commit into
Conversation
|
Important Review skippedAuto reviews are disabled on base/target branches other than the default branch. Please check the settings in the CodeRabbit UI or the ⚙️ Run configurationConfiguration used: Organization UI Review profile: CHILL Plan: Pro Plus Run ID: You can disable this status message by setting the Use the checkbox below for a quick retry:
✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
Reviewer guide: interesting code
|
|
|
||
| /** Prefer the platform conversation id, falling back to the harness's ephemeral id. */ | ||
| export function resolveRunSessionId(request: AgentRunRequest, fallback: string): string { | ||
| return request.sessionId && request.sessionId.trim() ? request.sessionId : fallback; |
There was a problem hiding this comment.
The empty-string guard matters: request.sessionId.trim() means a blank id from the envelope is treated as absent, so the harness fallback wins instead of stamping an empty session onto every span.
| ) | ||
| consumed.add(name) | ||
|
|
||
| elif name == "session_id": |
There was a problem hiding this comment.
This sits before the VAR_KEYWORD branch, so a handler that declares an explicit session_id argument gets the envelope value, while a **kwargs handler does not receive it. That asymmetry is intentional and pinned by the two new normalizer tests.
|
|
||
| // Hand the session id + model to the extension so spans carry them. | ||
| otel.config.sessionId = session.sessionId; | ||
| const sessionId = resolveRunSessionId(request, session.sessionId); |
There was a problem hiding this comment.
The previous code stamped session.sessionId onto OTel config and returned it later. Moving the resolved sessionId up here is what actually fixes the trace: the span now carries the platform id, not Pi's ephemeral one.
| @@ -922,7 +924,7 @@ export async function runRivet( | |||
| // `streamingDeltas` advertises end-to-end live deltas, which is only true when a live | |||
There was a problem hiding this comment.
Confirm both replaced session.id sites (here in the result and at run.start) use the same resolved sessionId, so the trace id and the returned metadata stay consistent for a continued conversation.
|
Superseded. Replacing the path-based stack with PRs sliced by functional area showing final code only, so reviewers don't comment on intermediate scaffolding that a later PR rewrites. See the new set. |
This PR is part of a stack. Review bottom-up.
Each PR's diff is only its own delta. Merge from the bottom. This PR's base is #4766 (merge that first).
Context
The
/messagesendpoint already carries asession_idon the request envelope, and the response already echoes it back. But the id stopped at the route. It never reached the agent runner, so Pi and Rivet traces and run metadata were stamped with the harness's own ephemeral session id instead of the platform conversation id. That broke correlation: a multi-turn conversation produced a fresh, unrelated id on every span.This PR threads the platform
session_idfrom the request envelope down to the runner so traces carry it. Base branch:refactor/vercel-adapter-cleanup. This is the session-id seam from slice #2 (protocol) indocs/design/agent-workflows/pr-stack.md, reaching toward slice #5.What this changes
Before, the runner used whatever id the harness minted for itself. Each turn looked like a new conversation in the traces.
After, the runner prefers the platform id when the request supplies one, and falls back to the harness id only when it does not.
The id now travels the full path:
services/oss/src/agent/app.py: the_agenthandler acceptssession_idand passes it intoSessionConfig.normalizer.py: the middleware recognizessession_idas a top-level request field (sitting next toinputsandparameters) and binds it to a handler argument of the same name.protocol.ts:resolveRunSessionId(request, fallback)returnsrequest.sessionIdwhen it is non-empty, else the harness fallback.pi.tsandrivet.ts: both engines callresolveRunSessionIdand stamp the resolved id onto OTel config,run.start, and the returned run metadata.Key architectural decision to review
session_idis modeled as a request-envelope primitive, like a trace id or a span id, not as an HTTP header or an inputs field. Seenormalizer.py:100-102. The middleware already special-casesrequestand theDATA_FIELDS(inputs/outputs/parameters); this addssession_idas a third class of binding: a recognized top-level envelope field, extracted fromrequest.session_idrather than fromrequest.data.inputs.The tradeoff to scrutinize: this hard-codes one field name in the middleware. A future second envelope field would need the same treatment, so if more arrive, this wants a small allowlist set rather than a chain of
elif name == ...branches. The testtest_session_id_is_not_added_to_var_kwargspins the deliberate boundary:session_idbinds only to an explicit handler argument, never to**kwargs, so it does not leak into handlers that did not ask for it.The id is for correlation, not for state. The cold runtime still receives the full conversation in
messages, so the runner does not load history from the id. The doc-comment edit onAgentRunRequest.sessionIdinprotocol.ts:192makes that contract explicit: read it and confirm no engine is treating the id as a key to replay history from.How to review this PR
protocol.ts: readresolveRunSessionId(the whole behavior is here) and thesessionIddoc comment.normalizer.py:100-102: confirm the new branch sits before the**kwargsbranch, so an explicitsession_idargument wins over kwargs capture.pi.tsandrivet.ts: confirm every place that previously usedsession.sessionId/session.idfor tracing or returned metadata now uses the resolved id. Inpi.ts, note the localsessionIdmoved up so OTel config gets the resolved value.app.py: confirmsession_idreachesSessionConfig.Likely regression: a request with no
session_id(plain/invoke, or the first turn of a server-minted session) must still get the harness id.resolveRunSessionId({}, fallback)returning the fallback covers this; the test on that line guards it.Tests / notes
test_normalizer_passthrough.py: explicit-arg binding plus the**kwargsexclusion.test_invoke_handler.py:session_id="sess_request"reaches the backendSessionConfig(asserted via the fake backend'screated_session_ids).continuation.test.ts:resolveRunSessionIdprefers the platform id and falls back when absent.