feat(sdk): make the chat.agent system prompt cacheable #3952
chat.toStreamTextOptions() can now emit the system prompt as a structured message carrying providerOptions, so a provider can cache the system block. Opt in three ways (most specific wins): cacheControl sugar or systemProviderOptions on toStreamTextOptions(), or providerOptions on chat.prompt.set(). Without an option, system stays a plain string.
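A minimal sketch of the string-versus-structured-message behavior described above. The helper `buildSystem` and the local `ProviderOptions` and `SystemMessage` types are hypothetical stand-ins written for this illustration, not the SDK's actual internals.

```typescript
// Local stand-ins for the relevant shapes, defined here so the sketch is self-contained.
type JSONValue = null | string | number | boolean | JSONValue[] | { [key: string]: JSONValue };
type ProviderOptions = Record<string, Record<string, JSONValue>>;

type SystemMessage = {
  role: "system";
  content: string;
  providerOptions: ProviderOptions;
};

// Hypothetical helper: returns the plain string unless provider options were supplied.
function buildSystem(text: string, providerOptions?: ProviderOptions): string | SystemMessage {
  if (providerOptions === undefined) return text; // no opt-in: behavior unchanged
  return { role: "system", content: text, providerOptions };
}

const plain = buildSystem("You are a helpful assistant.");
const cached = buildSystem("You are a helpful assistant.", {
  anthropic: { cacheControl: { type: "ephemeral" } },
});
```

Here `plain` stays a string, while `cached` is a system message whose `providerOptions` a provider can read as a cache breakpoint.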
🦋 Changeset detected. Latest commit: 981d9bd. The changes in this PR will be included in the next version bump. This PR includes changesets to release 25 packages.
…roviderOptions ai re-exports ProviderMetadata (same Record<string, Record<string, JSONValue>> shape) but not ProviderOptions, so importing the latter failed the SDK build with TS2459. The public option names are unchanged.
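One way to picture the fix in this commit message is a local alias with the same shape as `ProviderMetadata`. This is only a sketch: the names `ProviderMetadataLike` and `SystemProviderOptions` are hypothetical, and `JSONValue` is redefined locally so the block does not depend on the `ai` package.

```typescript
// Local JSON value type, mirroring the shape named in the commit message.
type JSONValue = null | string | number | boolean | JSONValue[] | { [key: string]: JSONValue };

// Same Record<string, Record<string, JSONValue>> shape the commit message attributes to ProviderMetadata.
type ProviderMetadataLike = Record<string, Record<string, JSONValue>>;

// Hypothetical alias used where a ProviderOptions import is unavailable.
type SystemProviderOptions = ProviderMetadataLike;

const exampleOptions: SystemProviderOptions = {
  anthropic: { cacheControl: { type: "ephemeral" } },
};
```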
A later chat.prompt.set() with no options left the previous prompt's providerOptions in locals, so toStreamTextOptions() could still cache a prompt that did not opt in. Always overwrite the slot. Also replaces fixed sleeps in the prompt-caching test with a bounded condition wait.
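The stale-options bug and its fix can be sketched as follows. The `slot` variable and the `setPrompt` and `currentProviderOptions` functions are hypothetical stand-ins for the run-local storage; only the "always overwrite the slot" behavior comes from the commit message.

```typescript
type ProviderOptions = Record<string, Record<string, unknown>>;
type PromptSlot = { text: string; providerOptions: ProviderOptions | undefined };

// Hypothetical stand-in for the locals slot that holds the current prompt.
let slot: PromptSlot | undefined;

function setPrompt(text: string, options?: { providerOptions?: ProviderOptions }): void {
  // Always overwrite both fields, so an earlier call's providerOptions cannot leak
  // into a later prompt that did not opt in.
  slot = { text, providerOptions: options?.providerOptions };
}

function currentProviderOptions(): ProviderOptions | undefined {
  return slot?.providerOptions;
}

setPrompt("cached prompt", {
  providerOptions: { anthropic: { cacheControl: { type: "ephemeral" } } },
});
const afterOptIn = currentProviderOptions();

setPrompt("uncached prompt"); // no options: the slot must not keep the old ones
const afterPlainSet = currentProviderOptions();
```

After the second call, `afterPlainSet` is `undefined`, so a later read would treat the prompt as a plain string again.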
…locks (#3954)

## Summary

The script that generates the changeset release PR description was silently dropping some changelog entries and stripping code examples. In [#3932](#3932), entry [#3937](#3937) was missing entirely from the Improvements list and the code block for [#3952](#3952) was gone, even though both were present in the raw changeset output.

## Root cause

`parsePrBody` parsed the raw changeset body line by line:

- The dependency-bump filter matched any entry whose text *began* with a backticked package name, so a real changelog entry like `` `@trigger.dev/sdk` now bundles... `` got thrown out along with the genuine version-bump lines.
- Only the first line of each bullet was kept, so fenced code blocks, sub-bullets, and continuation paragraphs were discarded.

## Fix

Group each top-level bullet with its indented continuation (code blocks, sub-bullets, paragraphs), dedent it, and re-emit it intact. The dependency filter is now anchored so it only matches lines that are *entirely* a package bump, leaving real entries that merely start with a package name.

Verified by replaying #3932's raw body through the script: #3937 returns to the list, #3952's code block is preserved, and #3936's sub-bullets nest correctly under their parent.
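The grouping and anchored filter described in this commit message might look roughly like the sketch below. It is not the actual `parsePrBody` code: the regular expression, the function names `groupBullets` and `changelogEntries`, and the sample body are assumptions, and the dedent step is omitted for brevity.

```typescript
// Anchored filter: matches only a line that is entirely a package bump,
// e.g. "- `@trigger.dev/core@4.5.0-rc.7`". The exact pattern is an assumption.
const DEP_BUMP = /^\s*-\s+`[^`\s]+@\d[^`\s]*`\s*$/;

// Group each top-level bullet with the indented or blank lines that follow it.
function groupBullets(body: string): string[] {
  const groups: string[][] = [];
  let current: string[] | undefined;
  for (const line of body.split("\n")) {
    if (line.startsWith("- ")) {
      current = [line]; // a new top-level bullet starts a group
      groups.push(current);
    } else if (current !== undefined && (line.trim() === "" || /^\s/.test(line))) {
      current.push(line); // indented or blank line: continuation of the bullet
    } else {
      current = undefined; // any other unindented line ends the group
    }
  }
  return groups.map((group) => group.join("\n").trimEnd());
}

// Keep whole entries, dropping only those whose first line is purely a package bump.
function changelogEntries(body: string): string[] {
  return groupBullets(body).filter((entry) => !DEP_BUMP.test(entry.split("\n")[0] ?? ""));
}

const sampleBody = [
  "- `@trigger.dev/core@4.5.0-rc.7`",
  "- `@trigger.dev/sdk` now bundles the agent skills.",
  "- Three fixes for custom agent loops:",
  "  - Continuation runs no longer replay old messages.",
  "  - Steering no longer wipes the response.",
].join("\n");

const entries = changelogEntries(sampleBody);
```

With this sample, the pure version bump is filtered out, the entry that merely starts with a package name survives, and the sub-bullets stay attached to their parent.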
## Summary

7 improvements.

## Improvements

- `@trigger.dev/sdk` now bundles the Trigger.dev agent skills and a curated snapshot of the docs those skills reference. The skills that `trigger skills` installs into your coding agent read this content from node_modules, so the guidance your AI assistant follows is pinned to the SDK version installed in your project and stays current across upgrades instead of going stale until the next reinstall. ([#3937](#3937))
- Running a CLI command like `dev`, `deploy`, `preview`, or `update` before initializing a project no longer crashes with a raw `Cannot find matching package.json` stack trace. The CLI now detects the missing project and points you to `npx trigger.dev@latest init` instead. ([#3929](#3929))
- The agent skills installed by `trigger skills` are now namespaced with a `trigger-` prefix (e.g. `trigger-authoring-tasks`, `trigger-getting-started`) so they don't collide with unrelated skills in your coding agent's skills directory. Adds a `trigger-cost-savings` skill for auditing and reducing compute spend (right-sizing machines, `maxDuration`, batching, debounce), and `@trigger.dev/sdk` now bundles the full Trigger.dev documentation so your agent can read the complete, version-pinned reference directly from node_modules. ([#3970](#3970))
- The run span API response now includes `cachedCost` and `cacheCreationCost` on the `ai` object, alongside the existing `inputCost` / `outputCost` / `totalCost`. `inputCost` reflects only the non-cached input, so these fields let you reconstruct the full cost breakdown for prompt-cached calls. ([#3958](#3958))
- `chat.headStart` now works with the `chat.customAgent` and `chat.createSession` backends, not only `chat.agent`. The warm step-1 response hands over to your loop the same way it does for a managed agent. ([#3963](#3963))

  In a `chat.customAgent` loop, consume the handover on turn 0:

  ```ts
  const conversation = new chat.MessageAccumulator();
  const { isFinal, skipped } = await conversation.consumeHandover({ payload });
  if (skipped) return; // warm handler aborted, so exit without a turn

  if (isFinal) {
    await chat.writeTurnComplete(); // step 1 is the response, no streamText
  } else {
    const result = streamText({ model, messages: conversation.modelMessages, tools });
    // Pass originalMessages so the handed-over tool round merges into the
    // step-1 assistant instead of starting a new message.
    const response = await chat.pipeAndCapture(result, {
      originalMessages: conversation.uiMessages,
    });
    if (response) await conversation.addResponse(response);
  }
  ```

  With `chat.createSession`, the iterator surfaces it as `turn.handover`; call `turn.complete()` with no argument on a final handover. The lower-level `chat.waitForHandover()` and `accumulator.applyHandover()` are also exported for hand-rolled loops.

- Cache your chat agent's system prompt with Anthropic prompt caching. `chat.toStreamTextOptions()` now emits the system prompt as a cacheable message when you opt in, so a large, stable system block is billed at cache-read rates on every turn instead of full price. ([#3952](#3952))

  ```ts
  // at the streamText call site (Anthropic sugar)
  streamText({
    ...chat.toStreamTextOptions({ cacheControl: { type: "ephemeral" } }),
    messages,
  });

  // provider-agnostic equivalent
  chat.toStreamTextOptions({
    systemProviderOptions: { anthropic: { cacheControl: { type: "ephemeral" } } },
  });

  // or where the prompt is defined
  chat.prompt.set(SYSTEM_PROMPT, {
    providerOptions: { anthropic: { cacheControl: { type: "ephemeral" } } },
  });
  ```

  Without an option, `system` stays a plain string. Pairs with a `prepareMessages` cache breakpoint to cache the conversation prefix across turns too.

- Three fixes for custom agent loops (`chat.customAgent`, `chat.createSession`, and hand-rolled `MessageAccumulator` loops): ([#3936](#3936))
  - Continuation runs no longer replay already-answered user messages into the first turn. The `.in` resume cursor is now seeded before any listener attaches (the same boot logic `chat.agent` uses), so a chat that continues after a cancel, crash, or upgrade only sees genuinely new messages.
  - Steering a hand-rolled loop mid-stream no longer wipes the in-flight assistant response. `chat.pipeAndCapture` now stamps a server-generated message id on the stream, so a `prepareStep` injection keeps the partial text instead of replacing the message.
  - Task-backed tools (`ai.toolExecute`) now work from custom agent loops: the parent's session is threaded to the child run, so child tasks can stream progress into the chat with `chat.stream.writer({ target: "root" })` instead of failing with "session handle is not initialized".

<details>
<summary>Raw changeset output</summary>

⚠️⚠️⚠️⚠️⚠️⚠️

`main` is currently in **pre mode** so this branch has prereleases rather than normal releases. If you want to exit prereleases, run `changeset pre exit` on `main`.

⚠️⚠️⚠️⚠️⚠️⚠️

# Releases

## @trigger.dev/build@4.5.0-rc.7

### Patch Changes

- Updated dependencies:
  - `@trigger.dev/core@4.5.0-rc.7`

## trigger.dev@4.5.0-rc.7

### Patch Changes

- `@trigger.dev/sdk` now bundles the Trigger.dev agent skills and a curated snapshot of the docs those skills reference. The skills that `trigger skills` installs into your coding agent read this content from node_modules, so the guidance your AI assistant follows is pinned to the SDK version installed in your project and stays current across upgrades instead of going stale until the next reinstall. ([#3937](#3937))
- Running a CLI command like `dev`, `deploy`, `preview`, or `update` before initializing a project no longer crashes with a raw `Cannot find matching package.json` stack trace. The CLI now detects the missing project and points you to `npx trigger.dev@latest init` instead. ([#3929](#3929))
- The agent skills installed by `trigger skills` are now namespaced with a `trigger-` prefix (e.g. `trigger-authoring-tasks`, `trigger-getting-started`) so they don't collide with unrelated skills in your coding agent's skills directory. Adds a `trigger-cost-savings` skill for auditing and reducing compute spend (right-sizing machines, `maxDuration`, batching, debounce), and `@trigger.dev/sdk` now bundles the full Trigger.dev documentation so your agent can read the complete, version-pinned reference directly from node_modules. ([#3970](#3970))
- Updated dependencies:
  - `@trigger.dev/core@4.5.0-rc.7`
  - `@trigger.dev/build@4.5.0-rc.7`
  - `@trigger.dev/schema-to-json@4.5.0-rc.7`

## @trigger.dev/core@4.5.0-rc.7

### Patch Changes

- The run span API response now includes `cachedCost` and `cacheCreationCost` on the `ai` object, alongside the existing `inputCost` / `outputCost` / `totalCost`. `inputCost` reflects only the non-cached input, so these fields let you reconstruct the full cost breakdown for prompt-cached calls. ([#3958](#3958))

## @trigger.dev/python@4.5.0-rc.7

### Patch Changes

- Updated dependencies:
  - `@trigger.dev/sdk@4.5.0-rc.7`
  - `@trigger.dev/core@4.5.0-rc.7`
  - `@trigger.dev/build@4.5.0-rc.7`

## @trigger.dev/react-hooks@4.5.0-rc.7

### Patch Changes

- Updated dependencies:
  - `@trigger.dev/core@4.5.0-rc.7`

## @trigger.dev/redis-worker@4.5.0-rc.7

### Patch Changes

- Updated dependencies:
  - `@trigger.dev/core@4.5.0-rc.7`

## @trigger.dev/rsc@4.5.0-rc.7

### Patch Changes

- Updated dependencies:
  - `@trigger.dev/core@4.5.0-rc.7`

## @trigger.dev/schema-to-json@4.5.0-rc.7

### Patch Changes

- Updated dependencies:
  - `@trigger.dev/core@4.5.0-rc.7`

## @trigger.dev/sdk@4.5.0-rc.7

### Patch Changes

- `@trigger.dev/sdk` now bundles the Trigger.dev agent skills and a curated snapshot of the docs those skills reference. The skills that `trigger skills` installs into your coding agent read this content from node_modules, so the guidance your AI assistant follows is pinned to the SDK version installed in your project and stays current across upgrades instead of going stale until the next reinstall. ([#3937](#3937))
- `chat.headStart` now works with the `chat.customAgent` and `chat.createSession` backends, not only `chat.agent`. The warm step-1 response hands over to your loop the same way it does for a managed agent. ([#3963](#3963))

  In a `chat.customAgent` loop, consume the handover on turn 0:

  ```ts
  const conversation = new chat.MessageAccumulator();
  const { isFinal, skipped } = await conversation.consumeHandover({ payload });
  if (skipped) return; // warm handler aborted, so exit without a turn

  if (isFinal) {
    await chat.writeTurnComplete(); // step 1 is the response, no streamText
  } else {
    const result = streamText({ model, messages: conversation.modelMessages, tools });
    // Pass originalMessages so the handed-over tool round merges into the
    // step-1 assistant instead of starting a new message.
    const response = await chat.pipeAndCapture(result, {
      originalMessages: conversation.uiMessages,
    });
    if (response) await conversation.addResponse(response);
  }
  ```

  With `chat.createSession`, the iterator surfaces it as `turn.handover`; call `turn.complete()` with no argument on a final handover. The lower-level `chat.waitForHandover()` and `accumulator.applyHandover()` are also exported for hand-rolled loops.

- Add `triggerConfig` support to `chat.headStart()` and `chat.openSession()`, so the auto-triggered handover-prepare run inherits tags, queue, machine, and other session trigger options the same way `chat.createStartSessionAction()` does. The `chat:{chatId}` tag is prepended automatically. ([#3963](#3963))

  ```ts
  export const POST = chat.headStart({
    agentId: "my-agent",
    triggerConfig: { tags: ["org:acme"], queue: "chat" },
    run: async ({ chat }) => streamText({ ...chat.toStreamTextOptions(), model }),
  });
  ```

  Because the session is created once on the first head-start turn and is idempotent on the chat id, this is the only place to set those options for a head-start chat's lifetime. `chat.createStartSessionAction()` now also forwards `maxDuration`, `region`, and `lockToVersion` so both session entry points stay consistent.

- Cache your chat agent's system prompt with Anthropic prompt caching. `chat.toStreamTextOptions()` now emits the system prompt as a cacheable message when you opt in, so a large, stable system block is billed at cache-read rates on every turn instead of full price. ([#3952](#3952))

  ```ts
  // at the streamText call site (Anthropic sugar)
  streamText({
    ...chat.toStreamTextOptions({ cacheControl: { type: "ephemeral" } }),
    messages,
  });

  // provider-agnostic equivalent
  chat.toStreamTextOptions({
    systemProviderOptions: { anthropic: { cacheControl: { type: "ephemeral" } } },
  });

  // or where the prompt is defined
  chat.prompt.set(SYSTEM_PROMPT, {
    providerOptions: { anthropic: { cacheControl: { type: "ephemeral" } } },
  });
  ```

  Without an option, `system` stays a plain string. Pairs with a `prepareMessages` cache breakpoint to cache the conversation prefix across turns too.

- Three fixes for custom agent loops (`chat.customAgent`, `chat.createSession`, and hand-rolled `MessageAccumulator` loops): ([#3936](#3936))
  - Continuation runs no longer replay already-answered user messages into the first turn. The `.in` resume cursor is now seeded before any listener attaches (the same boot logic `chat.agent` uses), so a chat that continues after a cancel, crash, or upgrade only sees genuinely new messages.
  - Steering a hand-rolled loop mid-stream no longer wipes the in-flight assistant response. `chat.pipeAndCapture` now stamps a server-generated message id on the stream, so a `prepareStep` injection keeps the partial text instead of replacing the message.
  - Task-backed tools (`ai.toolExecute`) now work from custom agent loops: the parent's session is threaded to the child run, so child tasks can stream progress into the chat with `chat.stream.writer({ target: "root" })` instead of failing with "session handle is not initialized".
- The agent skills installed by `trigger skills` are now namespaced with a `trigger-` prefix (e.g. `trigger-authoring-tasks`, `trigger-getting-started`) so they don't collide with unrelated skills in your coding agent's skills directory. Adds a `trigger-cost-savings` skill for auditing and reducing compute spend (right-sizing machines, `maxDuration`, batching, debounce), and `@trigger.dev/sdk` now bundles the full Trigger.dev documentation so your agent can read the complete, version-pinned reference directly from node_modules. ([#3970](#3970))
- Updated dependencies:
  - `@trigger.dev/core@4.5.0-rc.7`

## @trigger.dev/plugins@4.5.0-rc.7

### Patch Changes

- Updated dependencies:
  - `@trigger.dev/core@4.5.0-rc.7`

</details>

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Summary
`chat.agent`'s system prompt (the `chat.prompt` text plus any skills preamble) could not carry a provider cache breakpoint, so the largest and most stable part of the prompt was billed at the full input price on every turn. `chat.toStreamTextOptions()` now emits the system prompt as a structured message carrying `providerOptions` when you opt in, so a provider can cache the system block. Without an option, `system` stays a plain string, so existing behavior is unchanged.
API
Three ways to opt in (most specific wins, no deep merge): the `cacheControl` shorthand or `systemProviderOptions` on `chat.toStreamTextOptions()`, or `providerOptions` on `chat.prompt.set()`.
The `cacheControl` shorthand is Anthropic-only; `systemProviderOptions` is the general form. Pairs with a `prepareMessages` cache breakpoint to cache the conversation prefix too.
Docs guide: #3951
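To illustrate "most specific wins, no deep merge", here is a minimal sketch of one possible resolution order. The function `resolveSystemProviderOptions` is hypothetical, and so is the relative ranking of the two call-site options (`cacheControl` ahead of `systemProviderOptions`); the PR description only states that the most specific option wins and that nothing is merged.

```typescript
type ProviderOptions = Record<string, Record<string, unknown>>;
type CacheControl = { type: "ephemeral" };

type CallSiteOptions = {
  cacheControl?: CacheControl; // Anthropic-only shorthand
  systemProviderOptions?: ProviderOptions; // general form
};

// Hypothetical resolver: call-site options beat prompt-level options, and the
// winner replaces the others wholesale (no deep merge). Ranking cacheControl
// above systemProviderOptions is an assumption made for this sketch.
function resolveSystemProviderOptions(
  callSite: CallSiteOptions,
  promptLevel?: ProviderOptions,
): ProviderOptions | undefined {
  if (callSite.cacheControl !== undefined) {
    return { anthropic: { cacheControl: callSite.cacheControl } };
  }
  if (callSite.systemProviderOptions !== undefined) {
    return callSite.systemProviderOptions;
  }
  return promptLevel; // undefined means system stays a plain string
}

const promptLevelOptions: ProviderOptions = {
  anthropic: { cacheControl: { type: "ephemeral" } },
};

// The call-site value wins outright; the prompt-level anthropic key is not merged in.
const resolvedFromCallSite = resolveSystemProviderOptions(
  { systemProviderOptions: { openai: { example: true } } },
  promptLevelOptions,
);
const resolvedFromPrompt = resolveSystemProviderOptions({}, promptLevelOptions);
const resolvedNone = resolveSystemProviderOptions({});
```

The `openai` key in the usage lines is a placeholder chosen only to make the lack of merging visible.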