feat(webapp): split Models into Your models and Model library tabs #3958
🦋 Changeset detected. Latest commit: e7e6e84
The changes in this PR will be included in the next version bump. This PR includes changesets to release 25 packages.
Walkthrough
The PR adds a "Your models" tab to the Models page with project-scoped usage metrics and prompt-cache insights. SVG provider icons are updated to use React camelCase attributes. A new …
🚥 Pre-merge checks | ✅ 4 | ❌ 1
❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
The Your models sparklines use dynamic bucket sizes (e.g. 6h buckets for the 7d range), but the tooltip assumed hourly buckets and showed the wrong dates. Thread the bucket interval and start through so each bar is labelled correctly. Also pin the Model library tab's cross-tenant p50 TTFC column to a fixed 7-day window so it no longer follows the Your models time selector.
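A minimal sketch of the labelling fix, assuming the sparkline is handed the first bucket's start and the bucket width in Unix seconds (`SparklineBuckets`, `bucketStartSec`, `bucketIntervalSec`, and `labelForBar` are hypothetical names, not the webapp's actual props). Each bar's time range is derived from its index and the real bucket width instead of assuming one hour per bar.

```typescript
// Hypothetical shape of what a sparkline needs to label its bars.
interface SparklineBuckets {
  bucketStartSec: number;    // Unix seconds of the first bucket
  bucketIntervalSec: number; // bucket width, e.g. 6 * 3600 for a 7d range
  values: number[];          // one value per bucket, oldest first
}

// Returns the [from, to) time range covered by bar `index`.
function labelForBar(b: SparklineBuckets, index: number): { from: Date; to: Date } {
  const fromSec = b.bucketStartSec + index * b.bucketIntervalSec;
  return {
    from: new Date(fromSec * 1000),
    to: new Date((fromSec + b.bucketIntervalSec) * 1000),
  };
}
```

With 6h buckets, bar 2 covers hours 12 to 18 after the first bucket; an hourly assumption would have labelled it as hour 2.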
Your models gets a cache-savings column and per-model cached-tokens and cache-hit-rate views; the AI metrics dashboard gets a caching section (hit rate, cached tokens, estimated savings, hit rate by model). Also makes all the Your models charts time series for consistency.
The cache hit-rate and savings queries divided by zero for models with no cached tokens, surfacing NaN or empty widgets; they now return 0 via ifNull/nullIf. Model usage sparklines bucketed on a timezone-dependent DateTime string, which could misalign bars with the charts above them; they now key on toUnixTimestamp so buckets line up regardless of the ClickHouse server timezone.
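For readers less familiar with the ClickHouse functions, this TypeScript sketch mirrors the intended semantics rather than reproducing the actual queries (all names are hypothetical). `safeRatio` behaves like `ifNull(numerator / nullIf(denominator, 0), 0)`: `nullIf` turns a zero denominator into NULL, the division then yields NULL, and `ifNull` maps that to 0. `bucketKey` floors an epoch-seconds timestamp to the bucket width, which is timezone-independent in the same way keying on `toUnixTimestamp` is.

```typescript
// Mirrors ifNull(numerator / nullIf(denominator, 0), 0):
// a zero denominator yields 0 instead of NaN.
function safeRatio(numerator: number, denominator: number): number {
  return denominator === 0 ? 0 : numerator / denominator;
}

// Floors a Unix timestamp (seconds) to the start of its bucket.
// Epoch seconds carry no timezone, so every query computes the same keys.
function bucketKey(unixSec: number, bucketSec: number): number {
  return Math.floor(unixSec / bucketSec) * bucketSec;
}

// Counts events per bucket, keyed by bucket start in epoch seconds.
function bucketCounts(timestampsSec: number[], bucketSec: number): Map<number, number> {
  const counts = new Map<number, number>();
  for (const ts of timestampsSec) {
    const key = bucketKey(ts, bucketSec);
    counts.set(key, (counts.get(key) ?? 0) + 1);
  }
  return counts;
}
```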
input_tokens is the total prompt count, inclusive of cache-read and cache-creation tokens. The cost pipeline charged the full input count at the input price and then added a separate cache line, so cached tokens were billed twice (e.g. ~2.4x on OpenAI), and the cache hit-rate metric divided cached reads by input + cached, understating the rate. Charge the input price only on the fresh (non-cached) remainder, resolve cache prices across provider alias keys (falling back to input price so cache tokens are never free), and compute the hit rate as cached / input.
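A minimal sketch of the corrected arithmetic, assuming per-token prices are already resolved and that `inputTokens` includes both cache-read and cache-creation tokens, as described above. `Usage`, `Prices`, and `computeCost` are hypothetical names, and the clamp at zero is a defensive assumption rather than something the commit describes.

```typescript
interface Usage {
  inputTokens: number;         // total prompt tokens, including cached ones
  cacheReadTokens: number;     // subset of inputTokens read from cache
  cacheCreationTokens: number; // subset of inputTokens written to cache
  outputTokens: number;
}

// Per-token prices, already resolved for the model.
interface Prices {
  input: number;
  cacheRead: number;
  cacheCreation: number;
  output: number;
}

function computeCost(u: Usage, p: Prices) {
  // Only the fresh (non-cached) remainder is charged at the input price.
  const freshInput = Math.max(0, u.inputTokens - u.cacheReadTokens - u.cacheCreationTokens);
  const inputCost = freshInput * p.input;
  const cachedCost = u.cacheReadTokens * p.cacheRead;
  const cacheCreationCost = u.cacheCreationTokens * p.cacheCreation;
  const outputCost = u.outputTokens * p.output;
  return {
    inputCost,
    cachedCost,
    cacheCreationCost,
    outputCost,
    totalCost: inputCost + cachedCost + cacheCreationCost + outputCost,
    // Hit rate is cached reads over total input, not over input + cached.
    cacheHitRate: u.inputTokens === 0 ? 0 : u.cacheReadTokens / u.inputTokens,
  };
}
```

With 1,000 input tokens of which 600 are cache reads and 100 are cache writes, only the remaining 300 are charged at the input price; the old pipeline charged all 1,000 and then added the cache lines on top.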
The prompt detail Metrics tab now shows a Cached tokens total, a cache hit-rate-over-time chart, and a cached-tokens-over-time chart, matching the model detail page. Avg input cost and input cost per 1k now include the cache-read and cache-creation cost lines so they reflect total input spend rather than the fresh-input cost alone.
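A sketch of the broadened input-cost metrics, assuming each call carries the same cost lines the span API exposes (`CallCost`, `totalInputCost`, `avgInputCost`, and `inputCostPer1k` are hypothetical helpers). Because `inputCost` covers only fresh input, both cache lines are added back before averaging.

```typescript
// Hypothetical per-call cost record; field names mirror the span API cost fields.
interface CallCost {
  inputTokens: number;
  inputCost: number;         // fresh (non-cached) input only
  cachedCost: number;        // cache-read line
  cacheCreationCost: number; // cache-creation line
}

// Total input spend for one call: fresh input plus both cache lines.
function totalInputCost(c: CallCost): number {
  return c.inputCost + c.cachedCost + c.cacheCreationCost;
}

function avgInputCost(calls: CallCost[]): number {
  if (calls.length === 0) return 0;
  return calls.reduce((sum, c) => sum + totalInputCost(c), 0) / calls.length;
}

function inputCostPer1k(calls: CallCost[]): number {
  const tokens = calls.reduce((sum, c) => sum + c.inputTokens, 0);
  const cost = calls.reduce((sum, c) => sum + totalInputCost(c), 0);
  return tokens === 0 ? 0 : (cost * 1000) / tokens;
}
```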
…add 1h cache alias
Scope getModelUsageSparklines by project_id alongside environment_id so it matches the other project-scoped queries and lets ClickHouse use the organization/project/environment key prefix. Add input_cache_creation_1h to the cache-creation price aliases so a model that defines only the 1h key is not dropped to the input price (no current model is affected; the base/5m alias still resolves first).
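A sketch of alias-ordered price resolution. `input_cache_creation_1h` is the only key name taken from the commit; the base and 5m key names, `PricingTable`, and `resolvePrice` are assumptions for illustration. The first alias present wins, so a base or 5m price still takes precedence, and the input price is the last resort so cache tokens are never priced at zero.

```typescript
type PricingTable = Record<string, number | undefined>;

// Hypothetical alias order; only input_cache_creation_1h is named in the commit.
const CACHE_CREATION_ALIASES = [
  "input_cache_creation",    // assumed base key
  "input_cache_creation_5m", // assumed 5-minute key
  "input_cache_creation_1h",
] as const;

function resolvePrice(table: PricingTable, aliases: readonly string[], fallback: number): number {
  for (const key of aliases) {
    const price = table[key];
    if (price !== undefined) return price;
  }
  // Fall back to the input price so cache tokens are never free.
  return fallback;
}
```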
The span API ai object now returns cachedCost and cacheCreationCost alongside inputCost/outputCost/totalCost. Since inputCost covers only the non-cached input, these fields let consumers reconstruct the full cost breakdown for prompt-cached calls instead of seeing an unexplained gap below totalCost.
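A consumer-side sketch of reconstructing the breakdown, assuming `totalCost` is the sum of the four cost lines and treating the two new fields as optional for responses produced before this change (`SpanAiCost` and `costBreakdown` are hypothetical names; only the field names come from the changeset).

```typescript
// Hypothetical subset of the span API's `ai` object.
interface SpanAiCost {
  inputCost: number;          // non-cached input only
  outputCost: number;
  cachedCost?: number;        // cache-read cost
  cacheCreationCost?: number; // cache-write cost
  totalCost: number;
}

function costBreakdown(ai: SpanAiCost) {
  const cached = ai.cachedCost ?? 0;
  const cacheCreation = ai.cacheCreationCost ?? 0;
  const accounted = ai.inputCost + ai.outputCost + cached + cacheCreation;
  return {
    freshInput: ai.inputCost,
    cached,
    cacheCreation,
    output: ai.outputCost,
    // Whatever the four lines do not explain; expected to be about 0.
    unaccounted: ai.totalCost - accounted,
  };
}
```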
@trigger.dev/build
trigger.dev
@trigger.dev/core
@trigger.dev/python
@trigger.dev/react-hooks
@trigger.dev/redis-worker
@trigger.dev/rsc
@trigger.dev/schema-to-json
@trigger.dev/sdk
When TimeFilter is used in controlled mode (onValueChange provided), it now takes period/from/to only from props instead of falling back to the URL search params. Selecting a custom date range in the model detail panel (which sets period to undefined) no longer reverts the filter display to the page-level URL period.
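A sketch of the controlled/uncontrolled split, using hypothetical names (`TimeFilterProps`, `SearchParamsLike`, `resolveTimeFilterValue`). The uncontrolled branch, where props take precedence and the URL fills the gaps, is an assumption about the existing behaviour; the point is that the controlled branch never consults the URL, so an `undefined` period from a custom range stays `undefined`.

```typescript
interface TimeRange {
  period?: string;
  from?: string;
  to?: string;
}

interface TimeFilterProps extends TimeRange {
  onValueChange?: (value: TimeRange) => void;
}

// Minimal read-only view of URL search params (URLSearchParams satisfies this).
interface SearchParamsLike {
  get(name: string): string | null;
}

function resolveTimeFilterValue(props: TimeFilterProps, search: SearchParamsLike): TimeRange {
  if (props.onValueChange) {
    // Controlled: props are the only source of truth, even when undefined.
    return { period: props.period, from: props.from, to: props.to };
  }
  // Uncontrolled (assumed): props win, URL params fill the gaps.
  return {
    period: props.period ?? search.get("period") ?? undefined,
    from: props.from ?? search.get("from") ?? undefined,
    to: props.to ?? search.get("to") ?? undefined,
  };
}
```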
## Summary

7 improvements.

## Improvements

- `@trigger.dev/sdk` now bundles the Trigger.dev agent skills and a curated snapshot of the docs those skills reference. The skills that `trigger skills` installs into your coding agent read this content from node_modules, so the guidance your AI assistant follows is pinned to the SDK version installed in your project and stays current across upgrades instead of going stale until the next reinstall. ([#3937](#3937))
- Running a CLI command like `dev`, `deploy`, `preview`, or `update` before initializing a project no longer crashes with a raw `Cannot find matching package.json` stack trace. The CLI now detects the missing project and points you to `npx trigger.dev@latest init` instead. ([#3929](#3929))
- The agent skills installed by `trigger skills` are now namespaced with a `trigger-` prefix (e.g. `trigger-authoring-tasks`, `trigger-getting-started`) so they don't collide with unrelated skills in your coding agent's skills directory. Adds a `trigger-cost-savings` skill for auditing and reducing compute spend (right-sizing machines, `maxDuration`, batching, debounce), and `@trigger.dev/sdk` now bundles the full Trigger.dev documentation so your agent can read the complete, version-pinned reference directly from node_modules. ([#3970](#3970))
- The run span API response now includes `cachedCost` and `cacheCreationCost` on the `ai` object, alongside the existing `inputCost` / `outputCost` / `totalCost`. `inputCost` reflects only the non-cached input, so these fields let you reconstruct the full cost breakdown for prompt-cached calls. ([#3958](#3958))
- `chat.headStart` now works with the `chat.customAgent` and `chat.createSession` backends, not only `chat.agent`. The warm step-1 response hands over to your loop the same way it does for a managed agent. ([#3963](#3963))

  In a `chat.customAgent` loop, consume the handover on turn 0:

  ```ts
  const conversation = new chat.MessageAccumulator();
  const { isFinal, skipped } = await conversation.consumeHandover({ payload });
  if (skipped) return; // warm handler aborted, so exit without a turn

  if (isFinal) {
    await chat.writeTurnComplete(); // step 1 is the response, no streamText
  } else {
    const result = streamText({ model, messages: conversation.modelMessages, tools });
    // Pass originalMessages so the handed-over tool round merges into the
    // step-1 assistant instead of starting a new message.
    const response = await chat.pipeAndCapture(result, {
      originalMessages: conversation.uiMessages,
    });
    if (response) await conversation.addResponse(response);
  }
  ```

  With `chat.createSession`, the iterator surfaces it as `turn.handover`; call `turn.complete()` with no argument on a final handover. The lower-level `chat.waitForHandover()` and `accumulator.applyHandover()` are also exported for hand-rolled loops.
- Cache your chat agent's system prompt with Anthropic prompt caching. `chat.toStreamTextOptions()` now emits the system prompt as a cacheable message when you opt in, so a large, stable system block is billed at cache-read rates on every turn instead of full price. ([#3952](#3952))

  ```ts
  // at the streamText call site (Anthropic sugar)
  streamText({
    ...chat.toStreamTextOptions({ cacheControl: { type: "ephemeral" } }),
    messages,
  });

  // provider-agnostic equivalent
  chat.toStreamTextOptions({
    systemProviderOptions: { anthropic: { cacheControl: { type: "ephemeral" } } },
  });

  // or where the prompt is defined
  chat.prompt.set(SYSTEM_PROMPT, {
    providerOptions: { anthropic: { cacheControl: { type: "ephemeral" } } },
  });
  ```

  Without an option, `system` stays a plain string. Pairs with a `prepareMessages` cache breakpoint to cache the conversation prefix across turns too.
- Three fixes for custom agent loops (`chat.customAgent`, `chat.createSession`, and hand-rolled `MessageAccumulator` loops): ([#3936](#3936))
  - Continuation runs no longer replay already-answered user messages into the first turn. The `.in` resume cursor is now seeded before any listener attaches (the same boot logic `chat.agent` uses), so a chat that continues after a cancel, crash, or upgrade only sees genuinely new messages.
  - Steering a hand-rolled loop mid-stream no longer wipes the in-flight assistant response. `chat.pipeAndCapture` now stamps a server-generated message id on the stream, so a `prepareStep` injection keeps the partial text instead of replacing the message.
  - Task-backed tools (`ai.toolExecute`) now work from custom agent loops: the parent's session is threaded to the child run, so child tasks can stream progress into the chat with `chat.stream.writer({ target: "root" })` instead of failing with "session handle is not initialized".

<details>
<summary>Raw changeset output</summary>

⚠️ ⚠️ ⚠️ ⚠️ ⚠️ ⚠️

`main` is currently in **pre mode** so this branch has prereleases rather than normal releases. If you want to exit prereleases, run `changeset pre exit` on `main`.

⚠️ ⚠️ ⚠️ ⚠️ ⚠️ ⚠️

# Releases

## @trigger.dev/build@4.5.0-rc.7

### Patch Changes

- Updated dependencies:
  - `@trigger.dev/core@4.5.0-rc.7`

## trigger.dev@4.5.0-rc.7

### Patch Changes

- `@trigger.dev/sdk` now bundles the Trigger.dev agent skills and a curated snapshot of the docs those skills reference. The skills that `trigger skills` installs into your coding agent read this content from node_modules, so the guidance your AI assistant follows is pinned to the SDK version installed in your project and stays current across upgrades instead of going stale until the next reinstall. ([#3937](#3937))
- Running a CLI command like `dev`, `deploy`, `preview`, or `update` before initializing a project no longer crashes with a raw `Cannot find matching package.json` stack trace. The CLI now detects the missing project and points you to `npx trigger.dev@latest init` instead. ([#3929](#3929))
- The agent skills installed by `trigger skills` are now namespaced with a `trigger-` prefix (e.g. `trigger-authoring-tasks`, `trigger-getting-started`) so they don't collide with unrelated skills in your coding agent's skills directory. Adds a `trigger-cost-savings` skill for auditing and reducing compute spend (right-sizing machines, `maxDuration`, batching, debounce), and `@trigger.dev/sdk` now bundles the full Trigger.dev documentation so your agent can read the complete, version-pinned reference directly from node_modules. ([#3970](#3970))
- Updated dependencies:
  - `@trigger.dev/core@4.5.0-rc.7`
  - `@trigger.dev/build@4.5.0-rc.7`
  - `@trigger.dev/schema-to-json@4.5.0-rc.7`

## @trigger.dev/core@4.5.0-rc.7

### Patch Changes

- The run span API response now includes `cachedCost` and `cacheCreationCost` on the `ai` object, alongside the existing `inputCost` / `outputCost` / `totalCost`. `inputCost` reflects only the non-cached input, so these fields let you reconstruct the full cost breakdown for prompt-cached calls. ([#3958](#3958))

## @trigger.dev/python@4.5.0-rc.7

### Patch Changes

- Updated dependencies:
  - `@trigger.dev/sdk@4.5.0-rc.7`
  - `@trigger.dev/core@4.5.0-rc.7`
  - `@trigger.dev/build@4.5.0-rc.7`

## @trigger.dev/react-hooks@4.5.0-rc.7

### Patch Changes

- Updated dependencies:
  - `@trigger.dev/core@4.5.0-rc.7`

## @trigger.dev/redis-worker@4.5.0-rc.7

### Patch Changes

- Updated dependencies:
  - `@trigger.dev/core@4.5.0-rc.7`

## @trigger.dev/rsc@4.5.0-rc.7

### Patch Changes

- Updated dependencies:
  - `@trigger.dev/core@4.5.0-rc.7`

## @trigger.dev/schema-to-json@4.5.0-rc.7

### Patch Changes

- Updated dependencies:
  - `@trigger.dev/core@4.5.0-rc.7`

## @trigger.dev/sdk@4.5.0-rc.7

### Patch Changes

- `@trigger.dev/sdk` now bundles the Trigger.dev agent skills and a curated snapshot of the docs those skills reference. The skills that `trigger skills` installs into your coding agent read this content from node_modules, so the guidance your AI assistant follows is pinned to the SDK version installed in your project and stays current across upgrades instead of going stale until the next reinstall. ([#3937](#3937))
- `chat.headStart` now works with the `chat.customAgent` and `chat.createSession` backends, not only `chat.agent`. The warm step-1 response hands over to your loop the same way it does for a managed agent. ([#3963](#3963))

  In a `chat.customAgent` loop, consume the handover on turn 0:

  ```ts
  const conversation = new chat.MessageAccumulator();
  const { isFinal, skipped } = await conversation.consumeHandover({ payload });
  if (skipped) return; // warm handler aborted, so exit without a turn

  if (isFinal) {
    await chat.writeTurnComplete(); // step 1 is the response, no streamText
  } else {
    const result = streamText({ model, messages: conversation.modelMessages, tools });
    // Pass originalMessages so the handed-over tool round merges into the
    // step-1 assistant instead of starting a new message.
    const response = await chat.pipeAndCapture(result, {
      originalMessages: conversation.uiMessages,
    });
    if (response) await conversation.addResponse(response);
  }
  ```

  With `chat.createSession`, the iterator surfaces it as `turn.handover`; call `turn.complete()` with no argument on a final handover. The lower-level `chat.waitForHandover()` and `accumulator.applyHandover()` are also exported for hand-rolled loops.
- Add `triggerConfig` support to `chat.headStart()` and `chat.openSession()`, so the auto-triggered handover-prepare run inherits tags, queue, machine, and other session trigger options the same way `chat.createStartSessionAction()` does. The `chat:{chatId}` tag is prepended automatically. ([#3963](#3963))

  ```ts
  export const POST = chat.headStart({
    agentId: "my-agent",
    triggerConfig: { tags: ["org:acme"], queue: "chat" },
    run: async ({ chat }) => streamText({ ...chat.toStreamTextOptions(), model }),
  });
  ```

  Because the session is created once on the first head-start turn and is idempotent on the chat id, this is the only place to set those options for a head-start chat's lifetime. `chat.createStartSessionAction()` now also forwards `maxDuration`, `region`, and `lockToVersion` so both session entry points stay consistent.
- Cache your chat agent's system prompt with Anthropic prompt caching. `chat.toStreamTextOptions()` now emits the system prompt as a cacheable message when you opt in, so a large, stable system block is billed at cache-read rates on every turn instead of full price. ([#3952](#3952))

  ```ts
  // at the streamText call site (Anthropic sugar)
  streamText({
    ...chat.toStreamTextOptions({ cacheControl: { type: "ephemeral" } }),
    messages,
  });

  // provider-agnostic equivalent
  chat.toStreamTextOptions({
    systemProviderOptions: { anthropic: { cacheControl: { type: "ephemeral" } } },
  });

  // or where the prompt is defined
  chat.prompt.set(SYSTEM_PROMPT, {
    providerOptions: { anthropic: { cacheControl: { type: "ephemeral" } } },
  });
  ```

  Without an option, `system` stays a plain string. Pairs with a `prepareMessages` cache breakpoint to cache the conversation prefix across turns too.
- Three fixes for custom agent loops (`chat.customAgent`, `chat.createSession`, and hand-rolled `MessageAccumulator` loops): ([#3936](#3936))
  - Continuation runs no longer replay already-answered user messages into the first turn. The `.in` resume cursor is now seeded before any listener attaches (the same boot logic `chat.agent` uses), so a chat that continues after a cancel, crash, or upgrade only sees genuinely new messages.
  - Steering a hand-rolled loop mid-stream no longer wipes the in-flight assistant response. `chat.pipeAndCapture` now stamps a server-generated message id on the stream, so a `prepareStep` injection keeps the partial text instead of replacing the message.
  - Task-backed tools (`ai.toolExecute`) now work from custom agent loops: the parent's session is threaded to the child run, so child tasks can stream progress into the chat with `chat.stream.writer({ target: "root" })` instead of failing with "session handle is not initialized".
- The agent skills installed by `trigger skills` are now namespaced with a `trigger-` prefix (e.g. `trigger-authoring-tasks`, `trigger-getting-started`) so they don't collide with unrelated skills in your coding agent's skills directory. Adds a `trigger-cost-savings` skill for auditing and reducing compute spend (right-sizing machines, `maxDuration`, batching, debounce), and `@trigger.dev/sdk` now bundles the full Trigger.dev documentation so your agent can read the complete, version-pinned reference directly from node_modules. ([#3970](#3970))
- Updated dependencies:
  - `@trigger.dev/core@4.5.0-rc.7`

## @trigger.dev/plugins@4.5.0-rc.7

### Patch Changes

- Updated dependencies:
  - `@trigger.dev/core@4.5.0-rc.7`

</details>

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Summary
The Models page is now split into two tabs. Your models shows the models your project has actually used in the selected time range, with usage charts (cost over time, tokens over time, calls by model), a per-model table of calls / cost / avg TTFC / avg tokens-per-sec, and calls/tokens trend sparklines. Model library is the full catalog, reordered from alphabetical to a relevance-based provider order (Anthropic, OpenAI, Google, then the rest), newest models first within each provider, with a "New" badge on models released in the last 7 days.
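A sketch of the Model library ordering and the "New" badge check. The shape of `CatalogModel`, the lowercase provider keys, and the alphabetical order among the remaining providers are assumptions; the description only fixes Anthropic, OpenAI, and Google first, newest first within a provider, and a 7-day window for the badge.

```typescript
interface CatalogModel {
  name: string;
  provider: string;
  releasedAt: Date;
}

// Providers listed first; every other provider ranks after these.
const PROVIDER_ORDER = ["anthropic", "openai", "google"];

function providerRank(provider: string): number {
  const i = PROVIDER_ORDER.indexOf(provider.toLowerCase());
  return i === -1 ? PROVIDER_ORDER.length : i;
}

function sortLibrary(models: CatalogModel[]): CatalogModel[] {
  return [...models].sort(
    (a, b) =>
      providerRank(a.provider) - providerRank(b.provider) ||
      // Assumed: keep "the rest" grouped alphabetically by provider.
      a.provider.toLowerCase().localeCompare(b.provider.toLowerCase()) ||
      // Newest first within a provider.
      b.releasedAt.getTime() - a.releasedAt.getTime(),
  );
}

const NEW_WINDOW_MS = 7 * 24 * 60 * 60 * 1000;

// True when the model was released within the last 7 days.
function isNew(model: CatalogModel, now: Date = new Date()): boolean {
  const age = now.getTime() - model.releasedAt.getTime();
  return age >= 0 && age <= NEW_WINDOW_MS;
}
```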
One time-range selector drives the whole Your models tab, so the charts, the table, and the sparklines all share the same window. Opening a model shows its own metrics with an independent range picker and a "View in AI metrics" link that opens the AI metrics dashboard filtered to that model. The active tab is kept in the URL so it survives a refresh and is shareable.
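A sketch of keeping the active tab in the URL, assuming a query parameter named `tab` with the values `your-models` and `library` and defaulting to Your models (all of these are hypothetical; the actual parameter name and default are not stated). Other parameters, such as the time range, are preserved when the tab changes.

```typescript
type ModelsTab = "your-models" | "library";

const TAB_PARAM = "tab"; // hypothetical parameter name

// Reads the active tab from a query string, defaulting to Your models.
function readTab(search: string): ModelsTab {
  const value = new URLSearchParams(search).get(TAB_PARAM);
  return value === "library" ? "library" : "your-models";
}

// Returns a new query string with the tab set, keeping other params intact.
function writeTab(search: string, tab: ModelsTab): string {
  const params = new URLSearchParams(search);
  params.set(TAB_PARAM, tab);
  return `?${params.toString()}`;
}
```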
Prompt caching & cost accuracy
Both the Your models tab and the AI metrics dashboard now surface prompt-cache usage: a cache-savings column plus per-model cached-tokens and cache-hit-rate views, and a caching section on the dashboard (hit rate, cached tokens, estimated savings, and hit rate by model).
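A sketch of the per-model aggregation behind the caching views. The savings formula here (cached reads priced at the input rate minus their actual cache-read cost) is an assumption about how "estimated savings" is defined, and `ModelCall` and `cacheStatsByModel` are hypothetical names. Hit rate follows the corrected definition of cached reads over total input.

```typescript
interface ModelCall {
  model: string;
  inputTokens: number;     // total prompt tokens, including cached ones
  cacheReadTokens: number;
  inputPricePerToken: number;
  cacheReadPricePerToken: number;
}

interface ModelCacheStats {
  hitRate: number;
  cachedTokens: number;
  estimatedSavings: number;
}

function cacheStatsByModel(calls: ModelCall[]): Map<string, ModelCacheStats> {
  const acc = new Map<string, { input: number; cached: number; savings: number }>();
  for (const c of calls) {
    const a = acc.get(c.model) ?? { input: 0, cached: 0, savings: 0 };
    a.input += c.inputTokens;
    a.cached += c.cacheReadTokens;
    // Assumed savings: cost of the cached reads at the input price,
    // minus what they actually cost at the cache-read price.
    a.savings += c.cacheReadTokens * (c.inputPricePerToken - c.cacheReadPricePerToken);
    acc.set(c.model, a);
  }
  const out = new Map<string, ModelCacheStats>();
  for (const [model, a] of acc) {
    out.set(model, {
      hitRate: a.input === 0 ? 0 : a.cached / a.input,
      cachedTokens: a.cached,
      estimatedSavings: a.savings,
    });
  }
  return out;
}
```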
Building this surfaced a cost bug.
`input_tokens` is the total prompt count and already includes cache-read and cache-creation tokens, but the cost pipeline charged the full input at the input price and then added a separate cache line, so cached tokens were billed twice (and on Anthropic, cache reads were never discounted because their price is keyed differently). The input price now applies only to the non-cached remainder, with cache prices resolved across the provider-specific keys, so LLM cost and the cache hit-rate metric are accurate. Hit rate is computed as cached reads over total input.
Notes
Also fixes React "invalid DOM property" console warnings from the provider icons (the Llama and DeepSeek SVGs used raw `fill-rule`/`clip-rule`/`clip-path` attributes instead of their camelCase React equivalents), which this page surfaces by rendering more provider icons.
Screenshots
Your models tab: usage charts and a per-model table with calls/tokens trend sparklines.
Model library: provider-relevance ordering with a "New" badge on models released in the last 7 days.
Model detail, Metrics tab: per-model range picker and a "View in AI metrics" link.
View in AI metrics: the dashboard deep-linked and filtered to the selected model.