7 changes: 7 additions & 0 deletions .changeset/openrouter-video-adapter.md
@@ -0,0 +1,7 @@
---
'@tanstack/ai-openrouter': minor
---

Add `openRouterVideo`, a video generation adapter for OpenRouter's dedicated async API (`POST /api/v1/videos`) — Seedance, Veo 3.1, Wan, Kling, and Sora 2 Pro through one API key. Follows the jobs/polling architecture (`generateVideo()` → `getVideoJobStatus()`), with per-model `size` / `duration` / provider-option types generated from OpenRouter's `GET /api/v1/videos/models` metadata and validated before submit. `duration` is typed per model on the shared typed-duration contract — the adapter implements `availableDurations()` and `snapDuration(seconds)` (matching the Veo adapter) to enumerate the valid set and coerce raw UI seconds to the closest supported value. Image-conditioned prompts map `metadata.role` onto the wire: `start_frame` / `end_frame` → `frame_images[]` (`first_frame` / `last_frame`), `reference` / `character` → `input_references[]`; frame roles are validated against each model's `supported_frame_images`. Completed videos are downloaded server-side and returned as `data:` URLs (OpenRouter's download URLs require the API key), and the gateway-reported cost is surfaced as `usage.cost`.

Image adapter fixes from the #624 review: requested `size` is now validated (the `WIDTHxHEIGHT` union previously used a Unicode `×`, so every size except `1024x1024` silently dropped its aspect ratio; unsupported sizes now throw with the supported list), `numberOfImages > 1` throws instead of silently returning one image (verified live: the gateway ignores all count keys in `image_config`), and `image_config.strength` (0.0–1.0 image-to-image influence) is exposed via `modelOptions.strength`.
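
The separator bug described above is easy to reproduce in isolation. The following is a minimal sketch, assuming a hypothetical `parseSize` helper (not the adapter's actual validation code), that accepts only the ASCII `x` separator and rejects the Unicode `×`:

```typescript
// Hypothetical helper for illustration; the adapter's real validation may differ.
interface ParsedSize {
  width: number
  height: number
}

function parseSize(size: string): ParsedSize {
  // Only the ASCII letter "x" is accepted as the separator.
  const match = /^(\d+)x(\d+)$/.exec(size)
  if (!match) {
    throw new Error(`Invalid size "${size}": expected WIDTHxHEIGHT with an ASCII "x"`)
  }
  return { width: Number(match[1]), height: Number(match[2]) }
}
```

A size string written with `×` (U+00D7) fails the pattern and throws rather than being passed along silently.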
5 changes: 5 additions & 0 deletions .changeset/video-adapter-duration-constraint.md
@@ -0,0 +1,5 @@
---
'@tanstack/ai': patch
---

Fix `generateVideo()` (and the other `generateVideo` activity entry points) rejecting video adapters that declare per-model typed durations. The activity's `TAdapter extends VideoAdapter<string, any, any, any>` bound let the sixth `TModelDurationByName` generic fall back to its `Record<string, number>` default; because `createVideoJob` is a contravariant function-valued property, a concrete adapter whose `duration` is narrowed to a literal union (e.g. Veo's `4 | 6 | 8`, OpenRouter Seedance's `4..15`) failed the bound, so the documented `generateVideo({ adapter: geminiVideo('veo-3.1-generate-preview'), ... })` pattern did not type-check. The constraint now leaves the size and duration generics unpinned (`VideoAdapter<string, any, any, any, any, any>`); the real per-model types are still recovered by inference (`VideoSizeForAdapter` / `VideoDurationForAdapter`).
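
The variance issue can be shown with simplified stand-in types. This is a sketch only: `MiniVideoAdapter`, `DurationOf`, and `submit` are hypothetical names for illustration and are not the real TanStack AI types.

```typescript
// Simplified stand-ins for illustration; these are NOT the real TanStack AI types.
interface MiniVideoAdapter<TDuration> {
  // A function-valued property, so its parameter is checked contravariantly
  // under `strictFunctionTypes`.
  createVideoJob: (options: { duration: TDuration }) => string
}

// Recover the duration type by inference, in the spirit of `VideoDurationForAdapter`.
type DurationOf<T> = T extends MiniVideoAdapter<infer D> ? D : never

// A pinned bound such as `T extends MiniVideoAdapter<number>` would reject `veoLike`
// below: `(o: { duration: 4 | 6 | 8 }) => string` cannot accept every `number`.
function submit<T extends MiniVideoAdapter<any>>(
  adapter: T,
  duration: DurationOf<T>,
): string {
  return adapter.createVideoJob({ duration })
}

const veoLike: MiniVideoAdapter<4 | 6 | 8> = {
  createVideoJob: ({ duration }) => `job-${duration}s`,
}

const jobId = submit(veoLike, 6) // `duration` is still narrowed to 4 | 6 | 8
```

Leaving the bound's generic as `any` only relaxes the constraint check; the call site still sees the narrowed literal union through inference.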
87 changes: 87 additions & 0 deletions docs/adapters/openrouter.md
@@ -247,6 +247,93 @@ fields are simply absent and the stream completes normally. Both
`openRouterText` and `openRouterResponsesText` populate cost when OpenRouter
returns it.

## Image Generation

`openRouterImage` routes image generation through OpenRouter's
chat-completions surface (`modalities: ['image']`). Multimodal prompts are
supported — text and image parts are forwarded in order for
image-conditioned generation:

```typescript
import { generateImage } from "@tanstack/ai";
import { openRouterImage } from "@tanstack/ai-openrouter";

const result = await generateImage({
  adapter: openRouterImage("google/gemini-2.5-flash-image"),
  prompt: "A watercolor lighthouse at dusk",
  size: "1344x768", // mapped to image_config.aspect_ratio ('16:9')
  modelOptions: {
    image_size: "2K", // resolution (Gemini models)
    strength: 0.35, // image-to-image influence, i2i-capable models only
  },
});
```

Notes:

- The pathway returns **exactly one image per request**: `numberOfImages > 1`
  throws instead of silently under-delivering. Make multiple requests if you
  need multiple candidates.
- `size` must be one of the ten supported `WIDTHxHEIGHT` values (it is
  converted to `image_config.aspect_ratio`); anything else throws with the
  supported list.
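
Because each request yields one image, collecting several candidates means issuing several requests. A minimal sketch, assuming a hypothetical `generateCandidates` helper in which `generateOne` stands in for a single-image call such as `generateImage({ adapter, prompt })`:

```typescript
// Hypothetical helper: `generateOne` stands in for one single-image request.
async function generateCandidates<T>(
  count: number,
  generateOne: (index: number) => Promise<T>,
): Promise<T[]> {
  if (!Number.isInteger(count) || count < 1) {
    throw new RangeError("count must be a positive integer");
  }
  // One request per candidate, issued in parallel; results keep request order.
  return Promise.all(Array.from({ length: count }, (_, i) => generateOne(i)));
}
```

Parallel requests may run into provider rate limits, so a sequential loop could be the safer choice for larger counts.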

## Video Generation (Experimental)

`openRouterVideo` targets OpenRouter's dedicated **async video API**
(`POST /api/v1/videos`), which serves Seedance, Veo 3.1, Wan, Kling, and
Sora 2 Pro through a single OpenRouter key. It follows the jobs/polling
architecture shared by all TanStack AI video adapters:

```typescript
// Server: create the job, then poll
import { generateVideo, getVideoJobStatus } from "@tanstack/ai";
import { openRouterVideo } from "@tanstack/ai-openrouter";

const adapter = openRouterVideo("bytedance/seedance-2.0");

const { jobId } = await generateVideo({
  adapter,
  prompt: [
    { type: "text", content: "Animate this product shot, slow push-in" },
    {
      type: "image",
      source: { type: "url", value: "https://your-cdn.com/product.png" },
      metadata: { role: "start_frame" },
    },
  ],
  size: "1280x720",
  // `duration` is typed per model from the published metadata; coerce raw
  // seconds with adapter.snapDuration() or enumerate via adapter.availableDurations().
  duration: 8,
});

let status = await getVideoJobStatus({ adapter, jobId });
while (status.status !== "completed" && status.status !== "failed") {
  await new Promise((r) => setTimeout(r, 5000));
  status = await getVideoJobStatus({ adapter, jobId });
}
// status.url is a data: URL (OpenRouter download URLs require the API key,
// so the adapter downloads server-side); status.usage?.cost is the real
// billed cost reported by the gateway.
```
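
The loop above polls until the job settles, with no upper bound. A bounded variant might look like the following sketch, where `pollUntilSettled` is a hypothetical helper and `getStatus` stands in for `() => getVideoJobStatus({ adapter, jobId })`:

```typescript
// Hypothetical helper; only the "completed" and "failed" statuses are taken
// from the loop above.
interface JobStatus {
  status: string;
}

async function pollUntilSettled<T extends JobStatus>(
  getStatus: () => Promise<T>,
  options: { intervalMs: number; maxAttempts: number },
): Promise<T> {
  for (let attempt = 0; attempt < options.maxAttempts; attempt++) {
    const current = await getStatus();
    if (current.status === "completed" || current.status === "failed") {
      return current;
    }
    // Wait before the next poll.
    await new Promise<void>((resolve) => setTimeout(resolve, options.intervalMs));
  }
  throw new Error(`Job did not settle after ${options.maxAttempts} attempts`);
}
```

Capping the attempts keeps a stuck job from holding a server request open indefinitely; suitable values for the interval and the cap depend on the model.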

```tsx
// Client: track the job with the useGenerateVideo hook
import { useGenerateVideo, fetchServerSentEvents } from "@tanstack/ai-react";

const { generate, result, videoStatus, isLoading } = useGenerateVideo({
  connection: fetchServerSentEvents("/api/generate/video"),
});
// result?.url renders directly: <video src={result.url} controls />
```

Sizes, durations, and per-model options (`resolution`, `aspectRatio`,
`generateAudio`, `seed`, …) are typed and validated per model from
OpenRouter's video model metadata. See
[Video Generation](../media/video-generation.md) for the full lifecycle,
streaming mode, and the image-to-video role-mapping table.

## Next Steps

- [Getting Started](../getting-started/quick-start) - Learn the basics
7 changes: 4 additions & 3 deletions docs/config.json
@@ -256,13 +256,13 @@
"label": "Image Generation",
"to": "media/image-generation",
"addedAt": "2026-04-15",
- "updatedAt": "2026-06-08"
+ "updatedAt": "2026-06-10"
},
Comment on lines +259 to 260


📐 Maintainability & Code Quality | 🟠 Major | ⚡ Quick win

Refresh media/image-generation updatedAt to this PR date.

docs/media/image-generation.md is changed in this PR, but its entry still shows "updatedAt": "2026-06-10" instead of today (2026-06-24), so docs freshness metadata is inconsistent.

As per coding guidelines, “Update updatedAt timestamp in docs/config.json when making content changes to a documentation page.”

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@docs/config.json` around lines 259 - 260, The docs freshness metadata for the
media/image-generation page is stale because the config entry still uses an
older updatedAt value even though docs/media/image-generation.md was modified in
this PR. Update the corresponding entry in docs/config.json for
media/image-generation so its updatedAt matches the PR date, using the existing
docs metadata entry structure as the anchor.

Source: Coding guidelines

{
"label": "Video Generation",
"to": "media/video-generation",
"addedAt": "2026-04-15",
- "updatedAt": "2026-06-08"
+ "updatedAt": "2026-06-24"
},
{
"label": "Generation Hooks",
@@ -454,7 +454,8 @@
{
"label": "OpenRouter Adapter",
"to": "adapters/openrouter",
- "addedAt": "2026-04-15"
+ "addedAt": "2026-04-15",
+ "updatedAt": "2026-06-24"
},
{
"label": "OpenAI-Compatible",
2 changes: 1 addition & 1 deletion docs/media/image-generation.md
@@ -301,7 +301,7 @@ await generateImage({
| **Gemini** | Native models (`gemini-*-flash-image`, "nano-banana", etc.) → prompt parts map 1:1 onto multimodal `contents`, preserving interleaved order. Up to ~14 input images (provider limit, not enforced by the SDK).<br>Imagen models → throws (text-to-image only). |
| **fal.ai** | Field names resolve per endpoint from a map generated from the fal SDK's endpoint types (e.g. nano-banana edit gets `image_urls`, Fooocus masks get `mask_image_url`). Defaults for unknown endpoints: 1 input → `image_url`; multiple → `image_urls`; `role: 'mask'` → `mask_url`; `role: 'control'` → `control_image_url`; `role: 'reference'` / `'character'` → `reference_image_urls`. Override with `modelOptions` for endpoint-specific fields. |
| **Grok** | grok-imagine models → xAI's `/v1/images/edits` (up to 3 source images, addressed by xAI in request order; prompt sent verbatim). `role: 'mask'` / `'control'` throw (no Imagine API equivalent). `grok-2-image-1212` throws (text-to-image only). |
-| **OpenRouter** | Prompt parts map 1:1 onto multimodal `image_url` / `text` content parts, preserving interleaved order, and are forwarded to the underlying image model. |
+| **OpenRouter** | Prompt parts map 1:1 onto multimodal `image_url` / `text` content parts, preserving interleaved order, and are forwarded to the underlying image model. `modelOptions.strength` (0.0–1.0) controls image-to-image influence on models that document it (e.g. Recraft). One image per request — `numberOfImages > 1` throws (the gateway ignores count keys). |
| **Anthropic** | n/a — no image generation API. |

Adapters that don't support image-conditioned generation throw a clear
89 changes: 80 additions & 9 deletions docs/media/video-generation.md
@@ -2,17 +2,21 @@
title: Video Generation
id: video-generation
order: 6
- description: "Generate video from text prompts with OpenAI Sora or Google Veo using TanStack AI's experimental generateVideo() jobs/polling API."
+ description: "Generate video from text prompts with OpenAI Sora, Google Veo, fal.ai, or OpenRouter (Seedance, Veo, Wan) using TanStack AI's experimental generateVideo() jobs/polling API."
keywords:
- tanstack ai
- video generation
- sora
- veo
- gemini
- openrouter
- seedance
- fal
- generateVideo
- jobs api
- experimental
- text-to-video
- image-to-video
---

# Video Generation (Experimental)
@@ -39,6 +43,8 @@ TanStack AI provides experimental support for video generation through dedicated
Currently supported:
- **OpenAI**: Sora-2 and Sora-2-Pro models (when available)
- **Google Gemini**: Veo 3.1, Veo 3, and Veo 2 models (via the long-running operations API)
- **fal.ai**: Kling, MiniMax, Hunyuan, and other fal-hosted video endpoints
- **OpenRouter**: Seedance, Veo 3.1, Wan, Kling, Sora 2 Pro and others via the dedicated async video API (`POST /api/v1/videos`)

## Basic Usage

@@ -427,12 +433,12 @@ for the per-provider table.
Each `ImagePart` can carry an optional `metadata.role` hint that the
adapter uses to route the input to the provider-specific field:

-| Role | Maps to |
-| --------------- | ------------------------------------------------------------- |
-| `'start_frame'` | fal `start_image_url`, Veo input `image` (positional default for the first input) |
-| `'end_frame'` | fal `end_image_url`, Veo `lastFrame` |
-| `'reference'` | fal `reference_image_urls`, Veo `referenceImages` |
-| `'character'` | Same as `'reference'` — character consistency images |
+| Role | Maps to |
+| --------------- | --------------------------------------------------------------------------------------------------------- |
+| `'start_frame'` | fal `start_image_url`; Veo input `image`; OpenRouter `frame_images[]` with `frame_type: 'first_frame'` (positional default for the first input) |
+| `'end_frame'` | fal `end_image_url`; Veo `lastFrame`; OpenRouter `frame_images[]` with `frame_type: 'last_frame'` |
+| `'reference'` | fal `reference_image_urls`; Veo `referenceImages`; OpenRouter `input_references[]` |
+| `'character'` | Same as `'reference'` — character consistency images |

```typescript
import { generateVideo } from '@tanstack/ai'
@@ -460,6 +466,7 @@ await generateVideo({
| **OpenAI** | Sora-2 / Sora-2-Pro → the image part goes to `input_reference`; flattened text is the prompt. Single image only — throws if more than one. |
| **fal.ai** | Field names resolve per endpoint from a map generated from the fal SDK's endpoint types — e.g. `role: 'start_frame'` lands on `image_url` for Kling/Veo image-to-video, `first_frame_url` for first-last-frame endpoints, and `start_image_url` otherwise. Defaults: single input → `image_url` (start frame); `role: 'end_frame'` → `end_image_url`; `role: 'reference'` / `'character'` → `reference_image_urls`. Override per-endpoint via `modelOptions` — the media-conditioning fields are typed optional there (even when the endpoint requires them) since they usually arrive as prompt parts. |
| **Gemini** | Veo → the first un-roled / `'start_frame'` image becomes the input image; `'end_frame'` → `lastFrame`; `'reference'` / `'character'` → `referenceImages` (asset references, Veo 3.1). Throws on multiple starting images. |
| **OpenRouter** | `role: 'start_frame'` / `'end_frame'` → `frame_images[]` with `frame_type: 'first_frame'` / `'last_frame'`; `role: 'reference'` / `'character'` → `input_references[]`; an unroled image defaults to the start frame. At most one start and one end frame; frame roles are validated against the model's `supported_frame_images` metadata (e.g. Hailuo only takes a first frame). When both frame images and references are present, OpenRouter treats the request as image-to-video and references take lower priority. URL image sources pass through verbatim and `data` sources become data URIs — OpenRouter does not fetch URLs behind redirects or bot checks, so use directly accessible URLs. |

Adapters whose underlying API can't accept image inputs throw a clear
runtime error so calls fail fast.
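
The OpenRouter row in the table above can be sketched as a small mapping function. This is an illustration under assumptions: the role names and `frame_type` values come from the table, but `mapImageRoles`, the `ImageInput` shape, and the `image_url` field name are hypothetical, and validation against the model's `supported_frame_images` is left out.

```typescript
// Illustrative types only: role names and `frame_type` values follow the table
// above; the `image_url` field name is an assumption.
type Role = 'start_frame' | 'end_frame' | 'reference' | 'character'

interface ImageInput {
  url: string
  role?: Role
}

interface VideoImagePayload {
  frame_images: Array<{ frame_type: 'first_frame' | 'last_frame'; image_url: string }>
  input_references: Array<{ image_url: string }>
}

function mapImageRoles(images: ImageInput[]): VideoImagePayload {
  const payload: VideoImagePayload = { frame_images: [], input_references: [] }
  let hasStart = false
  let hasEnd = false
  for (const image of images) {
    const role = image.role ?? 'start_frame' // an unroled image defaults to the start frame
    if (role === 'start_frame') {
      if (hasStart) throw new Error('At most one start frame is allowed')
      hasStart = true
      payload.frame_images.push({ frame_type: 'first_frame', image_url: image.url })
    } else if (role === 'end_frame') {
      if (hasEnd) throw new Error('At most one end frame is allowed')
      hasEnd = true
      payload.frame_images.push({ frame_type: 'last_frame', image_url: image.url })
    } else {
      // 'reference' and 'character' share the same destination.
      payload.input_references.push({ image_url: image.url })
    }
  }
  return payload
}
```

Throwing on a second start or end frame mirrors the "at most one start and one end frame" rule, so a bad prompt fails before any request is sent.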
@@ -567,6 +574,68 @@ Adapters that haven't declared a per-model duration map keep the plain
> Files API and requires your API key to download (send it as an
> `x-goog-api-key` header or `key` query parameter).

### OpenRouter Model Options

OpenRouter's [video generation API](https://openrouter.ai/docs/guides/overview/multimodal/video-generation)
runs Seedance, Veo, Wan, Kling, Sora 2 Pro and others behind one async jobs
API. `size`, `duration`, and the per-model options below are typed **and
validated per model** from OpenRouter's published model capabilities (a size
or duration the model doesn't support throws before the request is sent):

```typescript
import { generateVideo } from '@tanstack/ai'
import { openRouterVideo } from '@tanstack/ai-openrouter'

const { jobId } = await generateVideo({
  adapter: openRouterVideo('bytedance/seedance-2.0'),
  prompt: 'A beautiful sunset over the ocean',
  size: '1280x720', // per-model union from OpenRouter's model metadata
  duration: 8, // validated against the model's supported durations
  modelOptions: {
    resolution: '720p', // alternative to size: resolution + aspectRatio
    aspectRatio: '16:9',
    generateAudio: true, // omitted from the type for models that can't generate audio
    seed: 42, // omitted from the type for models that don't accept a seed
    callbackUrl: 'https://your-app.com/webhooks/openrouter-video',
    provider: { options: { bytedance: { watermark: false } } }, // passthrough
  },
})
```
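
The validate-before-submit behavior can be pictured with the sketch below. It assumes a hypothetical `VideoCapabilities` record and `assertSupported` helper; the real adapter derives its tables from OpenRouter's model metadata, and the size list shown here is a placeholder rather than the model's actual set.

```typescript
// Hypothetical capability record; not the adapter's generated metadata.
interface VideoCapabilities {
  sizes: readonly string[]
  durations: readonly number[]
}

function assertSupported(
  model: string,
  caps: VideoCapabilities,
  request: { size?: string; duration?: number },
): void {
  if (request.size !== undefined && !caps.sizes.includes(request.size)) {
    throw new Error(
      `${model} does not support size ${request.size}; supported: ${caps.sizes.join(', ')}`,
    )
  }
  if (request.duration !== undefined && !caps.durations.includes(request.duration)) {
    throw new Error(
      `${model} does not support duration ${request.duration}s; supported: ${caps.durations.join(', ')}`,
    )
  }
}

// Example values only: the size list is a placeholder for illustration.
const exampleCaps: VideoCapabilities = {
  sizes: ['1280x720'],
  durations: [4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15],
}
assertSupported('bytedance/seedance-2.0', exampleCaps, { size: '1280x720', duration: 8 })
```

Listing the supported values in the error message lets a caller correct the request without consulting the metadata separately.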

Like the Veo adapter, OpenRouter's `duration` is **typed per model** — each
model narrows `duration` to the whole-second union published in its metadata,
and the adapter implements the same `availableDurations()` / `snapDuration()`
introspection helpers:

```typescript
import { generateVideo } from '@tanstack/ai'
import { openRouterVideo } from '@tanstack/ai-openrouter'

const adapter = openRouterVideo('bytedance/seedance-2.0')

adapter.availableDurations()
// { kind: 'discrete', values: [4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15] }
adapter.snapDuration(7.4) // 7 — closest valid duration

const sliderSeconds = 7 // raw seconds from a UI control
await generateVideo({
  adapter,
  prompt: 'A timelapse of clouds',
  duration: adapter.snapDuration(sliderSeconds), // coerce to a valid duration
})
```
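
For intuition, snapping to the closest supported value could be implemented roughly as follows. `snapToNearest` is a hypothetical stand-in for the adapter's `snapDuration()`, and the tie-breaking rule (prefer the shorter duration) is an assumption rather than documented behavior.

```typescript
// Hypothetical stand-in for `snapDuration()`; tie-breaking toward the shorter
// duration is an assumption.
function snapToNearest(values: readonly number[], seconds: number): number {
  if (values.length === 0) {
    throw new Error('No supported durations to snap to')
  }
  let best = values[0] as number
  for (const candidate of values) {
    const candidateGap = Math.abs(candidate - seconds)
    const bestGap = Math.abs(best - seconds)
    if (candidateGap < bestGap || (candidateGap === bestGap && candidate < best)) {
      best = candidate
    }
  }
  return best
}

const seedanceDurations = [4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15]
snapToNearest(seedanceDurations, 7.4) // 7
snapToNearest([4, 6, 8], 20) // 8 (out-of-range input lands on the longest value)
```

A linear scan is enough here because the supported sets are small, and it needs no assumption about the values being sorted.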

Two OpenRouter-specific behaviors to know about:

- **The completed video arrives as a `data:` URL.** OpenRouter's download
  URLs require your API key in an `Authorization` header, so the adapter
  downloads the content server-side and returns a base64 data URL that can
  be handed straight to a `<video>` tag. Videos over ~10 MiB log a warning;
  prefer re-uploading to your own storage/CDN over passing large data URLs
  around.
- **Cost is reported on completion.** The gateway reports the real billed
  cost for the job; it's surfaced as `usage.cost` on the completed result.
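
To decide when a result is large enough to move to your own storage, the decoded size can be estimated from the data URL itself. A minimal sketch, assuming padded base64 and two hypothetical helpers (`dataUrlByteLength`, `shouldOffload`); the 10 MiB threshold mirrors the approximate figure above:

```typescript
// Hypothetical helpers; assumes standard padded base64 in the data URL.
const WARN_BYTES = 10 * 1024 * 1024

function dataUrlByteLength(dataUrl: string): number {
  const marker = ';base64,'
  const start = dataUrl.indexOf(marker)
  if (!dataUrl.startsWith('data:') || start === -1) {
    throw new Error('Expected a base64 data: URL')
  }
  const base64 = dataUrl.slice(start + marker.length)
  const padding = base64.endsWith('==') ? 2 : base64.endsWith('=') ? 1 : 0
  // Every 4 base64 characters encode 3 bytes, minus any padding.
  return (base64.length / 4) * 3 - padding
}

function shouldOffload(dataUrl: string): boolean {
  return dataUrlByteLength(dataUrl) > WARN_BYTES
}
```

Computing the size from the string length avoids decoding the whole payload just to measure it.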

## Response Types

> **Note:** The interfaces below are the underlying adapter-level types. The `getVideoJobStatus()` helper returns a single merged object, `{ status, progress?, url?, error?, usage? }` — it does not return `jobId` or `expiresAt`.
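
As a rough illustration of consuming the merged object, the sketch below uses the field names from the note above. The field types, the `describeStatus` helper, and the reading of `progress` as a percentage are assumptions.

```typescript
// Field names follow the note above; the field types are assumptions.
interface MergedVideoStatus {
  status: string
  progress?: number
  url?: string
  error?: string
  usage?: { cost?: number }
}

function describeStatus(job: MergedVideoStatus): string {
  if (job.status === 'completed') {
    const cost = job.usage?.cost
    return cost === undefined ? 'completed' : `completed (cost: ${cost})`
  }
  if (job.status === 'failed') {
    return `failed: ${job.error ?? 'unknown error'}`
  }
  // Any other status is treated as still in flight.
  return job.progress === undefined ? job.status : `${job.status} (${job.progress}%)`
}
```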
@@ -675,8 +744,10 @@ Check the [OpenAI documentation](https://platform.openai.com/docs) for current l
Each video adapter reads the same environment variables as its provider's
other adapters:

-- `OPENAI_API_KEY`: Your OpenAI API key (Sora)
-- `GOOGLE_API_KEY` or `GEMINI_API_KEY`: Your Google API key (Veo)
+- `OPENAI_API_KEY`: Your OpenAI API key (`openaiVideo`, Sora)
+- `GOOGLE_API_KEY` or `GEMINI_API_KEY`: Your Google API key (`geminiVideo`, Veo)
+- `OPENROUTER_API_KEY`: Your OpenRouter API key (`openRouterVideo`)
+- `FAL_KEY`: Your fal.ai API key (`falVideo`)
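
A startup check for these variables might look like the sketch below. `resolveApiKey` is a hypothetical helper, and the precedence between `GOOGLE_API_KEY` and `GEMINI_API_KEY` is an assumption, not documented adapter behavior. The environment is passed in as a plain object so the function stays easy to test.

```typescript
// Hypothetical lookup; the GOOGLE_API_KEY-before-GEMINI_API_KEY order is an assumption.
type Provider = 'openai' | 'gemini' | 'openrouter' | 'fal'

const ENV_KEYS: Record<Provider, readonly string[]> = {
  openai: ['OPENAI_API_KEY'],
  gemini: ['GOOGLE_API_KEY', 'GEMINI_API_KEY'],
  openrouter: ['OPENROUTER_API_KEY'],
  fal: ['FAL_KEY'],
}

function resolveApiKey(
  provider: Provider,
  env: Record<string, string | undefined>,
): string {
  for (const name of ENV_KEYS[provider]) {
    const value = env[name]
    if (value) return value
  }
  throw new Error(`Missing ${ENV_KEYS[provider].join(' or ')} for ${provider}`)
}
```

In a Node server this could be called as `resolveApiKey('openrouter', process.env)` before any job is submitted, so a missing key surfaces at startup rather than mid-request.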

## Explicit API Keys

1 change: 1 addition & 0 deletions examples/ts-react-media/package.json
@@ -14,6 +14,7 @@
"@tanstack/ai": "workspace:*",
"@tanstack/ai-fal": "workspace:*",
"@tanstack/ai-gemini": "workspace:*",
"@tanstack/ai-openrouter": "workspace:*",
"@tanstack/react-router": "^1.158.4",
"@tanstack/react-start": "^1.159.0",
"@tanstack/router-plugin": "^1.158.4",
14 changes: 14 additions & 0 deletions examples/ts-react-media/src/lib/models.ts
@@ -122,6 +122,20 @@ export const VIDEO_MODELS = [
description: 'Fast image-to-video animation',
mode: 'image-to-video' as const,
},
  {
    id: 'bytedance/seedance-2.0',
    name: 'Seedance 2.0 (Text-to-Video, OpenRouter)',
    description:
      "OpenRouter's async video API; duration typed 4–15s with snapDuration()",
    mode: 'text-to-video' as const,
  },
  {
    id: 'google/veo-3.1',
    name: 'Veo 3.1 (Image-to-Video, OpenRouter)',
    description:
      'OpenRouter async video; duration snaps to the nearest of 4/6/8s',
    mode: 'image-to-video' as const,
  },
] as const

export type ImageModel = (typeof IMAGE_MODELS)[number]