Skip to content
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
5 changes: 5 additions & 0 deletions .changeset/approval-gate-final-response.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,5 @@
---
"@openrouter/agent": patch
---

Run the approval check before executing pending tool calls on the `allowFinalResponse` path, and evaluate function-form `requireApproval` predicates against schema-normalized arguments so the approval decision matches the executed input.
32 changes: 26 additions & 6 deletions packages/agent/src/lib/conversation-state.ts
Original file line number Diff line number Diff line change
@@ -1,4 +1,5 @@
import type * as models from '@openrouter/sdk/models';
import * as z4 from 'zod/v4';
import type {
ConversationState,
ParsedToolCall,
Expand Down Expand Up @@ -104,6 +105,21 @@ export function appendToMessages(
];
}

function normalizeToolCallArguments<
TTool extends Exclude<
Tool,
{
_brand: 'server-tool';
}
>,
>(toolCall: ParsedToolCall<TTool>, tool: TTool): ParsedToolCall<TTool> {
const parsed = z4.safeParse(tool.function.inputSchema, toolCall.arguments);
if (parsed.success) {
toolCall.arguments = parsed.data as ParsedToolCall<TTool>['arguments'];
}
return toolCall;
}

/**
* Check if a tool call requires approval
* @param toolCall - The tool call to check
Expand All @@ -120,12 +136,6 @@ export async function toolRequiresApproval<TTools extends readonly Tool[]>(
context: TurnContext,
) => boolean | Promise<boolean>,
): Promise<boolean> {
// Call-level check takes precedence
if (callLevelCheck) {
return callLevelCheck(toolCall, context);
}

// Fall back to tool-level setting (server tools never require approval)
const tool = tools.find(
(
t,
Expand All @@ -136,6 +146,16 @@ export async function toolRequiresApproval<TTools extends readonly Tool[]>(
}
> => isClientTool(t) && t.function.name === toolCall.name,
);
if (tool) {
normalizeToolCallArguments(toolCall as ParsedToolCall<typeof tool>, tool);
}

// Call-level check takes precedence
if (callLevelCheck) {
return callLevelCheck(toolCall, context);
}

// Fall back to tool-level setting (server tools never require approval)
if (!tool) {
return false;
}

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[suggestion] conversation-state.ts:165 — comment overstates the always-a-record guarantee after normalizeToolCallArguments

Details

Why: The comment says "Arguments have already been parsed and validated against the tool's Zod inputSchema (a ZodObject), so the runtime shape is always a record here." Before this PR that was true because execute() ran z4.parse (which throws on failure) before toolRequiresApproval was ever called. Now normalizeToolCallArguments uses z4.safeParse, which silently preserves the original arguments on failure. If the model emits arguments that don't satisfy the schema, toolCall.arguments at this point could still be the raw model output — not necessarily an object. The isRecord guard immediately below catches this, so correctness is unaffected, but the comment's "always" claim is now misleading to future readers.

Fix: Soften the comment to reflect the new reality, e.g.:

// Arguments are normalized via normalizeToolCallArguments above; if
// normalization succeeded the value is schema-parsed. The isRecord guard
// below catches any remaining non-object shapes (e.g. schema mismatch).
Prompt for agents
In `packages/agent/src/lib/conversation-state.ts`, update the comment block at lines 165–168 (the one reading "Arguments have already been parsed and validated against the tool's Zod inputSchema..."). This PR introduced `normalizeToolCallArguments`, which uses `z4.safeParse` (non-throwing). If safeParse fails the original arguments are preserved unchanged, so the "always a record" claim is no longer guaranteed. Replace the comment with something like: "Arguments are normalized via normalizeToolCallArguments above; if normalization succeeded the value is schema-parsed. The isRecord guard below catches any remaining non-object shapes."

Reviewed at d3569e7

Expand Down
4 changes: 4 additions & 0 deletions packages/agent/src/lib/model-result.ts
Original file line number Diff line number Diff line change
Expand Up @@ -2010,6 +2010,10 @@ export class ModelResult<
numberOfTurns: turnNumber,
};

if (await this.handleApprovalCheck(pendingToolCalls, turnNumber, currentResponse)) {
return;
}

await this.options.onTurnStart?.(turnContext);
await this.resolveAsyncFunctionsForTurn(turnContext);

Expand Down
55 changes: 55 additions & 0 deletions packages/agent/tests/unit/allow-final-response.test.ts
Original file line number Diff line number Diff line change
Expand Up @@ -323,6 +323,61 @@ describe('allowFinalResponse', () => {
expect(fnCallOutput?.output).toContain('"temperature":99');
});

it('pauses for approval before executing final-response tool calls', async () => {
const executeSpy = vi.fn(async (_p: { location: string }) => ({
temperature: 99,
}));
const approvalTool = {
type: ToolType.Function,
function: {
name: 'get_weather',
description: 'Get the weather for a location.',
inputSchema: z.object({
location: z.string(),
}),
outputSchema: z.object({
temperature: z.number(),
}),
requireApproval: true,
execute: executeSpy,
},
} as const;
const saved: Array<{
status?: string;
pendingToolCalls?: unknown[];
}> = [];
const stateAccessor = {
load: async () => null,
save: async (state: { status?: string; pendingToolCalls?: unknown[] }) => {
saved.push(state);
},
};

mockBetaResponsesSend.mockResolvedValueOnce({
ok: true,
value: toolCallResponse(),
});

const result = callModel(client, {
model: 'test-model',
input: 'What is the weather?',
tools: [
approvalTool,
] as const,
stopWhen: stepCountIs(0),
allowFinalResponse: true,
state: stateAccessor as unknown as Parameters<typeof callModel>[1]['state'],
});

const pending = await result.getPendingToolCalls();

expect(pending).toHaveLength(1);
expect(pending[0]?.name).toBe('get_weather');
expect(executeSpy).not.toHaveBeenCalled();
expect(mockBetaResponsesSend).toHaveBeenCalledTimes(1);
expect(saved.some((state) => state.status === 'awaiting_approval')).toBe(true);
});

it('does not trigger when allowFinalResponse is false', async () => {
mockBetaResponsesSend.mockResolvedValueOnce({
ok: true,
Expand Down
30 changes: 30 additions & 0 deletions packages/agent/tests/unit/conversation-state.test.ts
Original file line number Diff line number Diff line change
Expand Up @@ -308,6 +308,36 @@ describe('Conversation State Utilities', () => {
).toBe(true);
});

it('should apply schema defaults before function-based approval checks', async () => {
const toolWithDefaultedApproval = tool({
name: 'defaulted_action',
inputSchema: z.object({
destructive: z.boolean().default(true),
}),
requireApproval: (params) => params.destructive === true,
execute: async () => ({}),
});

const toolCall = {
id: '1',
name: 'defaulted_action',
arguments: {},
};

expect(
await toolRequiresApproval(
toolCall,
[
toolWithDefaultedApproval,
],
context,
),
).toBe(true);
expect(toolCall.arguments).toEqual({
destructive: true,
});
});

it('should support async function-based tool-level requireApproval', async () => {
// Tool with async function-based approval
const toolWithAsyncApproval = tool({

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[suggestion] conversation-state.test.ts:342 — no test for callLevelCheck (call-level requireApproval) receiving schema-normalized arguments

Details

Why: This PR moves normalizeToolCallArguments to run before both the callLevelCheck early-return and the tool-level requireApproval function. The new test at line 308 validates the tool-level path. The callLevelCheck path (the fourth callLevelCheck? parameter, which maps to the user-supplied requireApproval option on callModel) also now receives schema-normalized args when a matching tool is found — but there is no test for this behavior. A caller who passes a requireApproval function to callModel expecting to see raw model arguments would silently receive normalized ones instead.

Fix: Add a test case in the callLevelCheck section near it('should support tool-level requireApproval', ...) that verifies the call-level predicate receives schema defaults:

it('should apply schema defaults before call-level requireApproval', async () => {
  const tool = ...({ inputSchema: z.object({ flag: z.boolean().default(true) }) });
  const seenArgs: unknown[] = [];
  const toolCall = { id: '1', name: 'x', arguments: {} };
  await toolRequiresApproval(toolCall, [tool], context, (tc) => {
    seenArgs.push(tc.arguments);
    return false;
  });
  expect(seenArgs[0]).toEqual({ flag: true }); // schema default applied
});
Prompt for agents
In `packages/agent/tests/unit/conversation-state.test.ts`, add a test after the new `should apply schema defaults before function-based approval checks` test (around line 342) that verifies the `callLevelCheck` path also receives schema-normalized arguments. Use a tool with `inputSchema: z.object({ flag: z.boolean().default(true) })`, pass a `callLevelCheck` function that records its received `toolCall.arguments`, call `toolRequiresApproval(toolCall, [tool], context, callLevelCheck)` with an empty-object toolCall, and assert the recorded arguments equal `{ flag: true }`. This documents that call-level predicates receive schema-normalized args (not raw model output), which is a new behavioral guarantee introduced by this PR.

Reviewed at d3569e7

Expand Down