Skip to content

feat(agent): add max_iterations cap to prevent unbounded tool-call recursion#2166

Open
jpr5 wants to merge 1 commit intostrands-agents:mainfrom
jpr5:fix/event-loop-max-iterations
Open

feat(agent): add max_iterations cap to prevent unbounded tool-call recursion#2166
jpr5 wants to merge 1 commit intostrands-agents:mainfrom
jpr5:fix/event-loop-max-iterations

Conversation

@jpr5
Copy link
Copy Markdown

@jpr5 jpr5 commented Apr 20, 2026

Agent.recurse_event_loop has no built-in cap on tool-call cycles per invocation. With any degenerate generator (fuzzy fixture matcher, sticky system prompt, provider misbehavior), the agent loops unboundedly: each turn emits the same tool call → tool result appended → model re-emits same call → repeat until process killed, socket times out, or rate limit breaks the chain.

Add Agent(max_iterations: int = 25) parameter (default per precedent: LangChain=15, OpenAI Agents SDK=10, strands.multiagent.Swarm=20). Rejects bool explicitly (isinstance(True, int) would silently cap at 1). Counter lives in invocation_state["event_loop_cycle_count"] and is reset at the top of each _run_loop iteration so hook-driven resume legs get a fresh budget, and again in the ContextWindowOverflow recovery path so reducer retries don't consume tool-call budget.

Counter is only incremented when the model is actually invoked; interrupt-resume and tool-use-replay paths (_has_tool_use_in_latest_message) do not consume budget. On cap trip, _handle_tool_execution logs a warning, appends a synthetic assistant message with {"synthetic": True} marker in metadata so downstream analytics can distinguish a halt zero from a real zero-token call, fires MessageAddedEvent but NOT ModelMessageEvent (no model call occurred), and yields EventLoopStopEvent with stop_reason="max_iterations".

Introduces MAX_ITERATIONS_STOP_REASON: Final constant in types/event_loop.py (StopReason Literal must stay in sync with it).

Tests cover termination after exactly N cycles, stop_reason surfacing, synthetic message shape + metadata marker, warning log with exact substring, max_iterations=1 boundary, ValueError coverage for 0 / negative / "5" / 2.5 / True / False, counter reset between invocations with a shared invocation_state dict, counter reset on ContextWindowOverflow retry, counter reset per hook-resume leg, and no ModelMessageEvent emission on halt.

Downstream workaround in CopilotKit PR #4083 (commit 9227bc27d, showcase/packages/strands/src/agents/agent.py) used a HookProvider that cancelled tool calls and set stop_event_loop after 8 calls; this removes the need for that workaround.

…cursion

Agent.recurse_event_loop has no built-in cap on tool-call cycles per
invocation. With any degenerate generator (fuzzy fixture matcher,
sticky system prompt, provider misbehavior), the agent loops
unboundedly: each turn emits the same tool call → tool result
appended → model re-emits same call → repeat until process killed,
socket times out, or rate limit breaks the chain.

Add Agent(max_iterations: int = 25) parameter (default per precedent:
LangChain=15, OpenAI Agents SDK=10, strands.multiagent.Swarm=20).
Rejects bool explicitly (isinstance(True, int) would silently cap at
1). Counter lives in invocation_state["event_loop_cycle_count"] and
is reset at the top of each _run_loop iteration so hook-driven resume
legs get a fresh budget, and again in the ContextWindowOverflow
recovery path so reducer retries don't consume tool-call budget.

Counter is only incremented when the model is actually invoked;
interrupt-resume and tool-use-replay paths (_has_tool_use_in_latest_
message) do not consume budget. On cap trip, _handle_tool_execution
logs a warning, appends a synthetic assistant message with
{"synthetic": True} marker in metadata so downstream analytics can
distinguish a halt zero from a real zero-token call, fires
MessageAddedEvent but NOT ModelMessageEvent (no model call occurred),
and yields EventLoopStopEvent with stop_reason="max_iterations".

Introduces MAX_ITERATIONS_STOP_REASON: Final constant in
types/event_loop.py (StopReason Literal must stay in sync with it).

Tests cover termination after exactly N cycles, stop_reason surfacing,
synthetic message shape + metadata marker, warning log with exact
substring, max_iterations=1 boundary, ValueError coverage for 0 /
negative / "5" / 2.5 / True / False, counter reset between invocations
with a shared invocation_state dict, counter reset on ContextWindow
Overflow retry, counter reset per hook-resume leg, and no
ModelMessageEvent emission on halt.

Downstream workaround in CopilotKit PR #4083 (commit 9227bc27d,
showcase/packages/strands/src/agents/agent.py) used a HookProvider
that cancelled tool calls and set stop_event_loop after 8 calls; this
removes the need for that workaround.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant