feat(agent): add max_iterations cap to prevent unbounded tool-call recursion#2166
Open
jpr5 wants to merge 1 commit intostrands-agents:mainfrom
Open
feat(agent): add max_iterations cap to prevent unbounded tool-call recursion#2166jpr5 wants to merge 1 commit intostrands-agents:mainfrom
jpr5 wants to merge 1 commit intostrands-agents:mainfrom
Conversation
…cursion
Agent.recurse_event_loop has no built-in cap on tool-call cycles per
invocation. With any degenerate generator (fuzzy fixture matcher,
sticky system prompt, provider misbehavior), the agent loops
unboundedly: each turn emits the same tool call → tool result
appended → model re-emits same call → repeat until process killed,
socket times out, or rate limit breaks the chain.
Add Agent(max_iterations: int = 25) parameter (default per precedent:
LangChain=15, OpenAI Agents SDK=10, strands.multiagent.Swarm=20).
Rejects bool explicitly (isinstance(True, int) would silently cap at
1). Counter lives in invocation_state["event_loop_cycle_count"] and
is reset at the top of each _run_loop iteration so hook-driven resume
legs get a fresh budget, and again in the ContextWindowOverflow
recovery path so reducer retries don't consume tool-call budget.
Counter is only incremented when the model is actually invoked;
interrupt-resume and tool-use-replay paths (_has_tool_use_in_latest_
message) do not consume budget. On cap trip, _handle_tool_execution
logs a warning, appends a synthetic assistant message with
{"synthetic": True} marker in metadata so downstream analytics can
distinguish a halt zero from a real zero-token call, fires
MessageAddedEvent but NOT ModelMessageEvent (no model call occurred),
and yields EventLoopStopEvent with stop_reason="max_iterations".
Introduces MAX_ITERATIONS_STOP_REASON: Final constant in
types/event_loop.py (StopReason Literal must stay in sync with it).
Tests cover termination after exactly N cycles, stop_reason surfacing,
synthetic message shape + metadata marker, warning log with exact
substring, max_iterations=1 boundary, ValueError coverage for 0 /
negative / "5" / 2.5 / True / False, counter reset between invocations
with a shared invocation_state dict, counter reset on ContextWindow
Overflow retry, counter reset per hook-resume leg, and no
ModelMessageEvent emission on halt.
Downstream workaround in CopilotKit PR #4083 (commit 9227bc27d,
showcase/packages/strands/src/agents/agent.py) used a HookProvider
that cancelled tool calls and set stop_event_loop after 8 calls; this
removes the need for that workaround.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Agent.recurse_event_loop has no built-in cap on tool-call cycles per invocation. With any degenerate generator (fuzzy fixture matcher, sticky system prompt, provider misbehavior), the agent loops unboundedly: each turn emits the same tool call → tool result appended → model re-emits same call → repeat until process killed, socket times out, or rate limit breaks the chain.
Add Agent(max_iterations: int = 25) parameter (default per precedent: LangChain=15, OpenAI Agents SDK=10, strands.multiagent.Swarm=20). Rejects bool explicitly (isinstance(True, int) would silently cap at 1). Counter lives in invocation_state["event_loop_cycle_count"] and is reset at the top of each _run_loop iteration so hook-driven resume legs get a fresh budget, and again in the ContextWindowOverflow recovery path so reducer retries don't consume tool-call budget.
Counter is only incremented when the model is actually invoked; interrupt-resume and tool-use-replay paths (_has_tool_use_in_latest_message) do not consume budget. On cap trip, _handle_tool_execution logs a warning, appends a synthetic assistant message with {"synthetic": True} marker in metadata so downstream analytics can distinguish a halt zero from a real zero-token call, fires MessageAddedEvent but NOT ModelMessageEvent (no model call occurred), and yields EventLoopStopEvent with stop_reason="max_iterations".
Introduces MAX_ITERATIONS_STOP_REASON: Final constant in types/event_loop.py (StopReason Literal must stay in sync with it).
Tests cover termination after exactly N cycles, stop_reason surfacing, synthetic message shape + metadata marker, warning log with exact substring, max_iterations=1 boundary, ValueError coverage for 0 / negative / "5" / 2.5 / True / False, counter reset between invocations with a shared invocation_state dict, counter reset on ContextWindowOverflow retry, counter reset per hook-resume leg, and no ModelMessageEvent emission on halt.
Downstream workaround in CopilotKit PR #4083 (commit 9227bc27d, showcase/packages/strands/src/agents/agent.py) used a HookProvider that cancelled tool calls and set stop_event_loop after 8 calls; this removes the need for that workaround.