
feat: large tool result offload#2162

Merged
lizradway merged 22 commits into strands-agents:main from lizradway:externalization
Apr 24, 2026


@lizradway
Member

@lizradway lizradway commented Apr 20, 2026

Description

Adds a ContextOffloader plugin that proactively intercepts oversized tool results via AfterToolCallEvent, persists each content block individually to a pluggable storage backend, and replaces the in-context result with a truncated preview plus per-block references. Includes an optional built-in retrieval tool (disabled by default) so the agent can fetch offloaded content on demand, returning each content type in its native format.

Problem: When tool outputs exceed the model's context capacity, SlidingWindowConversationManager reactively truncates them to the first and last 200 characters, which loses data permanently and wastes a failed API call.

Solution: A Plugin (following the AgentSkills pattern) that operates at tool execution time, before the result enters the conversation. Each content block is stored individually with its content type preserved, enabling type-aware retrieval. Inline guidance in each offloaded result tells the agent to use the preview when possible and to use its available tools to selectively access the data it needs.

Token-based thresholds: Uses the agent's model.count_tokens() for accurate token estimation (tiktoken when available, with a chars/4 heuristic as the fallback). The async hook wraps the tool result as a message for counting. Preview slicing uses tiktoken for exact token-level cuts when available, and otherwise falls back to slicing at tokens * 4 characters.
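The fallback arithmetic described above can be sketched as follows. This is a minimal illustration, not the plugin's code: `estimate_tokens`, `slice_preview`, and the `encode`/`decode` callables are hypothetical names, and rounding the chars/4 estimate up is an assumption.

```python
from typing import Callable, Optional, Sequence

CHARS_PER_TOKEN = 4  # the chars/4 heuristic named in the description


def estimate_tokens(
    text: str,
    encode: Optional[Callable[[str], Sequence[int]]] = None,
) -> int:
    """Count tokens with an encoder if one is supplied, else use chars/4."""
    if encode is not None:
        return len(encode(text))
    # Ceiling division (an assumption): a short non-empty string counts as one token.
    return -(-len(text) // CHARS_PER_TOKEN)


def slice_preview(
    text: str,
    preview_tokens: int,
    encode: Optional[Callable[[str], Sequence]] = None,
    decode: Optional[Callable[[Sequence], str]] = None,
) -> str:
    """Cut a preview at a token boundary when possible, else at tokens * 4 chars."""
    if encode is not None and decode is not None:
        return decode(encode(text)[:preview_tokens])
    return text[: preview_tokens * CHARS_PER_TOKEN]
```

With a real tokenizer, `encode` and `decode` would be the encoder's own methods; without one, both helpers degrade to plain character arithmetic.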

Content type handling:

| Type | Behavior |
| --- | --- |
| Text | Stored as text/plain, replaced with a preview |
| JSON | Stored as application/json (serialized via json.dumps), replaced with a preview |
| Image | Stored in native format (e.g., image/png), replaced with placeholder + reference |
| Document | Stored in native format (e.g., application/pdf), replaced with placeholder + reference |
| Unknown | Passed through unchanged |
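The per-type behavior above could be dispatched roughly like this. It is a sketch under assumptions: `offload_block` and the `store` callback are hypothetical, the inner block shapes (`format`, `source.bytes`) are assumed rather than taken from this PR, and mapping a document format to `application/<format>` is a simplification.

```python
import json
from typing import Any, Callable


def offload_block(
    block: dict[str, Any],
    store: Callable[[bytes, str], str],  # hypothetical: returns a reference string
    preview_chars: int = 40,
) -> dict[str, Any]:
    """Return the in-context replacement for one content block."""
    if "text" in block:
        text = block["text"]
        ref = store(text.encode("utf-8"), "text/plain")
        return {"text": f"{text[:preview_chars]}... [offloaded: {ref}]"}
    if "json" in block:
        serialized = json.dumps(block["json"])
        ref = store(serialized.encode("utf-8"), "application/json")
        return {"text": f"{serialized[:preview_chars]}... [offloaded: {ref}]"}
    if "image" in block:
        image = block["image"]  # assumed shape: {"format": ..., "source": {"bytes": ...}}
        ref = store(image["source"]["bytes"], f"image/{image['format']}")
        return {"text": f"[image offloaded: {ref}]"}
    if "document" in block:
        doc = block["document"]  # assumed shape, same as image
        ref = store(doc["source"]["bytes"], f"application/{doc['format']}")
        return {"text": f"[document offloaded: {ref}]"}
    # Unknown block types pass through unchanged.
    return block
```

Keeping the content type alongside the stored bytes is what makes type-aware retrieval possible later.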

Storage backends (required — user must choose one):

  • InMemoryStorage — no filesystem side effects, content cleared on process exit. clear() method for manual cleanup.
  • FileStorage — persists to disk with .metadata.json sidecar for content type tracking across process restarts
  • S3Storage — persists to Amazon S3 (follows S3SessionManager patterns), content type preserved via S3 object metadata
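To illustrate the pluggable-backend idea, here is a minimal sketch of a storage protocol with an in-memory backend and a file backend that tracks content types in a `.metadata.json` sidecar. The class and method names (`OffloadStorage`, `InMemorySketch`, `FileSketch`, `store`, `retrieve`) are hypothetical and do not reflect the plugin's actual interface.

```python
import json
import uuid
from pathlib import Path
from typing import Protocol


class OffloadStorage(Protocol):  # hypothetical protocol name
    def store(self, content: bytes, content_type: str) -> str: ...
    def retrieve(self, reference: str) -> tuple[bytes, str]: ...


class InMemorySketch:
    """Keeps content in a dict; everything is lost when the process exits."""

    def __init__(self) -> None:
        self._items: dict[str, tuple[bytes, str]] = {}

    def store(self, content: bytes, content_type: str) -> str:
        ref = uuid.uuid4().hex
        self._items[ref] = (content, content_type)
        return ref

    def retrieve(self, reference: str) -> tuple[bytes, str]:
        return self._items[reference]

    def clear(self) -> None:
        self._items.clear()


class FileSketch:
    """Writes each item to disk and records its content type in a sidecar."""

    def __init__(self, directory: str) -> None:
        self._dir = Path(directory)
        self._dir.mkdir(parents=True, exist_ok=True)
        self._sidecar = self._dir / ".metadata.json"

    def _load_metadata(self) -> dict[str, str]:
        if not self._sidecar.exists():
            return {}
        try:
            return json.loads(self._sidecar.read_text(encoding="utf-8"))
        except json.JSONDecodeError:
            return {}  # assumption: a corrupt sidecar is treated as empty

    def store(self, content: bytes, content_type: str) -> str:
        ref = uuid.uuid4().hex
        (self._dir / ref).write_bytes(content)
        metadata = self._load_metadata()
        metadata[ref] = content_type
        self._sidecar.write_text(json.dumps(metadata), encoding="utf-8")
        return ref

    def retrieve(self, reference: str) -> tuple[bytes, str]:
        metadata = self._load_metadata()
        # Only references recorded in the sidecar resolve, which also
        # rejects path-traversal strings such as "../secret".
        if reference not in metadata:
            raise KeyError(reference)
        return (self._dir / reference).read_bytes(), metadata[reference]
```

Because the sidecar lives next to the data, a new `FileSketch` pointed at the same directory can resolve references written by an earlier process.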

Built-in retrieval tool (opt-in, disabled by default):

  • retrieve_offloaded_content — agent can fetch offloaded content by reference
  • Enabled via include_retrieval_tool=True
  • Returns content in its native type: text as string, JSON as {"json": ...} block, images as {"image": ...} block, documents as {"document": ...} block
  • Retrieval results are excluded from re-offloading (prevents circular offload loops)
  • Disabled by default because once storage defaults to VFS/Sandbox, agents will use shell/grep/SQL tools to navigate offloaded results directly
  • Inline guidance in each offloaded result adapts: when tool is enabled, mentions retrieve_offloaded_content; always tells the agent to use available tools
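The native-type return and the adaptive guidance could look something like the sketch below. `to_native` and `guidance` are hypothetical helpers, the guidance wording is invented for illustration, and the inner image/document block shapes are assumptions.

```python
import json
from typing import Any, Union


def to_native(content: bytes, content_type: str) -> Union[str, dict[str, Any]]:
    """Rebuild stored bytes into the content type they were offloaded from."""
    if content_type == "application/json":
        return {"json": json.loads(content.decode("utf-8"))}
    if content_type.startswith("image/"):
        fmt = content_type.split("/", 1)[1]
        return {"image": {"format": fmt, "source": {"bytes": content}}}
    if content_type == "application/pdf":
        return {"document": {"format": "pdf", "source": {"bytes": content}}}
    # Text (and anything unrecognized, in this sketch) comes back as a string.
    return content.decode("utf-8", errors="replace")


def guidance(retrieval_tool_enabled: bool) -> str:
    """Build the inline hint attached to an offloaded result (wording is illustrative)."""
    hint = (
        "Use the preview when it is sufficient; otherwise use your available "
        "tools to access only the data you need."
    )
    if retrieval_tool_enabled:
        hint += " Call retrieve_offloaded_content with a reference to fetch the full content."
    return hint
```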

Usage:

from strands import Agent
from strands.vended_plugins.context_offloader import (
    ContextOffloader,
    InMemoryStorage,
)

# In-memory storage — context reduction only, no persistence
agent = Agent(plugins=[
    ContextOffloader(storage=InMemoryStorage())
])

from strands.vended_plugins.context_offloader import (
    ContextOffloader,
    FileStorage,
)

# File storage — persists artifacts to disk, custom thresholds
agent = Agent(plugins=[
    ContextOffloader(
        storage=FileStorage("./artifacts"),
        max_result_tokens=5_000,
        preview_tokens=2_000,
    )
])

from strands.vended_plugins.context_offloader import (
    ContextOffloader,
    S3Storage,
)

# S3 storage with retrieval tool enabled
agent = Agent(plugins=[
    ContextOffloader(
        storage=S3Storage(
            bucket="my-agent-artifacts",
            prefix="tool-results/",
        ),
        include_retrieval_tool=True,
    )
])

Related Issues

Closes #1296

Documentation PR

strands-agents/docs#772

Type of Change

New feature

Testing

How have you tested the change? Verify that the changes do not break functionality or introduce warnings in consuming repositories: agents-docs, agents-tools, agents-cli

  • I ran hatch run prepare
  • 66 new unit tests covering all storage backends (including metadata sidecar persistence and corruption), all content types, retrieval tool opt-in/opt-out with native type returns, token-based thresholds with mocked count_tokens, inline guidance adaptation, edge cases (threshold boundaries, cancelled tools, storage failures, partial failures, path traversal, thread safety, input validation, circular offload prevention)
  • Manually tested via demo script confirming: token-based offloading triggers correctly, agent uses preview for summaries, retrieves only when specific data is needed, retrieval returns native content types
  • All existing tests pass (1710 passed, 10 pre-existing telemetry failures unrelated to this change)

Checklist

  • I have read the CONTRIBUTING document
  • I have added any necessary tests that prove my fix is effective or my feature works
  • I have updated the documentation accordingly
  • I have added an appropriate example to the documentation to outline the feature, or no new docs are needed
  • My changes generate no new warnings
  • Any dependent changes have been merged and published

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

@lizradway lizradway temporarily deployed to manual-approval April 20, 2026 15:48 — with GitHub Actions Inactive
@lizradway lizradway added the needs-api-review Makes changes to the public API surface label Apr 20, 2026
@codecov

codecov Bot commented Apr 20, 2026

Codecov Report

❌ Patch coverage is 95.43726% with 12 lines in your changes missing coverage. Please review.

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| ...strands/vended_plugins/context_offloader/plugin.py | 94.44% | 3 Missing and 5 partials ⚠️ |
| ...trands/vended_plugins/context_offloader/storage.py | 96.55% | 3 Missing and 1 partial ⚠️ |


Comment thread src/strands/vended_plugins/result_externalizer/plugin.py Outdated
Comment thread src/strands/vended_plugins/result_externalizer/storage.py Outdated
Comment thread src/strands/vended_plugins/result_externalizer/plugin.py Outdated
@github-actions

This comment was marked as outdated.

Comment thread src/strands/vended_plugins/result_externalizer/plugin.py Outdated
@github-actions

This comment was marked as outdated.

Comment thread src/strands/vended_plugins/context_offloader/storage.py
Comment thread src/strands/vended_plugins/result_externalizer/storage.py Outdated
Comment thread src/strands/vended_plugins/result_externalizer/storage.py Outdated
@github-actions

Assessment: Comment

Well-designed plugin that solves a real problem — proactive tool result externalization is a clear improvement over reactive truncation. The Plugin/Protocol/storage architecture is clean and follows existing SDK patterns (AgentSkills for plugin design, S3SessionManager for S3 client setup). Test coverage is thorough at 47 tests across all backends and content types.

Review Categories
  • API Review: This PR introduces 5 new public types (Plugin + Protocol + 3 storage backends). Per the API Bar Raising process, it should carry the needs-api-review label before merge. The API surface is well-documented in the PR description with use cases and code examples.
  • Input Validation: Constructor parameters max_result_chars and preview_chars lack validation — negative values or preview_chars >= max_result_chars would cause silent misbehavior.
  • Code Duplication: _image_placeholder is duplicated from SlidingWindowConversationManager — extracting to a shared utility would align with the composability tenet.
  • Storage Lifecycle: The ExternalizationStorage protocol has no cleanup/deletion mechanism, which could lead to unbounded growth in the in-memory backend for long-running agents.
  • Documentation Gap: AGENTS.md directory structure needs updating per repository guidelines.
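The input-validation item could be addressed with a check along these lines. This is only a sketch: `validate_thresholds` is a hypothetical helper, and allowing a preview size of zero is an assumption.

```python
def validate_thresholds(max_result_chars: int, preview_chars: int) -> None:
    """Reject threshold combinations that would cause silent misbehavior."""
    if max_result_chars < 0:
        raise ValueError(f"max_result_chars must be non-negative, got {max_result_chars}")
    if preview_chars < 0:
        raise ValueError(f"preview_chars must be non-negative, got {preview_chars}")
    if preview_chars >= max_result_chars:
        # A preview as large as the threshold would defeat the offload.
        raise ValueError(
            f"preview_chars ({preview_chars}) must be smaller than "
            f"max_result_chars ({max_result_chars})"
        )
```

Calling this from the constructor would turn a misconfiguration into an immediate error instead of a confusing result at tool-call time.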

Solid implementation overall — the architecture is well thought out and the test suite is comprehensive. 🙏

Comment thread AGENTS.md Outdated
Comment thread src/strands/vended_plugins/result_externalizer/plugin.py Outdated
Comment thread src/strands/vended_plugins/result_externalizer/storage.py Outdated
Comment thread src/strands/vended_plugins/result_externalizer/plugin.py Outdated
Comment thread src/strands/vended_plugins/result_externalizer/__init__.py Outdated
opieter-aws previously approved these changes Apr 23, 2026
Comment thread src/strands/vended_plugins/context_offloader/plugin.py Outdated
Comment thread src/strands/vended_plugins/context_offloader/plugin.py
Comment thread src/strands/vended_plugins/context_offloader/plugin.py Outdated
Comment thread src/strands/vended_plugins/context_offloader/plugin.py Outdated
Comment thread src/strands/vended_plugins/context_offloader/storage.py Outdated
Comment thread src/strands/vended_plugins/context_offloader/plugin.py
@github-actions

Assessment: Comment

The move to token-based thresholds is excellent — it directly addresses mkmeral's earlier feedback and is a significant improvement over character heuristics. The integration with model.count_tokens() is clean, and the tiktoken-based preview slicing is a nice touch. The async hook transition is handled correctly with proper test mocking.

Review Items
  • Private API coupling: _get_encoding is imported from strands.models.model — a private function. Consider inlining the tiktoken logic (4 lines) to avoid fragile cross-module coupling to an internal API.
  • Unrelated test changes: test_model.py adds tests for _ModelPlugin, context_window_limit, and stateful that appear orthogonal to context offloading. These should be documented or split out.
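Inlining the tiktoken lookup, as the first item suggests, might look like the sketch below. `load_encoding` is a hypothetical helper and the default encoding name `cl100k_base` is an assumption; the source does not say which encoding the SDK uses.

```python
from typing import Any, Optional


def load_encoding(name: str = "cl100k_base") -> Optional[Any]:
    """Return a tiktoken encoding, or None when tiktoken is unavailable."""
    try:
        import tiktoken  # optional dependency, imported lazily

        return tiktoken.get_encoding(name)
    except Exception:
        # Covers a missing package, an unknown encoding name, and load failures.
        return None
```

A caller would check for `None` and fall back to the character heuristic, which keeps the plugin free of any import from a private module.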

The token-based approach is the right design. The private API import is the main item to resolve before merge.

@github-actions

Assessment: Approve

No new changes since the previous review. The one open thread (_get_encoding private API import at line 41) still applies — resolving that by inlining the tiktoken logic would remove the only remaining fragile coupling. Everything else is merge-ready.

opieter-aws previously approved these changes Apr 23, 2026
@opieter-aws
Contributor

nit: Should the PR title reference offloading for documentation purposes?

Contributor

@mkmeral mkmeral left a comment


get_encoding comment is the main one. otherwise it looks good to me (if review agent also agrees :p )

Comment thread src/strands/vended_plugins/context_offloader/plugin.py
Comment thread src/strands/vended_plugins/context_offloader/plugin.py
Comment thread src/strands/models/model.py Outdated
Comment thread src/strands/vended_plugins/context_offloader/plugin.py
@github-actions

Assessment: Approve

The opt-in retrieval tool is a good design evolution — making it disabled by default with include_retrieval_tool=True is forward-compatible with VFS/Sandbox while keeping the escape hatch available. The init_agent() approach to strip the auto-discovered tool is clean. Inline guidance adapting based on tool availability is a nice UX touch. System prompt injection removal simplifies the plugin's lifecycle footprint.

No new issues found. This PR is ready to merge.

Comment thread src/strands/vended_plugins/context_offloader/storage.py
Comment thread src/strands/vended_plugins/context_offloader/plugin.py
Comment thread src/strands/vended_plugins/context_offloader/plugin.py Outdated
@github-actions

Assessment: Approve

The self-skip guard refinement is a nice improvement — gating on self._include_retrieval_tool avoids unnecessary string comparison when the tool is disabled, and using self.retrieve_offloaded_content.tool_name instead of a hardcoded "retrieve_offloaded_content" string eliminates a magic constant. Both the enabled-skip and disabled-no-skip test paths are covered.
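The guard described here can be sketched as follows. `ToolStub`, `OffloaderSketch`, and `should_skip` are hypothetical stand-ins; only the `_include_retrieval_tool` flag and the `retrieve_offloaded_content.tool_name` attribute come from the comment above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToolStub:
    """Stand-in for a tool object exposing a tool_name attribute."""

    tool_name: str


class OffloaderSketch:
    def __init__(self, include_retrieval_tool: bool = False) -> None:
        self._include_retrieval_tool = include_retrieval_tool
        self.retrieve_offloaded_content = ToolStub("retrieve_offloaded_content")

    def should_skip(self, tool_name: str) -> bool:
        """Skip offloading only for results of the plugin's own retrieval tool."""
        # Checking the flag first short-circuits the name comparison when
        # the retrieval tool is disabled.
        return (
            self._include_retrieval_tool
            and tool_name == self.retrieve_offloaded_content.tool_name
        )
```

Reading the name from the tool object means a rename of the tool cannot silently break the guard.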

No new issues. This PR is ready to merge.


Labels

needs-api-review Makes changes to the public API surface size/xl

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[FEATURE] Large Tool Result Externalization via AfterToolCallEvent Hook

4 participants