Skip to content

feat(bedrock-model): Support content and text DocumentSource#2135

Open
jonathankeys wants to merge 1 commit intostrands-agents:mainfrom
jonathankeys:bedrock/message-content
Open

feat(bedrock-model): Support content and text DocumentSource#2135
jonathankeys wants to merge 1 commit intostrands-agents:mainfrom
jonathankeys:bedrock/message-content

Conversation

@jonathankeys
Copy link
Copy Markdown

@jonathankeys jonathankeys commented Apr 16, 2026

Description

Bedrock ConverseStream requests allow message Content to include the DocumentBlock which has the fields text and content in DocumentSource which is missing in the BedrockModel message formatting. This change enables the passing of these fields so features (e.g. citations) can utilize these document types.

note: mypy was raising linter errors due to source variable in _format_request_message_content(...) being used for different object types, so those variable names were updated to satisfy mypy and add clarity for the type of source in the block.

changes:

  • Add handling for the content and text DocumentSource fields
  • Add media object for DocumentBlockContent

Other Notes

Testing done using model Claude Sonnet 4.6

When content or text sources are passed in, citations must also be enabled in the document field. This is not documented anywhere from what I have seen and unsure if this is coming from client side botocore validation or from AWS itself.

e.g.

{
    "document": {
        "format": "txt",
        "name": "document",
        "source": {
            "content": [{ "text": "Paris is the capital of France." }, { "text": "The capital of Germany is Berlin." }]
        },
        "citations": { "enabled": True}
    },
}

when citations is disabled or not passed in at all, botocore raises a Validation exception

botocore.errorfactory.ValidationException: An error occurred (ValidationException) when calling the ConverseStream operation: DocumentSource object at messages.0.content.1.document.source must set one of the following keys: bytes, s3Location.

However, when citations is enabled, it works correctly.

messages: list[Message] = [
    {
        "role": "user",
        "content": [
            {
                "text": "What is the capital of France?"
            },
            {
                "document": {
                    "format": "txt",
                    "name": "document",
                    "source": {
                        "content": [{ "text": "Paris is the capital of France." }, { "text": "The capital of Germany is Berlin." }]
                    },
                    "citations": { "enabled": True}
                },
            }
        ]
    }
]
{
  "citation": {
    "title": "document",
    "sourceContent": [
      {
        "text": "Paris is the capital of France."
      }
    ],
    "location": {
      "documentChunk": {
        "documentIndex": 0,
        "start": 0,
        "end": 1
      }
    }
  },
  "delta": {
    "citation": {
      "title": "document",
      "sourceContent": [
        {
          "text": "Paris is the capital of France."
        }
      ],
      "location": {
        "documentChunk": {
          "documentIndex": 0,
          "start": 0,
          "end": 1
        }
      }
    }
  },
  "agent": "<strands.agent.agent.Agent object>",
  "event_loop_cycle_id": "024ff76b-cd42-4f78-bf69-d9929c3cf101",
  "request_state": {},
  "event_loop_cycle_trace": "<strands.telemetry.metrics.Trace object>",
  "event_loop_cycle_span": "NonRecordingSpan(trace_id=0x0, span_id=0x0)"
}

Related Issues

#1066

Documentation PR

N/A

Type of Change

New feature

Testing

How have you tested the change? Verify that the changes do not break functionality or introduce warnings in consuming repositories: agents-docs, agents-tools, agents-cli

  • I ran hatch run prepare

Checklist

  • I have read the CONTRIBUTING document
  • I have added any necessary tests that prove my fix is effective or my feature works
  • I have updated the documentation accordingly
  • I have added an appropriate example to the documentation to outline the feature, or no new docs are needed
  • My changes generate no new warnings
  • Any dependent changes have been merged and published

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

Bedrock ConverseStream requests allow message contents to include the document media which is missing fields for text and content DocumentSource's. This feature enables the passing of these so features such as citations can be enabled for passed in text and content blocks.

changes:
 - Add handling for the content and text DocumentSource fields
 - Add media object for DocumentBlockContent
Copilot AI review requested due to automatic review settings April 16, 2026 02:33
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds support for Bedrock DocumentSource variants (text and inline content) so Bedrock ConverseStream requests can include document media in more forms (enabling features like citations), and updates Bedrock request formatting/tests accordingly.

Changes:

  • Extend DocumentSource typing to include text and content plus a new DocumentBlockContent type.
  • Update Bedrock request formatting to pass through DocumentSource.text/DocumentSource.content and clarify source variable naming for mypy.
  • Add/extend unit tests covering the new DocumentSource variants and Bedrock request formatting behavior.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 4 comments.

File Description
src/strands/types/media.py Adds DocumentBlockContent and extends DocumentSource to support text and content.
src/strands/models/bedrock.py Formats new document source variants and renames local variables for clearer typing.
tests/strands/types/test_media.py Adds tests for DocumentSource.text/content and DocumentBlockContent.
tests/strands/models/test_bedrock.py Adds tests ensuring Bedrock request formatting preserves text and filters content blocks.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +645 to 654
video_source = video["source"]
formatted_video_source: dict[str, Any] | None
if "location" in source:
formatted_video_source = self._handle_location(source["location"])
if "location" in video_source:
formatted_video_source = self._handle_location(video_source["location"])
if formatted_video_source is None:
return None
elif "bytes" in source:
formatted_video_source = {"bytes": source["bytes"]}
elif "bytes" in video_source:
formatted_video_source = {"bytes": video_source["bytes"]}
result = {"format": video["format"], "source": formatted_video_source}
return {"video": result}
Copy link

Copilot AI Apr 16, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

formatted_video_source is only assigned in the location/bytes branches. If a video block includes a source without either key, this will raise UnboundLocalError when building result. Initialize formatted_video_source to None and either return None (skip block) or raise a TypeError when neither supported key is present.

Copilot uses AI. Check for mistakes.
Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This leaves current behavior to keep change set smaller, this area was only updated to address mypy linter concerns. If needed, I can update this area, but will increase the logical scope of the changes.

elif "content" in document_source:
formatted_document_source = {
"content": [{"text": item["text"]} for item in document_source["content"] if "text" in item]
}
Copy link

Copilot AI Apr 16, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When document["source"] is present but does not contain any of the recognized keys (location, bytes, text, content), formatted_document_source remains None and the request is sent with "source": None. This will violate Bedrock’s schema (source should be an object) and can lead to a runtime validation error. Consider either (a) returning None to drop the whole document block (consistent with the unsupported-location path), or (b) only setting result["source"] when formatted_document_source is not None and otherwise raising a TypeError/logging + skipping.

Suggested change
}
}
if formatted_document_source is None:
return None

Copilot uses AI. Check for mistakes.
Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a good point to raise, the source field is always required in a DocumentSource (per docs), and when source is passed as None a botocore exception is raised:

botocore.exceptions.ParamValidationError: Parameter validation failed:
Missing required parameter in messages[0].content[1].document: "source"

This is the current behavior which I was not modifying, so based on any review provided, I can follow the location handling of dropping, raising an exception in Strands, or allowing botocore to handle it. I personally think leaving the current implementation is best, allowing botocore/aws to bubble up the correct exception, not dealing with that in Strands itself.

Comment on lines +550 to +552
formatted_document_source = {
"content": [{"text": item["text"]} for item in document_source["content"] if "text" in item]
}
Copy link

Copilot AI Apr 16, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For DocumentSource.content, this always emits a content array even if every item is filtered out by the if "text" in item guard, resulting in {"content": []}. If Bedrock rejects empty content (similar filtering elsewhere in this file omits empty arrays), this can cause request validation failures. Consider omitting the content key when the filtered list is empty, or returning None to skip the document block when no valid content blocks remain.

Suggested change
formatted_document_source = {
"content": [{"text": item["text"]} for item in document_source["content"] if "text" in item]
}
formatted_content = [{"text": item["text"]} for item in document_source["content"] if "text" in item]
if not formatted_content:
return None
formatted_document_source = {"content": formatted_content}

Copilot uses AI. Check for mistakes.
Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When an empty content list is passed in, botocore throws an exception:

botocore.errorfactory.ValidationException: An error occurred (ValidationException) when calling the ConverseStream operation: The model returned the following errors: messages.0.content.1.document.source.content.content: Field required

I see value in not passing content (returning None) when there is actually no content.

Not sure what the use case of a user providing an empty source document is, so it can be filtered out. I think if this path is chosen then the same approach should be used for the document source being None (dropping, since there is no content). I don't think a user would want this to error out, but also would be good to know that malformed documents are being provided and not being silently hidden.

Comment on lines +573 to 582
image_source = image["source"]
formatted_image_source: dict[str, Any] | None
if "location" in source:
formatted_image_source = self._handle_location(source["location"])
if "location" in image_source:
formatted_image_source = self._handle_location(image_source["location"])
if formatted_image_source is None:
return None
elif "bytes" in source:
formatted_image_source = {"bytes": source["bytes"]}
elif "bytes" in image_source:
formatted_image_source = {"bytes": image_source["bytes"]}
result = {"format": image["format"], "source": formatted_image_source}
return {"image": result}
Copy link

Copilot AI Apr 16, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

formatted_image_source is only assigned in the location/bytes branches. If an image block includes a source without either key (e.g., empty dict), this will raise UnboundLocalError when building result. Initialize formatted_image_source to None and either return None (skip block) or raise a TypeError when neither supported key is present.

Copilot uses AI. Check for mistakes.
Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This leaves current behavior to keep change set smaller, this area was only updated to address mypy linter concerns. If needed, I can update this area, but will increase the logical scope of the changes.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants