feat(bedrock-model): Support content and text DocumentSource#2135
feat(bedrock-model): Support content and text DocumentSource#2135jonathankeys wants to merge 1 commit intostrands-agents:mainfrom
Conversation
Bedrock ConverseStream requests allow message contents to include the document media which is missing fields for text and content DocumentSource's. This feature enables the passing of these so features such as citations can be enabled for passed in text and content blocks. changes: - Add handling for the content and text DocumentSource fields - Add media object for DocumentBlockContent
There was a problem hiding this comment.
Pull request overview
Adds support for Bedrock DocumentSource variants (text and inline content) so Bedrock ConverseStream requests can include document media in more forms (enabling features like citations), and updates Bedrock request formatting/tests accordingly.
Changes:
- Extend
DocumentSourcetyping to includetextandcontentplus a newDocumentBlockContenttype. - Update Bedrock request formatting to pass through
DocumentSource.text/DocumentSource.contentand clarify source variable naming for mypy. - Add/extend unit tests covering the new
DocumentSourcevariants and Bedrock request formatting behavior.
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
src/strands/types/media.py |
Adds DocumentBlockContent and extends DocumentSource to support text and content. |
src/strands/models/bedrock.py |
Formats new document source variants and renames local variables for clearer typing. |
tests/strands/types/test_media.py |
Adds tests for DocumentSource.text/content and DocumentBlockContent. |
tests/strands/models/test_bedrock.py |
Adds tests ensuring Bedrock request formatting preserves text and filters content blocks. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| video_source = video["source"] | ||
| formatted_video_source: dict[str, Any] | None | ||
| if "location" in source: | ||
| formatted_video_source = self._handle_location(source["location"]) | ||
| if "location" in video_source: | ||
| formatted_video_source = self._handle_location(video_source["location"]) | ||
| if formatted_video_source is None: | ||
| return None | ||
| elif "bytes" in source: | ||
| formatted_video_source = {"bytes": source["bytes"]} | ||
| elif "bytes" in video_source: | ||
| formatted_video_source = {"bytes": video_source["bytes"]} | ||
| result = {"format": video["format"], "source": formatted_video_source} | ||
| return {"video": result} |
There was a problem hiding this comment.
formatted_video_source is only assigned in the location/bytes branches. If a video block includes a source without either key, this will raise UnboundLocalError when building result. Initialize formatted_video_source to None and either return None (skip block) or raise a TypeError when neither supported key is present.
There was a problem hiding this comment.
This leaves current behavior to keep change set smaller, this area was only updated to address mypy linter concerns. If needed, I can update this area, but will increase the logical scope of the changes.
| elif "content" in document_source: | ||
| formatted_document_source = { | ||
| "content": [{"text": item["text"]} for item in document_source["content"] if "text" in item] | ||
| } |
There was a problem hiding this comment.
When document["source"] is present but does not contain any of the recognized keys (location, bytes, text, content), formatted_document_source remains None and the request is sent with "source": None. This will violate Bedrock’s schema (source should be an object) and can lead to a runtime validation error. Consider either (a) returning None to drop the whole document block (consistent with the unsupported-location path), or (b) only setting result["source"] when formatted_document_source is not None and otherwise raising a TypeError/logging + skipping.
| } | |
| } | |
| if formatted_document_source is None: | |
| return None |
There was a problem hiding this comment.
This is a good point to raise, the source field is always required in a DocumentSource (per docs), and when source is passed as None a botocore exception is raised:
botocore.exceptions.ParamValidationError: Parameter validation failed:
Missing required parameter in messages[0].content[1].document: "source"
This is the current behavior which I was not modifying, so based on any review provided, I can follow the location handling of dropping, raising an exception in Strands, or allowing botocore to handle it. I personally think leaving the current implementation is best, allowing botocore/aws to bubble up the correct exception, not dealing with that in Strands itself.
| formatted_document_source = { | ||
| "content": [{"text": item["text"]} for item in document_source["content"] if "text" in item] | ||
| } |
There was a problem hiding this comment.
For DocumentSource.content, this always emits a content array even if every item is filtered out by the if "text" in item guard, resulting in {"content": []}. If Bedrock rejects empty content (similar filtering elsewhere in this file omits empty arrays), this can cause request validation failures. Consider omitting the content key when the filtered list is empty, or returning None to skip the document block when no valid content blocks remain.
| formatted_document_source = { | |
| "content": [{"text": item["text"]} for item in document_source["content"] if "text" in item] | |
| } | |
| formatted_content = [{"text": item["text"]} for item in document_source["content"] if "text" in item] | |
| if not formatted_content: | |
| return None | |
| formatted_document_source = {"content": formatted_content} |
There was a problem hiding this comment.
When an empty content list is passed in, botocore throws an exception:
botocore.errorfactory.ValidationException: An error occurred (ValidationException) when calling the ConverseStream operation: The model returned the following errors: messages.0.content.1.document.source.content.content: Field required
I see value in not passing content (returning None) when there is actually no content.
Not sure what the use case of a user providing an empty source document is, so it can be filtered out. I think if this path is chosen then the same approach should be used for the document source being None (dropping, since there is no content). I don't think a user would want this to error out, but also would be good to know that malformed documents are being provided and not being silently hidden.
| image_source = image["source"] | ||
| formatted_image_source: dict[str, Any] | None | ||
| if "location" in source: | ||
| formatted_image_source = self._handle_location(source["location"]) | ||
| if "location" in image_source: | ||
| formatted_image_source = self._handle_location(image_source["location"]) | ||
| if formatted_image_source is None: | ||
| return None | ||
| elif "bytes" in source: | ||
| formatted_image_source = {"bytes": source["bytes"]} | ||
| elif "bytes" in image_source: | ||
| formatted_image_source = {"bytes": image_source["bytes"]} | ||
| result = {"format": image["format"], "source": formatted_image_source} | ||
| return {"image": result} |
There was a problem hiding this comment.
formatted_image_source is only assigned in the location/bytes branches. If an image block includes a source without either key (e.g., empty dict), this will raise UnboundLocalError when building result. Initialize formatted_image_source to None and either return None (skip block) or raise a TypeError when neither supported key is present.
There was a problem hiding this comment.
This leaves current behavior to keep change set smaller, this area was only updated to address mypy linter concerns. If needed, I can update this area, but will increase the logical scope of the changes.
Description
Bedrock ConverseStream requests allow message Content to include the DocumentBlock which has the fields
textandcontentin DocumentSource which is missing in theBedrockModelmessage formatting. This change enables the passing of these fields so features (e.g. citations) can utilize these document types.note: mypy was raising linter errors due to
sourcevariable in_format_request_message_content(...)being used for different object types, so those variable names were updated to satisfy mypy and add clarity for the type of source in the block.changes:
Other Notes
Testing done using model Claude Sonnet 4.6
When
contentortextsources are passed in,citationsmust also be enabled in thedocumentfield. This is not documented anywhere from what I have seen and unsure if this is coming from client side botocore validation or from AWS itself.e.g.
when
citationsis disabled or not passed in at all, botocore raises a Validation exceptionHowever, when
citationsis enabled, it works correctly.Related Issues
#1066
Documentation PR
N/A
Type of Change
New feature
Testing
How have you tested the change? Verify that the changes do not break functionality or introduce warnings in consuming repositories: agents-docs, agents-tools, agents-cli
hatch run prepareChecklist
By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.