feat: support PDF documents in tools, get(), and file reads #289

Open
mlikasam-askui wants to merge 1 commit into main from feat/pdf-document-support

@mlikasam-askui
Contributor

Summary

Adds first-class PDF support across the SDK, mirroring how images are already
handled. PDFs are sent to the model as a document content block: a base64
document block for Anthropic (no beta header required) and a file content
part for OpenAI.
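
For orientation, a minimal sketch of the two payload shapes. The field names follow the public Anthropic Messages and OpenAI Chat Completions formats as I understand them; the helper names and the default filename are hypothetical and are not taken from this PR.

```python
import base64


def anthropic_document_block(pdf_bytes: bytes) -> dict:
    """Build an Anthropic-style base64 document block (hypothetical helper)."""
    return {
        "type": "document",
        "source": {
            "type": "base64",
            "media_type": "application/pdf",
            "data": base64.b64encode(pdf_bytes).decode("ascii"),
        },
    }


def openai_file_part(pdf_bytes: bytes, filename: str = "document.pdf") -> dict:
    """Build an OpenAI-style file content part (hypothetical helper)."""
    encoded = base64.b64encode(pdf_bytes).decode("ascii")
    return {
        "type": "file",
        "file": {
            # The filename default is an assumption made for this sketch.
            "filename": filename,
            "file_data": f"data:application/pdf;base64,{encoded}",
        },
    }
```

The SDK's own DocumentBlockParam and Base64PdfSourceParam types presumably model the first shape; the sketch uses plain dicts to stay independent of them.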

Changes

  • Message types: new DocumentBlockParam + Base64PdfSourceParam; allowed
    in ToolResultBlockParam content and exported from askui / askui.models.
  • Tool results: a tool that returns a PdfSource is rendered as a document
    block (_convert_to_content). MCP tools returning a PDF as an embedded blob
    resource (application/pdf) are now converted too, instead of being dropped.
  • get(): the Anthropic and OpenAI get models accept PdfSource (Office
    documents remain unsupported).
  • File reads: AgentOs.get_file detects PDFs and returns them as PdfSource;
    file-type detection now sniffs the MIME type via filetype.guess and decodes
    the base64 payload only once.
  • LoadPdfTool: new universal tool to load a PDF from disk and hand it to
    the model (the PDF counterpart to LoadImageTool).
  • Reporting: base64 PDF blobs (and any other media blobs) are truncated in
    reports to keep them readable.
  • Chore: exclude venv from the mypy typecheck.
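
To illustrate the "decode once, then sniff" idea from the file-reads item, here is a rough sketch. It is not the PR's implementation: the PR uses filetype.guess, whereas this sketch substitutes a standard-library check of the `%PDF-` magic bytes so that it has no third-party dependency. SniffedFile and sniff_base64_payload are hypothetical names.

```python
import base64
from dataclasses import dataclass
from typing import Optional


@dataclass
class SniffedFile:
    """Decoded bytes plus the detected MIME type (hypothetical container)."""

    data: bytes
    mime: Optional[str]


def sniff_base64_payload(payload: str) -> SniffedFile:
    """Decode the payload a single time and reuse the bytes for detection."""
    raw = base64.b64decode(payload)  # the only decode of the payload
    # Stand-in for filetype.guess: PDF files start with the bytes "%PDF-".
    mime = "application/pdf" if raw.startswith(b"%PDF-") else None
    return SniffedFile(data=raw, mime=mime)
```

Returning the decoded bytes together with the MIME type lets the caller build a PdfSource (or an image source) without decoding the base64 string a second time.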

Let tools and file reads return PDFs to the model as a document content
block, mirroring existing image handling. PDFs are sent as an Anthropic
base64 `document` block (no beta header) and an OpenAI `file` content part.

- agent_message_param: add DocumentBlockParam + Base64PdfSourceParam; allow
  document blocks in ToolResultBlockParam content
- tools: render a returned PdfSource as a document block in tool results
- anthropic/openai get models: accept PdfSource (Office docs stay unsupported)
- LoadPdfTool: load a PDF from disk and hand it to the model
- AgentOs.get_file: detect and return PDFs as PdfSource; sniff file type via
  filetype.guess and decode the base64 payload once
- reporting: truncate base64 PDF (and any media) blobs to keep reports readable
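
A minimal sketch of how the report truncation could work, assuming messages are plain nested dicts and lists and that media payloads live under a `data` key next to `"type": "base64"`. The function name, the length limit, and the marker text are hypothetical; the PR's reporting code may differ.

```python
from typing import Any

MAX_BLOB_CHARS = 64  # hypothetical limit, chosen for illustration only


def truncate_blobs(value: Any, max_len: int = MAX_BLOB_CHARS) -> Any:
    """Return a copy of value with long base64 media payloads shortened."""
    if isinstance(value, dict):
        out = {key: truncate_blobs(item, max_len) for key, item in value.items()}
        data = value.get("data")
        # Only touch payloads that are explicitly marked as base64 sources.
        if value.get("type") == "base64" and isinstance(data, str) and len(data) > max_len:
            out["data"] = f"{data[:max_len]}... [truncated {len(data) - max_len} chars]"
        return out
    if isinstance(value, list):
        return [truncate_blobs(item, max_len) for item in value]
    return value
```

The function builds new containers rather than mutating its input, so the message that is sent to the model keeps its full payload while the report shows the shortened copy.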