
feat: support image attachments on user messages #572

Open
ragini-pandey wants to merge 2 commits into Nano-Collective:main from ragini-pandey:feature/image-attachments

ragini-pandey (Contributor) commented Jun 14, 2026

Description

Adds multimodal image input to user messages across all chat surfaces (interactive TUI, VS Code prompt path, and ACP). Images are carried as base64 on the internal Message type and converted to AI SDK image parts (a data: URL) at the provider boundary, which Anthropic, Google, and OpenAI-compatible providers all accept.
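A minimal sketch of the provider-boundary conversion described above, assuming a hypothetical ImageAttachment shape of { data, mediaType } (base64 payload plus MIME type). The part types are local stand-ins modelled on the AI SDK's user text and image parts; their field names are assumptions and vary between SDK versions, so this is an illustration, not the PR's convertToModelMessages.

```typescript
// Hypothetical shape: base64 payload plus MIME type. The PR's ImageAttachment
// may name or structure these fields differently.
interface ImageAttachment {
  data: string; // base64, without a "data:" prefix
  mediaType: string; // e.g. "image/png"
}

// Local stand-ins modelled on the AI SDK's user content parts (assumed field
// names; some SDK versions use mimeType instead of mediaType).
type TextPart = { type: "text"; text: string };
type ImagePart = { type: "image"; image: string; mediaType: string };
type UserContent = string | Array<TextPart | ImagePart>;

// Wrap the stored base64 payload in a data: URL.
function toImagePart(attachment: ImageAttachment): ImagePart {
  return {
    type: "image",
    image: `data:${attachment.mediaType};base64,${attachment.data}`,
    mediaType: attachment.mediaType,
  };
}

// Convert a user message at the provider boundary. Text-only messages stay
// plain strings; messages with images become an ordered part array.
function toUserContent(text: string, images: ImageAttachment[] = []): UserContent {
  if (images.length === 0) {
    return text;
  }
  const parts: Array<TextPart | ImagePart> = [];
  if (text.length > 0) {
    parts.push({ type: "text", text });
  }
  for (const image of images) {
    parts.push(toImagePart(image));
  }
  return parts;
}
```

Keeping an explicit return in every branch matters here: a branch that builds the parts and then falls through yields undefined at the provider boundary instead of a message.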

Highlights:

  • New ImageAttachment type threaded through the submit chain (useChatHandler, useAppHandlers, app-util, message-builder, chat-input/user-input).
  • Clipboard paste via Ctrl+V (macOS osascript, Linux wl-paste/xclip, Windows PowerShell).
  • Image file paths typed/pasted/dragged into the terminal are resolved to attachments and stripped from the message text.
  • ACP image content blocks collected as attachments; unsupported media types and audio are noted rather than silently dropped.
  • modelSupportsVision() heuristic drives a non-blocking warning when the active model likely can't see images (the image is still sent).
  • UserInput lists pending attachments (Ctrl+X removes the last); UserMessage shows an attached-image count.
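The ACP bullet can be pictured as a single pass over a prompt's content blocks. This sketch assumes simplified, MCP-style block shapes (type, data, mimeType) and a hypothetical allow-list of image media types; the real ACP schema has more block kinds (resources, for example) and the PR's supported list may differ.

```typescript
// Simplified stand-ins for ACP content blocks (assumed shapes).
type AcpContentBlock =
  | { type: "text"; text: string }
  | { type: "image"; data: string; mimeType: string }
  | { type: "audio"; data: string; mimeType: string };

// Hypothetical attachment shape: base64 payload plus MIME type.
interface ImageAttachment {
  data: string;
  mediaType: string;
}

// Hypothetical allow-list of image types forwarded to the model.
const SUPPORTED_IMAGE_TYPES = new Set(["image/png", "image/jpeg", "image/gif", "image/webp"]);

interface CollectedPrompt {
  text: string;
  images: ImageAttachment[];
}

function collectPrompt(blocks: AcpContentBlock[]): CollectedPrompt {
  const textParts: string[] = [];
  const images: ImageAttachment[] = [];
  for (const block of blocks) {
    switch (block.type) {
      case "text":
        textParts.push(block.text);
        break;
      case "image":
        if (SUPPORTED_IMAGE_TYPES.has(block.mimeType)) {
          images.push({ data: block.data, mediaType: block.mimeType });
        } else {
          // Leave a visible note instead of dropping the block silently.
          textParts.push(`[image omitted: unsupported media type ${block.mimeType}]`);
        }
        break;
      case "audio":
        textParts.push(`[audio omitted: ${block.mimeType} is not supported]`);
        break;
    }
  }
  return { text: textParts.join("\n"), images };
}
```

Appending the notes to the message text means both the model and the user can see what was left out, rather than the prompt silently shrinking.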

Note: the ACP and message-converter portions existed in a partially-applied, non-compiling state on the base; this PR completes them (adds the missing ImageAttachment type they imported and restores a dropped return in the user branch of convertToModelMessages).

Type of Change

  • Bug fix
  • New feature
  • Breaking change
  • Documentation update

Testing

Automated Tests

  • New features include passing tests in .spec.ts/tsx files
  • All existing tests pass (pnpm test:all completes successfully)
  • Tests cover both success and error scenarios

pnpm test:types and pnpm test:lint pass, and all new/updated specs pass (clipboard-image, vision-support, message-converter, acp-content). pnpm test:all currently fails on 3 pre-existing tests in source/utils/tool-result-display.spec.tsx that also fail on main and are unrelated to this change.

Manual Testing

  • Tested with Ollama
  • Tested with OpenRouter
  • Tested with OpenAI-compatible API
  • Tested MCP integration (if applicable)

Checklist

  • Code follows project style guidelines
  • Self-review completed
  • Documentation updated (if needed)
  • No breaking changes (or clearly documented)
  • Appropriate logging added using structured logging (see CONTRIBUTING.md)

Screen.Recording.2026-06-14.at.6.40.47.PM.mov

Add multimodal image input across the chat surfaces:

- New ImageAttachment type carried on user messages, converted to AI SDK
  image parts at the provider boundary (data URL accepted by Anthropic,
  Google, and OpenAI-compatible providers).
- Clipboard paste (Ctrl+V) and drag/typed image file paths become
  attachments; image path tokens are stripped from the message text.
- ACP image content blocks are collected as attachments; unsupported
  media types and audio are noted rather than silently dropped.
- modelSupportsVision() heuristic drives a non-blocking warning when the
  active model likely cannot see images (the image is still sent).
- UserInput shows pending attachments with Ctrl+X to remove the last;
  UserMessage shows an attached-image count.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Comment thread on source/utils/clipboard-image.spec.ts: Fixed

Resolves the CodeQL "incomplete string escaping" alert: the test built
its escaped path by replacing spaces only, leaving any pre-existing
backslash unescaped. Escape backslashes first, then spaces, so the
encoding is complete for any path. Behavior is unchanged for the
temp paths under test.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
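A small sketch of why the escaping order in that fix matters. The helper names are hypothetical, not the spec file's: escaping spaces first inserts new backslashes that the backslash pass then doubles, so the round trip no longer reproduces the original path.

```typescript
// Escape a path for a shell-style tokenizer (hypothetical helper).
function escapePathToken(path: string): string {
  // 1. Double every existing backslash so it cannot be read as an escape.
  // 2. Only then prefix each space with a backslash.
  return path.replace(/\\/g, "\\\\").replace(/ /g, "\\ ");
}

// Inverse: a backslash makes the following character literal.
function unescapePathToken(token: string): string {
  return token.replace(/\\(.)/g, "$1");
}

// The incomplete order for comparison: "a b" -> "a\ b" -> "a\\ b",
// which unescapes to "a\ b" instead of "a b".
function escapeSpacesFirst(path: string): string {
  return path.replace(/ /g, "\\ ").replace(/\\/g, "\\\\");
}
```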
will-lamerton (Member)

Hey @ragini-pandey - this is a brilliant PR. Thank you for building this. It works well, and below are mostly some quality-of-life comments :)

  1. The picture emoji with "1 image attached" - if we could remove that, it would be great, as we don't tend to use emojis in the design palette of Nanocoder. Something like this could be okay: ■ :)

  2. On my MacBook, when I drag in a screenshot, the path isn't wrapped in quote marks, so it isn't recognised as an attached image. Dragging needs to either wrap the path in quote marks automatically or work without quote marks.

  3. I don't think the vision warning is needed - the model should report or error if it can't support images. Although I like this warning, it shows every time I attach an image, which is overkill.

  4. readClipboardImage in source/utils/clipboard-image.ts returns null silently when the underlying tooling isn't installed - osascript on macOS, wl-paste / xclip on Linux, or PowerShell on Windows. On a minimal Linux container without wl-paste or xclip (which is super common in dev containers and CI boxes) the user gets no feedback that Ctrl+V is a no-op. Could we log a one-liner at debug level naming the missing command, or at least surface a small note in the status bar / footer so it's not a black box?

  5. source/utils/clipboard-image.ts has a 10 MB maxBuffer on the spawnSync calls (around line 160 with MAX_IMAGE_BYTES). If a pasted screenshot exceeds that, the child gets killed and readClipboardImage returns null with only a logger.warn - the user sees nothing. For the typical "drag a screenshot in" flow this is borderline in scope, but a soft "image too large" hint in the input footer would make the failure mode discoverable. Worth a follow-up if not for this PR.

  6. extractImageReferences in source/utils/clipboard-image.ts (around line 93) will happily match https://example.com/chart.png style URLs and call existsSync on them. It's harmless (the existsSync returns false and the token is left in place), but it's a stat per URL-like token and it slightly leaks FS access into message parsing. Not a blocker - more of a tidy-up for a future pass.
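Items 2 and 6 both concern how path tokens are found in the message. A minimal sketch, assuming a dragged path may arrive quoted, with backslash-escaped spaces (as macOS Terminal typically produces), or fully bare: a tokenizer that honours quotes and a limited set of backslash escapes, then a longest-first match over runs of tokens that never includes a URL-like token in a filesystem check. tokenize, findImagePaths, MAX_RUN and URL_LIKE are hypothetical names, not the PR's extractImageReferences.

```typescript
// Characters a backslash may escape. Any other backslash is kept literally,
// so Windows paths such as C:\Users\me\shot.png pass through unchanged.
const ESCAPABLE = /[\s"'\\]/;

// URL-like tokens (scheme:// or data:) are never passed to the file check.
const URL_LIKE = /^(?:[a-z][a-z0-9+.-]*:\/\/|data:)/i;

// Longest run of whitespace-separated tokens tried as one bare path.
const MAX_RUN = 6;

// Split on unescaped whitespace. A quote only opens at the start of a token,
// so apostrophes inside words ("don't") stay literal.
function tokenize(input: string): string[] {
  const tokens: string[] = [];
  let current = "";
  let inToken = false;
  let quote: string | null = null;
  for (let i = 0; i < input.length; i++) {
    const ch = input.charAt(i);
    if (quote !== null) {
      if (ch === quote) quote = null;
      else current += ch;
      continue;
    }
    if (ch === "\\" && ESCAPABLE.test(input.charAt(i + 1))) {
      current += input.charAt(++i);
      inToken = true;
      continue;
    }
    if (!inToken && (ch === '"' || ch === "'")) {
      quote = ch;
      inToken = true;
      continue;
    }
    if (/\s/.test(ch)) {
      if (inToken) tokens.push(current);
      current = "";
      inToken = false;
      continue;
    }
    current += ch;
    inToken = true;
  }
  if (inToken) tokens.push(current);
  return tokens;
}

// Longest-first match so an unquoted "Screen Shot.png" is found whole.
function findImagePaths(tokens: string[], isImageFile: (p: string) => boolean): string[] {
  const found: string[] = [];
  let i = 0;
  while (i < tokens.length) {
    // Extend the candidate run up to MAX_RUN tokens, stopping at any URL.
    let limit = i;
    while (limit < tokens.length && limit - i < MAX_RUN && !URL_LIKE.test(tokens[limit])) {
      limit++;
    }
    let end = -1;
    for (let j = limit; j > i; j--) {
      if (isImageFile(tokens.slice(i, j).join(" "))) {
        end = j;
        break;
      }
    }
    if (end === -1) {
      i++;
    } else {
      found.push(tokens.slice(i, end).join(" "));
      i = end;
    }
  }
  return found;
}
```

In practice the predicate should check the extension before touching the filesystem, for example (p) => /\.(png|jpe?g|gif|webp)$/i.test(p) && existsSync(p) with existsSync from node:fs, so only candidates that already look like image paths cost a stat. Because runs are re-joined with single spaces, a bare filename containing doubled spaces will not match.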

Love this though. Cannot wait to merge! Let me know if there are any other questions or thoughts!
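For items 4 and 5, a sketch of how the spawnSync result could be classified so that both failures become visible. It relies on Node's behaviour for spawnSync: a missing executable surfaces as result.error with code ENOENT, and output past maxBuffer kills the child and surfaces as ENOBUFS. readClipboardImageSketch, ClipboardTool and the hint strings are hypothetical; a caller would pass missing to the project's logger at debug level and show hint in the footer.

```typescript
import { spawnSync } from "node:child_process";

// Hypothetical descriptor for one clipboard reader command.
interface ClipboardTool {
  command: string;
  args: string[];
}

// Example Linux candidates: wl-paste for Wayland, xclip for X11, each asked
// for PNG data from the clipboard.
const LINUX_TOOLS: ClipboardTool[] = [
  { command: "wl-paste", args: ["--type", "image/png"] },
  { command: "xclip", args: ["-selection", "clipboard", "-t", "image/png", "-o"] },
];

interface PasteOutcome {
  data: Buffer | null; // image bytes, or null when nothing usable was read
  missing: string[]; // commands not found on PATH (for a debug log line)
  hint: string | null; // short user-facing note for the footer, if any
}

function readClipboardImageSketch(tools: ClipboardTool[], maxBytes: number): PasteOutcome {
  const missing: string[] = [];
  for (const tool of tools) {
    const result = spawnSync(tool.command, tool.args, { maxBuffer: maxBytes });
    const code = (result.error as { code?: string } | undefined)?.code;
    if (code === "ENOENT") {
      // Executable not installed: remember it and try the next candidate.
      missing.push(tool.command);
      continue;
    }
    if (code === "ENOBUFS") {
      // Output exceeded maxBuffer, so the child was killed mid-read.
      return { data: null, missing, hint: "Clipboard image exceeds the paste size limit" };
    }
    if (result.status === 0 && result.stdout && result.stdout.length > 0) {
      return { data: result.stdout, missing, hint: null };
    }
    // The tool ran but produced no image (e.g. text on the clipboard, or
    // wl-paste outside a Wayland session); try the next candidate.
  }
  const hint =
    missing.length > 0 && missing.length === tools.length
      ? `Image paste needs one of: ${missing.join(", ")}`
      : null;
  return { data: null, missing, hint };
}
```

On Linux this would be called as readClipboardImageSketch(LINUX_TOOLS, 10 * 1024 * 1024). Only reporting missing tools when every candidate is absent avoids a misleading hint on machines where one tool exists but the clipboard simply holds no image.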
