feat: add CUA computer-use sandbox support#7828
feat: add CUA computer-use sandbox support#7828zouyonghe wants to merge 25 commits intoAstrBotDevs:masterfrom
Conversation
|
Related Documentation 1 document(s) may need updating based on files changed in this PR: AstrBotTeam's Space pr4697的改动View Suggested Changes@@ -109,7 +109,9 @@
**运行时工具解析**
-新增 `_get_runtime_computer_tools()` 方法,根据 `provider_settings.computer_use_runtime` 配置动态解析运行时工具。
+新增 `_get_runtime_computer_tools()` 方法,根据 `provider_settings.computer_use_runtime` 配置动态解析运行时工具。该方法现在接受 `booter` 参数,用于确定当运行时为 `"sandbox"` 时需要包含哪些工具集。
+
+当 `runtime="sandbox"` 且 `booter="cua"` 时,除了标准的沙盒工具外,该方法还会额外注册 CUA 特有的 GUI 工具(`CuaScreenshotTool`、`CuaMouseClickTool`、`CuaKeyboardTypeTool`),确保这些工具在 SubAgent handoff 场景下也可正常使用。
**Booter 选择优化(PR #6064)**
@@ -117,7 +119,7 @@
- 如果 `computer_use_runtime` 设置为 `"local"`,立即返回本地 booter,无需检查 sandbox booter 配置
- 如果 `computer_use_runtime` 设置为 `"none"`,抛出 `RuntimeError` 提示沙箱运行时已禁用
-- 只有在以上条件均不满足时,才继续检查 sandbox booter 配置(如 `shipyard_neo` 等)
+- 只有在以上条件均不满足时,才继续检查 sandbox booter 配置(如 `shipyard_neo`、`cua` 等)
该优化避免了在运行时类型已明确时进行不必要的沙箱配置检查,提高了边缘场景的性能。
@@ -133,6 +135,11 @@
- `FILE_WRITE_TOOL`:写入 UTF-8 文本文件
- `FILE_EDIT_TOOL`:通过精确字符串替换编辑文件
- `GREP_TOOL`:使用 ripgrep 搜索文件内容
+
+ 当 `provider_settings.sandbox.booter` 设置为 `"cua"` 时,sandbox 运行时还包含以下 CUA 特有的 GUI 交互工具:
+ - `CuaScreenshotTool` (`astrbot_cua_screenshot`):捕获 CUA 沙盒桌面的屏幕截图,可选择是否将图片发送给用户
+ - `CuaMouseClickTool` (`astrbot_cua_mouse_click`):在 CUA 沙盒桌面的指定坐标执行鼠标点击操作(支持 left、right、middle 按钮)
+ - `CuaKeyboardTypeTool` (`astrbot_cua_keyboard_type`):在 CUA 沙盒桌面中键入文本内容
- **`runtime="local"`** 或 **`runtime="local_sandboxed"`** 提供以下工具:
- `LOCAL_EXECUTE_SHELL_TOOL`:在本地环境中执行 Shell 命令
@@ -179,6 +186,35 @@
- `shipyard_neo_access_token`:Bay 的 API 密钥(`sk-bay-...`),留空时自动从 `credentials.json` 发现
- `shipyard_neo_profile`:沙箱 profile 名称,如 `python-default`(留空时自动选择能力最多的 profile)
- `shipyard_neo_ttl`:沙箱生存时间(秒),默认 3600
+
+**CUA 沙箱支持(PR #7828)**
+
+[PR #7828](https://github.com/AstrBotDevs/AstrBot/pull/7828) 新增了 `cua` 沙箱运行时选项,基于 CUA SDK 提供 Computer-Use Automation 能力,支持 GUI 交互和桌面自动化:
+
+- **Booter 类型**:`provider_settings.sandbox.booter` 现支持 `"cua"` 选项
+- **能力支持**:python、shell、filesystem、gui(screenshot、mouse、keyboard)
+- **可选依赖**:需要在 AstrBot 环境中安装 `cua` 包(`pip install cua`),该包为可选运行时依赖
+- **遥测默认禁用**:`cua_telemetry_enabled` 配置项默认为 `False`,用户可选择是否启用 CUA SDK 的遥测功能
+
+**配置项(provider_settings.sandbox)**:
+
+- `cua_image`:CUA 沙箱镜像/系统类型,默认 `"linux"`(可填写 linux、macos、windows、android,具体取决于 CUA SDK 支持)
+- `cua_os_type`:CUA 沙箱操作系统类型,默认 `"linux"`(可选:linux、macos、windows、android)
+- `cua_ttl`:CUA 沙箱生存时间(秒),默认 3600(作为会话配置保存,具体生效取决于 CUA SDK)
+- `cua_telemetry_enabled`:是否允许 CUA SDK 发送遥测数据,默认 `False`
+
+**CUA GUI 工具使用方式**:
+
+主代理在启用 CUA booter 时,系统提示词中会自动注入以下指引:
+
+```
+[CUA Desktop Control]
+Use `astrbot_cua_screenshot` to inspect the current desktop before
+clicking or typing. Use coordinates from screenshots for
+`astrbot_cua_mouse_click`, then `astrbot_cua_keyboard_type` for text input.
+```
+
+该指引引导 LLM 先使用截图工具查看桌面状态,再根据截图中的坐标执行鼠标点击和键盘输入操作,确保 GUI 自动化的准确性。
**本地沙箱增强模式(local_sandboxed)**
|
There was a problem hiding this comment.
Hey - I've found 4 issues, and left some high level feedback:
- In
_screenshot_to_bytes,isinstance(raw, bytes | bytearray)will raise at runtime becauseisinstancedoes not accept PEP 604 union types; switch this toisinstance(raw, (bytes, bytearray))(or similar tuple form) to avoid aTypeError. - Consider normalizing the shape of filesystem responses in
CuaFileSystemComponent.list_dir(string vs. list of entries depending on whether a native filesystem is present) so downstream consumers see a consistent type forentriesregardless of the backend.
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- In `_screenshot_to_bytes`, `isinstance(raw, bytes | bytearray)` will raise at runtime because `isinstance` does not accept PEP 604 union types; switch this to `isinstance(raw, (bytes, bytearray))` (or similar tuple form) to avoid a `TypeError`.
- Consider normalizing the shape of filesystem responses in `CuaFileSystemComponent.list_dir` (string vs. list of entries depending on whether a native filesystem is present) so downstream consumers see a consistent type for `entries` regardless of the backend.
## Individual Comments
### Comment 1
<location path="astrbot/core/computer/booters/cua.py" line_range="321-322" />
<code_context>
+ return {"success": bool(payload.get("success", True)), **payload}
+
+
+def _screenshot_to_bytes(raw: Any) -> bytes:
+ if isinstance(raw, bytes | bytearray):
+ return bytes(raw)
+ if isinstance(raw, str):
</code_context>
<issue_to_address>
**issue (bug_risk):** Fix `isinstance` usage with union types in `_screenshot_to_bytes`.
`isinstance(raw, bytes | bytearray)` will raise a `TypeError` at runtime because `isinstance` doesn’t accept PEP 604 unions. This will break when a bytes-like value is returned. Please change it to `if isinstance(raw, (bytes, bytearray)):` so it works correctly.
</issue_to_address>
### Comment 2
<location path="astrbot/core/computer/booters/cua.py" line_range="413-416" />
<code_context>
+ kwargs["telemetry_enabled"] = self.telemetry_enabled
+ return kwargs
+
+ async def shutdown(self) -> None:
+ if self._sandbox_cm is not None:
+ await self._sandbox_cm.__aexit__(None, None, None)
+ self._sandbox_cm = None
+ self._sandbox = None
+
</code_context>
<issue_to_address>
**issue (bug_risk):** Reset cached components on shutdown to avoid using a closed sandbox.
Currently `shutdown` only clears `_sandbox_cm` and `_sandbox`, leaving `_shell`, `_python`, `_fs`, and `_gui` non-None. This allows callers to keep using these properties against a closed sandbox, which can lead to subtle failures. Please also clear these component fields and/or have `available()` reflect post-shutdown unusability (e.g., by checking `_sandbox_cm` or the components).
</issue_to_address>
### Comment 3
<location path="astrbot/core/astr_agent_tool_exec.py" line_range="258-261" />
<code_context>
runtime_computer_tools = cls._get_runtime_computer_tools(
runtime,
tool_mgr,
+ str(provider_settings.get("sandbox", {}).get("booter", "")),
)
</code_context>
<issue_to_address>
**suggestion (bug_risk):** Avoid coercing a potentially `None` booter to the string "None".
Here `booter` is wrapped in `str(...)`, so an explicit `None` becomes the literal "None", which won’t match any runtime and is hard to debug. Instead, pass the raw value (e.g. `booter=provider_settings.get("sandbox", {}).get("booter")`) and handle `None` inside `_get_runtime_computer_tools`, or explicitly normalize falsy values to `""` there.
Suggested implementation:
```python
runtime_computer_tools = cls._get_runtime_computer_tools(
runtime,
tool_mgr,
provider_settings.get("sandbox", {}).get("booter"),
)
```
To fully implement the suggestion, you should also update `_get_runtime_computer_tools` in the same file (or wherever it is defined) to:
1. Accept a possibly-None `booter` parameter without assuming it is a string.
2. Normalize/handle `booter` internally, for example:
- If `booter` is `None` or another falsy value, treat it as the default (e.g. `""` or a default booter constant).
- Otherwise ensure it is in the expected format (e.g. `str(booter)` if string is required, after the `None` check).
3. Ensure any runtime selection logic uses this normalized value instead of relying on `str(None) == "None"`.
These changes will avoid the accidental "None" string and keep the normalization logic localized to `_get_runtime_computer_tools`.
</issue_to_address>
### Comment 4
<location path="astrbot/core/computer/booters/cua.py" line_range="60" />
<code_context>
+ return ""
+
+
+class CuaShellComponent(ShellComponent):
+ def __init__(self, sandbox: Any) -> None:
+ self._sandbox = sandbox
</code_context>
<issue_to_address>
**issue (complexity):** Consider resolving sandbox method variants and filesystem capabilities once during construction and delegating via small adapters instead of doing per-call duck-typing and branching.
You can localize most of the duck-typing without losing the flexibility by doing the method resolution once at construction time and by introducing tiny adapters.
### 1) Resolve `run` vs `exec` once instead of `_call_first`
Instead of calling `_call_first` on every invocation, choose the method once in `__init__` and keep the rest of the logic unchanged:
```python
class CuaShellComponent(ShellComponent):
def __init__(self, sandbox: Any) -> None:
self._sandbox = sandbox
shell = sandbox.shell
self._shell_exec = getattr(shell, "run", None) or getattr(shell, "exec", None)
if self._shell_exec is None:
raise RuntimeError("CUA shell must provide `run` or `exec`")
async def exec(...):
...
result = await self._shell_exec(command, **kwargs)
payload = _maybe_model_dump(result)
...
```
Similarly for `CuaPythonComponent`:
```python
class CuaPythonComponent(PythonComponent):
def __init__(self, sandbox: Any) -> None:
self._sandbox = sandbox
python = getattr(sandbox, "python", None)
if python is not None:
self._python_exec = getattr(python, "run", None) or getattr(python, "exec", None)
else:
self._python_exec = None
async def exec(...):
_ = kernel_id
if self._python_exec is not None:
result = await self._python_exec(code, timeout=timeout)
payload = _maybe_model_dump(result)
else:
shell = CuaShellComponent(self._sandbox)
...
```
You can then remove `_call_first` entirely. This keeps compatibility with old/new SDKs but eliminates per-call introspection and branching.
### 2) Small filesystem adapter to centralize capability resolution
The per-method `hasattr` branching in `CuaFileSystemComponent` (especially `delete_file` and `list_dir`) can be moved into a tiny adapter built once:
```python
class CuaFilesystemAdapter:
def __init__(self, sandbox: Any, shell: CuaShellComponent) -> None:
self._fs = getattr(sandbox, "filesystem", None)
self._shell = shell
self._delete = None
self._list_dir = None
if self._fs is not None:
if hasattr(self._fs, "delete"):
self._delete = self._fs.delete
elif hasattr(self._fs, "delete_file"):
self._delete = self._fs.delete_file
if hasattr(self._fs, "list_dir"):
self._list_dir = self._fs.list_dir
async def delete(self, path: str) -> None:
if self._delete is not None:
await self._delete(path)
else:
await self._shell.exec(f"rm -rf {path!r}")
async def list_dir(self, path: str, show_hidden: bool) -> dict[str, Any]:
if self._list_dir is not None:
entries = await self._list_dir(path)
return {"success": True, "path": path, "entries": entries}
flags = "-la" if show_hidden else "-l"
result = await self._shell.exec(f"ls {flags} {path!r}")
return {
"success": not bool(result.get("stderr")),
"path": path,
"entries": result.get("stdout", ""),
"error": result.get("stderr", ""),
}
```
Then `CuaFileSystemComponent` uses this adapter:
```python
class CuaFileSystemComponent(FileSystemComponent):
def __init__(self, sandbox: Any) -> None:
self._sandbox = sandbox
self._shell = CuaShellComponent(sandbox)
self._adapter = CuaFilesystemAdapter(sandbox, self._shell)
async def delete_file(self, path: str) -> dict[str, Any]:
await self._adapter.delete(path)
return {"success": True, "path": path}
async def list_dir(self, path: str = ".", show_hidden: bool = False) -> dict[str, Any]:
return await self._adapter.list_dir(path, show_hidden)
```
This keeps all existing fallbacks (filesystem vs shell, `delete` vs `delete_file`, etc.) but removes repeated nested conditionals and runtime introspection from each call site, making the component methods much easier to read.
</issue_to_address>Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.
There was a problem hiding this comment.
Code Review
This pull request introduces support for the CUA (Computer Use Agent) sandbox booter, enabling desktop GUI automation capabilities. It adds new tools for taking screenshots, clicking, and typing within the sandbox, along with the necessary booter implementation, configuration options, and unit tests. The review feedback suggests improving the robustness of the Python execution heredoc delimiter and ensuring consistency in the directory listing output format.
| payload = _maybe_model_dump(result) | ||
| else: | ||
| shell = CuaShellComponent(self._sandbox) | ||
| result = await shell.exec(f"python3 - <<'PY'\n{code}\nPY", timeout=timeout) |
There was a problem hiding this comment.
The heredoc delimiter PY is too short and common, which could lead to premature termination of the Python code if the string PY appears at the beginning of a line within the code block. Using a more unique and descriptive delimiter (e.g., _ASTRBOT_PYTHON_CODE_) would be safer.
| result = await shell.exec(f"python3 - <<'PY'\n{code}\nPY", timeout=timeout) | |
| result = await shell.exec(f"python3 - <<'_ASTRBOT_PYTHON_CODE_'\n{code}\n_ASTRBOT_PYTHON_CODE_", timeout=timeout) |
| flags = "-la" if show_hidden else "-l" | ||
| result = await self._shell.exec(f"ls {flags} {path!r}") | ||
| return { | ||
| "success": not bool(result.get("stderr")), | ||
| "path": path, | ||
| "entries": result.get("stdout", ""), | ||
| "error": result.get("stderr", ""), | ||
| } |
There was a problem hiding this comment.
The list_dir fallback implementation returns a formatted string from ls -l, whereas the SDK-based branch (line 281) likely returns a list of entries. This inconsistency can cause issues for callers expecting a structured list. It's better to use ls -1 and split the output into a list of filenames to maintain consistency with the protocol.
| flags = "-la" if show_hidden else "-l" | |
| result = await self._shell.exec(f"ls {flags} {path!r}") | |
| return { | |
| "success": not bool(result.get("stderr")), | |
| "path": path, | |
| "entries": result.get("stdout", ""), | |
| "error": result.get("stderr", ""), | |
| } | |
| flags = "-1a" if show_hidden else "-1" | |
| result = await self._shell.exec(f"ls {flags} {path!r}") | |
| stdout = result.get("stdout", "") | |
| entries = [line.strip() for line in stdout.splitlines() if line.strip()] | |
| return { | |
| "success": not bool(result.get("stderr")), | |
| "path": path, | |
| "entries": entries, | |
| "error": result.get("stderr", ""), | |
| } |
|
@sourcery-ai review |
There was a problem hiding this comment.
Hey - I've found 1 issue, and left some high level feedback:
- The helper
_call_firstin the CUA booter assumes all target methods are async and awaitable; to make this more robust against different CUA SDK versions, consider detecting and handling synchronous callables (e.g., viainspect.iscoroutinefunction/iscoroutine) instead of alwaysawait-ing. - The shell-based fallback implementation of
CuaFileSystemComponent.list_dirreturns rawls -llines asentries, which may be hard for downstream tools to consume as filenames; you might want to normalize this to a more structured or filename-only format to keep behavior closer to the filesystem-backed variant.
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- The helper `_call_first` in the CUA booter assumes all target methods are async and awaitable; to make this more robust against different CUA SDK versions, consider detecting and handling synchronous callables (e.g., via `inspect.iscoroutinefunction` / `iscoroutine`) instead of always `await`-ing.
- The shell-based fallback implementation of `CuaFileSystemComponent.list_dir` returns raw `ls -l` lines as `entries`, which may be hard for downstream tools to consume as filenames; you might want to normalize this to a more structured or filename-only format to keep behavior closer to the filesystem-backed variant.
## Individual Comments
### Comment 1
<location path="astrbot/core/computer/booters/cua.py" line_range="162" />
<code_context>
+ }
+
+
+class CuaFileSystemComponent(FileSystemComponent):
+ def __init__(self, sandbox: Any) -> None:
+ self._sandbox = sandbox
</code_context>
<issue_to_address>
**issue (complexity):** Consider centralising the filesystem backend branching and replacing the generic `_call_first` helper with explicit shell/python adapters to make control flow clearer and easier to follow.
You can reduce a fair bit of cognitive load in this file by localising the “multi-path” logic instead of spreading it across multiple public methods and helpers.
### 1. Centralise filesystem backend selection
`CuaFileSystemComponent` repeats the same “fs vs shell” branching in `read_file`, `write_file`, `delete_file`, and `list_dir`. Pulling this into a couple of small internal helpers keeps behaviour but makes each public method linear and easier to read.
For example:
```python
class CuaFileSystemComponent(FileSystemComponent):
...
async def _fs_read_text(
self,
path: str,
encoding: str = "utf-8",
) -> str:
fs = self._filesystem
if fs is not None and hasattr(fs, "read_file"):
content = await fs.read_file(path)
else:
result = await self._shell.exec(f"cat {path!r}")
if result.get("stderr"):
raise RuntimeError(result["stderr"])
content = result.get("stdout", "")
if isinstance(content, bytes):
return content.decode(encoding, errors="replace")
return str(content)
async def _fs_write_text(
self,
path: str,
content: str,
encoding: str = "utf-8",
) -> None:
fs = self._filesystem
if fs is not None and hasattr(fs, "write_file"):
await fs.write_file(path, content)
return
encoded = base64.b64encode(content.encode(encoding)).decode()
await self._shell.exec(f"base64 -d > {path!r} <<'EOF'\n{encoded}\nEOF")
async def _fs_delete(self, path: str) -> None:
fs = self._filesystem
if fs is None:
await self._shell.exec(f"rm -rf {path!r}")
return
if hasattr(fs, "delete"):
await fs.delete(path)
elif hasattr(fs, "delete_file"):
await fs.delete_file(path)
else:
await self._shell.exec(f"rm -rf {path!r}")
async def _fs_list(self, path: str, show_hidden: bool) -> dict[str, Any]:
fs = self._filesystem
if fs is not None and hasattr(fs, "list_dir"):
entries = await fs.list_dir(path)
return {"success": True, "path": path, "entries": entries}
flags = "-la" if show_hidden else "-l"
result = await self._shell.exec(f"ls {flags} {path!r}")
return {
"success": not bool(result.get("stderr")),
"path": path,
"entries": _split_listing_entries(result.get("stdout", "")),
"error": result.get("stderr", ""),
}
```
Then the public methods become straightforward, without changing semantics:
```python
async def read_file(...):
try:
content = await self._fs_read_text(path, encoding=encoding)
except RuntimeError as exc:
return {"success": False, "path": path, "error": str(exc)}
return {
"success": True,
"path": path,
"content": _slice_content_by_lines(
content, offset=offset, limit=limit
),
}
async def write_file(...):
await self._fs_write_text(path, content, encoding=encoding)
return {"success": True, "path": path}
async def delete_file(self, path: str) -> dict[str, Any]:
await self._fs_delete(path)
return {"success": True, "path": path}
async def list_dir(...):
return await self._fs_list(path, show_hidden)
```
This keeps all current fallbacks (FS API, shell, base64 path) but makes each method’s control flow much more obvious.
### 2. Replace `_call_first` for shell/python with explicit CUA adapters
For `CuaShellComponent` and `CuaPythonComponent`, `_call_first` is quite generic and obscures what shapes you actually support (`run` vs `exec`). You can keep the multi-version support but move it into small, typed adapters, which also makes error modes clearer:
```python
class CuaShellComponent(ShellComponent):
...
async def _run_shell(
self,
command: str,
**kwargs: Any,
) -> Any:
shell = self._sandbox.shell
if hasattr(shell, "run"):
return await shell.run(command, **kwargs)
if hasattr(shell, "exec"):
return await shell.exec(command, **kwargs)
raise AttributeError("sandbox.shell has neither 'run' nor 'exec'")
```
and then:
```python
result = await self._run_shell(command, **kwargs)
payload = _maybe_model_dump(result)
...
```
Similarly for `CuaPythonComponent`:
```python
async def _run_python(self, code: str, timeout: int) -> dict[str, Any]:
python = getattr(self._sandbox, "python", None)
if python is not None:
if hasattr(python, "run"):
result = await python.run(code, timeout=timeout)
elif hasattr(python, "exec"):
result = await python.exec(code, timeout=timeout)
else:
raise AttributeError("sandbox.python has neither 'run' nor 'exec'")
return _maybe_model_dump(result)
shell = CuaShellComponent(self._sandbox)
result = await shell.exec(f"python3 - <<'PY'\n{code}\nPY", timeout=timeout)
return {
"output": result.get("stdout", ""),
"error": result.get("stderr", ""),
}
```
`exec` then just becomes:
```python
async def exec(...):
payload = await self._run_python(code, timeout)
...
```
These changes keep compatibility with multiple CUA SDK versions, but make the contracts per component explicit and reduce the need to mentally follow generic helper logic.
</issue_to_address>Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.
|
@sourcery-ai review |
There was a problem hiding this comment.
Hey - I've found 2 issues, and left some high level feedback:
- In the CuaGUIComponent methods (click/type_text), accessing sandbox.mouse/keyboard directly will raise AttributeError if the SDK object is missing those attributes; consider guarding with hasattr or a clearer error message so misconfigured CUA SDKs fail more gracefully.
- The shell-based base64 upload/write logic in CuaFileSystemComponent.write_file and CuaBooter.upload_file is duplicated; extracting a shared helper for this pattern would reduce repetition and keep the behavior consistent if you need to tweak the shell pipeline later.
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- In the CuaGUIComponent methods (click/type_text), accessing sandbox.mouse/keyboard directly will raise AttributeError if the SDK object is missing those attributes; consider guarding with hasattr or a clearer error message so misconfigured CUA SDKs fail more gracefully.
- The shell-based base64 upload/write logic in CuaFileSystemComponent.write_file and CuaBooter.upload_file is duplicated; extracting a shared helper for this pattern would reduce repetition and keep the behavior consistent if you need to tweak the shell pipeline later.
## Individual Comments
### Comment 1
<location path="tests/unit/test_cua_computer_use.py" line_range="103-112" />
<code_context>
+ ] is False
+
+
+@pytest.mark.asyncio
+async def test_screenshot_tool_returns_image_and_sends_file(monkeypatch, tmp_path):
+ from astrbot.core.tools.computer_tools import cua as cua_tools
+ from astrbot.core.tools.computer_tools.cua import CuaScreenshotTool
</code_context>
<issue_to_address>
**suggestion (testing):** Add tests for admin-permission failures in CUA tools to cover the negative path of `check_admin_permission`.
Currently these tools are only tested for the success case where `check_admin_permission` returns `None`. Please add tests where `check_admin_permission` returns an error (e.g., monkeypatch it to return a `ToolExecResult` or string) and assert that the tools exit early without calling `_get_gui_component` or sending messages/files. This will verify that permission checks are actually enforced and guard against regressions.
</issue_to_address>
### Comment 2
<location path="astrbot/core/computer/booters/cua.py" line_range="70" />
<code_context>
+ return [line for line in output.splitlines() if line.strip()]
+
+
+class CuaShellComponent(ShellComponent):
+ def __init__(self, sandbox: Any) -> None:
+ self._sandbox = sandbox
</code_context>
<issue_to_address>
**issue (complexity):** Consider refactoring the CUA booter into smaller adapters and environment objects so that normalization, filesystem variants, and screenshot handling are centralized instead of being inlined with conditionals throughout the file.
You can keep the same feature set while making this file easier to reason about by pushing most of the “defensive” logic into small, focused adapters and normalizers, instead of sprinkling conditionals at every call site.
Below are concrete refactors that preserve functionality but reduce cognitive load.
---
### 1. Centralize `shell` / `python` normalization instead of using `_call_first` everywhere
Right now `CuaShellComponent` and `CuaPythonComponent` both have to know about `run` vs `exec` and various payload shapes. You can normalize this once, at sandbox-adapter level, and then keep the components simple.
For example:
```python
class CuaSandboxAdapter:
def __init__(self, sandbox: Any) -> None:
self._sandbox = sandbox
self.shell = self._wrap_shell(getattr(sandbox, "shell", None))
self.python = self._wrap_python(getattr(sandbox, "python", None))
self.filesystem = getattr(sandbox, "filesystem", None)
self.screenshot = getattr(sandbox, "screenshot", None)
self.upload_file = getattr(sandbox, "upload_file", None)
self.download_file = getattr(sandbox, "download_file", None)
def _wrap_shell(self, shell: Any) -> Any:
if shell is None:
return None
async def run(command: str, **kwargs: Any) -> dict[str, Any]:
raw = await _call_first(shell, ("run", "exec"), command, **kwargs)
payload = _maybe_model_dump(raw)
if not payload and isinstance(raw, str):
payload = {"stdout": raw}
return {
"stdout": _result_text(payload, "stdout", "output"),
"stderr": _result_text(payload, "stderr", "error"),
"exit_code": payload.get(
"exit_code", payload.get("returncode", 0 if not payload.get("stderr") else 1)
),
"success": bool(
payload.get("success", not payload.get("stderr") and payload.get("exit_code") in (0, None))
),
}
return type("ShellAdapter", (), {"run": run})
def _wrap_python(self, python: Any) -> Any:
if python is None:
return None
async def run(code: str, timeout: int) -> dict[str, Any]:
raw = await _call_first(python, ("run", "exec"), code, timeout=timeout)
return _maybe_model_dump(raw)
return type("PythonAdapter", (), {"run": run})
```
Then `CuaShellComponent.exec` becomes much more straightforward:
```python
class CuaShellComponent(ShellComponent):
def __init__(self, adapter: CuaSandboxAdapter) -> None:
self._adapter = adapter
async def exec(...):
if not shell:
return { ... }
kwargs = {...}
if self._adapter.shell is None:
return { "stdout": "", "stderr": "shell not available", "exit_code": 1, "success": False }
result = await self._adapter.shell.run(command, **kwargs)
# background-specific post-processing stays here
...
```
Similarly, `CuaPythonComponent` can just rely on `adapter.python.run`, and only implement the “fallback via shell” branch.
This keeps the “run vs exec / output vs stdout / returncode vs exit_code” logic in **one place** instead of being re-applied in each component.
---
### 2. Split filesystem into two implementations instead of branching in every method
`CuaFileSystemComponent` currently has multiple branches per method (SDK vs shell, multiple method names). You can instead choose an implementation once in `__init__` and delegate.
For example:
```python
class CuaFSViaSDK(FileSystemComponent):
def __init__(self, fs: Any) -> None:
self._fs = fs
async def read_file(self, path: str, **kwargs: Any) -> dict[str, Any]:
content = await self._fs.read_file(path)
return {
"success": True,
"path": path,
"content": _slice_content_by_lines(str(content), **kwargs),
}
async def write_file(self, path: str, content: str, **kwargs: Any) -> dict[str, Any]:
await self._fs.write_file(path, content)
return {"success": True, "path": path}
async def delete_file(self, path: str) -> dict[str, Any]:
if hasattr(self._fs, "delete"):
await self._fs.delete(path)
else:
await self._fs.delete_file(path)
return {"success": True, "path": path}
async def list_dir(self, path: str = ".", show_hidden: bool = False) -> dict[str, Any]:
entries = await self._fs.list_dir(path)
return {"success": True, "path": path, "entries": entries}
```
```python
class CuaFSViaShell(FileSystemComponent):
def __init__(self, shell: CuaShellComponent) -> None:
self._shell = shell
async def read_file(...):
result = await self._shell.exec(f"cat {path!r}")
...
async def write_file(...):
...
async def delete_file(...):
...
async def list_dir(...):
...
```
Then `CuaFileSystemComponent` becomes a thin delegator:
```python
class CuaFileSystemComponent(FileSystemComponent):
def __init__(self, sandbox_adapter: CuaSandboxAdapter) -> None:
if sandbox_adapter.filesystem is not None:
self._impl: FileSystemComponent = CuaFSViaSDK(sandbox_adapter.filesystem)
else:
shell = CuaShellComponent(sandbox_adapter)
self._impl = CuaFSViaShell(shell)
async def read_file(self, *args, **kwargs):
return await self._impl.read_file(*args, **kwargs)
async def write_file(self, *args, **kwargs):
return await self._impl.write_file(*args, **kwargs)
async def delete_file(self, *args, **kwargs):
return await self._impl.delete_file(*args, **kwargs)
async def list_dir(self, *args, **kwargs):
return await self._impl.list_dir(*args, **kwargs)
```
This preserves all current behavior but removes 2–3 branches from each public filesystem method.
---
### 3. Narrow `_screenshot_to_bytes` by explicit format handlers
The current `_screenshot_to_bytes` recursively handles many shapes. You can make this more legible by separating the recognition and conversion logic.
For example:
```python
def _screenshot_to_bytes(raw: Any) -> bytes:
# 1) direct bytes-like
if isinstance(raw, (bytes, bytearray)):
return bytes(raw)
# 2) known dict payloads
payload = _maybe_model_dump(raw)
for key in ("data", "base64", "image"):
if key in payload:
return _screenshot_to_bytes(payload[key])
# 3) string formats (data URL / base64 / file path)
if isinstance(raw, str):
return _screenshot_str_to_bytes(raw)
# 4) PIL-like objects
if hasattr(raw, "save"):
return _screenshot_image_to_bytes(raw)
raise TypeError(f"Unsupported CUA screenshot result: {type(raw)!r}")
def _screenshot_str_to_bytes(value: str) -> bytes:
if value.startswith("data:image"):
value = value.split(",", 1)[1]
try:
return base64.b64decode(value, validate=True)
except Exception:
path = Path(value)
if path.is_file():
return path.read_bytes()
return value.encode("utf-8")
def _screenshot_image_to_bytes(image: Any) -> bytes:
import io
buf = io.BytesIO()
image.save(buf, format="PNG")
return buf.getvalue()
```
This keeps the same supported shapes but flattens the control flow and makes it easier to see “what formats we actually support” at a glance.
---
### 4. Replace nullable booter attributes with a small environment object
Instead of tracking `_sandbox`, `_sandbox_cm`, `_shell`, `_python`, `_fs`, `_gui` separately, you can group them in a single struct-like object whose presence/absence encodes the state. This reduces the number of nullable fields and manual resets.
```python
@dataclass
class CuaEnvironment:
sandbox_cm: Any
sandbox: Any
shell: CuaShellComponent
python: CuaPythonComponent
fs: CuaFileSystemComponent
gui: CuaGUIComponent | None
```
```python
class CuaBooter(ComputerBooter):
def __init__(...):
...
self._env: CuaEnvironment | None = None
async def boot(self, session_id: str) -> None:
...
sandbox_cm = Sandbox.ephemeral(image_obj, **ephemeral_kwargs)
sandbox = await sandbox_cm.__aenter__()
adapter = CuaSandboxAdapter(sandbox)
env = CuaEnvironment(
sandbox_cm=sandbox_cm,
sandbox=sandbox,
shell=CuaShellComponent(adapter),
python=CuaPythonComponent(adapter),
fs=CuaFileSystemComponent(adapter),
gui=CuaGUIComponent(sandbox),
)
self._env = env
...
async def shutdown(self) -> None:
if self._env is None:
return
await self._env.sandbox_cm.__aexit__(None, None, None)
self._env = None
@property
def fs(self) -> FileSystemComponent:
if self._env is None:
raise RuntimeError("CuaBooter is not initialized.")
return self._env.fs
# same for shell/python/gui
```
This keeps the lifecycle semantics unchanged but collapses the state space from many nullable attributes to a single “env present or not” check.
---
All of these suggestions preserve your current compatibility shims while making the code more modular and easier to audit. You can adopt them incrementally (e.g., start with the filesystem split or the screenshot refactor) without changing the external `ComputerBooter` contract.
</issue_to_address>Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.
|
@sourcery-ai review |
There was a problem hiding this comment.
Hey - I've found 3 issues, and left some high level feedback:
- In
CuaBooter.capabilitiesyou always advertise GUI/mouse/keyboard support, butCuaGUIComponentcan raise at runtime if the underlying sandbox lacks those devices; consider deriving capabilities dynamically from the initialized sandbox (e.g., presence ofmouse,keyboard,screenshot) to avoid mismatches between reported and actual features. - Helpers like
_write_base64_via_shellassumebase64 -dis available, whilecua_image/cua_os_typecan be non-Linux; it may be safer to either gate these shell fallbacks to Linux images or choose a more portable approach (or OS-specific flags) so macOS/Windows images don’t fail on basic file operations.
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- In `CuaBooter.capabilities` you always advertise GUI/mouse/keyboard support, but `CuaGUIComponent` can raise at runtime if the underlying sandbox lacks those devices; consider deriving capabilities dynamically from the initialized sandbox (e.g., presence of `mouse`, `keyboard`, `screenshot`) to avoid mismatches between reported and actual features.
- Helpers like `_write_base64_via_shell` assume `base64 -d` is available, while `cua_image`/`cua_os_type` can be non-Linux; it may be safer to either gate these shell fallbacks to Linux images or choose a more portable approach (or OS-specific flags) so macOS/Windows images don’t fail on basic file operations.
## Individual Comments
### Comment 1
<location path="astrbot/core/computer/booters/cua.py" line_range="27" />
<code_context>
+ data: bytes,
+) -> dict[str, Any]:
+ encoded = base64.b64encode(data).decode("ascii")
+ return await shell.exec(f"base64 -d > {path!r} <<'EOF'\n{encoded}\nEOF")
+
+
</code_context>
<issue_to_address>
**issue (bug_risk):** Consider using a more portable base64 decoding strategy for non-Linux images (e.g., macOS).
Hardcoding `base64 -d` assumes GNU coreutils and will fail on BSD/macOS, which use `-D`. Since `cua_os_type` includes `macos`, this will break file writes on those sandboxes. Consider detecting the supported flag, using a portable approach like `python -m base64 -d`, or using the CUA filesystem API instead of a shell-specific command.
</issue_to_address>
### Comment 2
<location path="astrbot/core/astr_agent_tool_exec.py" line_range="192-195" />
<code_context>
cls,
runtime: str,
tool_mgr,
+ booter: str | None = None,
) -> dict[str, FunctionTool]:
+ booter = "" if booter is None else str(booter)
if runtime == "sandbox":
shell_tool = tool_mgr.get_builtin_tool(ExecuteShellTool)
</code_context>
<issue_to_address>
**suggestion (bug_risk):** Normalize the booter name to avoid case-sensitivity issues when enabling CUA tools.
Since `booter` is only coerced to `str` before `if booter == "cua":`, passing values like "CUA" or "Cua" would skip registering the extra tools. Consider normalizing, e.g. `booter = "" if booter is None else str(booter).lower()`, so case variants still enable CUA tools while preserving current behavior.
```suggestion
booter: str | None = None,
) -> dict[str, FunctionTool]:
booter = "" if booter is None else str(booter).lower()
if runtime == "sandbox":
```
</issue_to_address>
### Comment 3
<location path="astrbot/core/computer/booters/cua.py" line_range="94" />
<code_context>
+ return method
+
+
+class CuaShellComponent(ShellComponent):
+ def __init__(self, sandbox: Any) -> None:
+ self._sandbox = sandbox
</code_context>
<issue_to_address>
**issue (complexity):** Consider centralizing process-result and shell-call normalization into shared helpers so components don’t each re-implement the same introspective logic.
You can cut a fair bit of cognitive load by centralizing the “shape‑normalization” instead of sprinkling generic helpers across components.
### 1. Normalize process results once via a small dataclass
Right now both `CuaShellComponent.exec` and `CuaPythonComponent.exec` call:
- `_call_first`
- `_maybe_model_dump`
- `_result_text`
- custom fallback logic
You can keep all the compatibility behavior but move it into a single normalization function + dataclass that every component uses.
Example:
```python
from dataclasses import dataclass
@dataclass
class ProcessResult:
stdout: str
stderr: str
exit_code: int | None
success: bool
def _normalize_process_result(raw: Any) -> ProcessResult:
payload = _maybe_model_dump(raw)
# Allow pure string fallback (e.g. shell returns str)
if not payload and isinstance(raw, str):
payload = {"stdout": raw}
stdout = _result_text(payload, "stdout", "output")
stderr = _result_text(payload, "stderr", "error")
exit_code = payload.get("exit_code", payload.get("returncode", 0 if not stderr else 1))
success = bool(payload.get("success", not stderr and exit_code in (0, None)))
return ProcessResult(stdout=stdout, stderr=stderr, exit_code=exit_code, success=success)
```
Then `CuaShellComponent.exec` becomes more declarative and composable:
```python
async def exec(...):
...
result = await _call_first(self._sandbox.shell, ("run", "exec"), command, **kwargs)
proc = _normalize_process_result(result)
response: dict[str, Any] = {
"stdout": proc.stdout,
"stderr": proc.stderr,
"exit_code": proc.exit_code,
"success": proc.success,
}
if background:
try:
response["pid"] = int(proc.stdout.strip().splitlines()[-1])
except Exception:
response["pid"] = None
return response
```
And `CuaPythonComponent.exec` can reuse the same normalization, reducing duplicated, introspective logic:
```python
async def exec(...):
_ = kernel_id
python = getattr(self._sandbox, "python", None)
if python is not None:
raw = await _call_first(python, ("run", "exec"), code, timeout=timeout)
proc = _normalize_process_result(raw)
else:
shell = CuaShellComponent(self._sandbox)
shell_result = await shell.exec(f"python3 - <<'PY'\n{code}\nPY", timeout=timeout)
# shell_result is already normalized; just map fields
proc = ProcessResult(
stdout=shell_result.get("stdout", ""),
stderr=shell_result.get("stderr", ""),
exit_code=shell_result.get("exit_code"),
success=shell_result.get("success", False),
)
output_text = "" if silent else proc.stdout
error_text = proc.stderr
return {
"success": proc.success if not silent else bool(not error_text),
"data": {
"output": {"text": output_text, "images": []},
"error": error_text,
},
"output": output_text,
"error": error_text,
}
```
This keeps all the existing flexibility (`stdout` vs `output`, `returncode` vs `exit_code`, string results, etc.) but:
- Consumers no longer need to know about the different shapes.
- The normalization logic is in one place, making it easier to audit and extend.
### 2. Consider a small shell adapter to avoid `_call_first` at call sites
If you want to retain support for both `run` and `exec` on the CUA shell, you can still avoid scattering `_call_first` by adapting once:
```python
class _ShellAdapter:
def __init__(self, shell: Any) -> None:
if hasattr(shell, "run"):
self._exec = shell.run
elif hasattr(shell, "exec"):
self._exec = shell.exec
else:
raise AttributeError("Shell has neither `run` nor `exec`")
async def exec(self, *args: Any, **kwargs: Any) -> Any:
return await _maybe_await(self._exec(*args, **kwargs))
```
Then `CuaShellComponent` can wrap the CUA shell in this adapter in `__init__` and call a single `exec`, instead of `_call_first`:
```python
class CuaShellComponent(ShellComponent):
def __init__(self, sandbox: Any) -> None:
self._sandbox = sandbox
self._shell_adapter = _ShellAdapter(sandbox.shell)
async def exec(...):
...
result = await self._shell_adapter.exec(command, **kwargs)
proc = _normalize_process_result(result)
...
```
This preserves the current behavior (try `run`, then `exec`) but makes the control flow much easier to follow in the main component methods.
</issue_to_address>Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.
|
@sourcery-ai review |
There was a problem hiding this comment.
Hey - I've found 2 issues, and left some high level feedback:
- The CUA shell/background execution helpers (e.g.
_build_cua_background_commandand thesh -lcwrapper) assume a POSIX environment withpython3andsh, which may not hold forcua_os_typevalues likewindowsorandroid; consider branching onos_typeor detecting capabilities to avoid breaking non-Linux images. - Error handling is inconsistent between the CUA GUI tools and the updated shell tool:
CuaScreenshotTool/CuaMouseClickTool/CuaKeyboardTypeToolreturnstr(e)directly, which can be blank; you might want to reuse thedetail = str(e) or type(e).__name__pattern to ensure useful error messages for blank exception messages.
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- The CUA shell/background execution helpers (e.g. `_build_cua_background_command` and the `sh -lc` wrapper) assume a POSIX environment with `python3` and `sh`, which may not hold for `cua_os_type` values like `windows` or `android`; consider branching on `os_type` or detecting capabilities to avoid breaking non-Linux images.
- Error handling is inconsistent between the CUA GUI tools and the updated shell tool: `CuaScreenshotTool`/`CuaMouseClickTool`/`CuaKeyboardTypeTool` return `str(e)` directly, which can be blank; you might want to reuse the `detail = str(e) or type(e).__name__` pattern to ensure useful error messages for blank exception messages.
## Individual Comments
### Comment 1
<location path="astrbot/core/computer/booters/cua.py" line_range="162" />
<code_context>
+ kwargs["timeout"] = timeout
+ if env:
+ kwargs["env"] = env
+ if background:
+ command = _build_cua_background_command(command)
+
</code_context>
<issue_to_address>
**issue (bug_risk):** Background launcher never surfaces a PID, so the returned `pid` field is effectively always `None`.
In `CuaShellComponent.exec`, when `background` is true, the command is wrapped by `_build_cua_background_command`, which routes `stdout` to `DEVNULL` and never emits a PID. As a result, `proc.stdout` is always empty and PID parsing always fails, so `pid` is effectively unusable. Either update the launcher to print the PID (before redirecting/exit) or stop populating `pid` and document that background processes are fire-and-forget; the current behavior is misleading for callers.
</issue_to_address>
### Comment 2
<location path="astrbot/core/computer/booters/cua.py" line_range="235" />
<code_context>
+ }
+
+
+class CuaFileSystemComponent(FileSystemComponent):
+ def __init__(self, sandbox: Any) -> None:
+ self._sandbox = sandbox
</code_context>
<issue_to_address>
**issue (complexity):** Consider centralizing the dynamic filesystem handling and booter runtime state into dedicated adapter/runtime structures to simplify control flow and lifecycle management.
You can reduce a fair amount of complexity without losing functionality by centralizing some of the dynamic behavior. Two spots where this will pay off quickly:
---
### 1. Normalize filesystem strategy once instead of per-call `hasattr` chains
`CuaFileSystemComponent` currently branches on `self._filesystem` and `hasattr` in every method (`read_file`, `write_file`, `delete_file`, `list_dir`), which spreads compatibility logic throughout the class.
You can instead decide on a concrete strategy in `__init__` and delegate to it, keeping the public behavior the same but making the code easier to read and change.
```python
class CuaFileSystemComponent(FileSystemComponent):
def __init__(self, sandbox: Any) -> None:
self._sandbox = sandbox
fs = getattr(sandbox, "filesystem", None)
shell = CuaShellComponent(sandbox)
if fs is not None:
self._impl = _NativeFSAdapter(fs=fs, shell=shell)
else:
self._impl = _ShellFSAdapter(shell=shell)
async def create_file(self, path: str, content: str = "", mode: int = 0o644) -> dict[str, Any]:
return await self._impl.create_file(path, content, mode)
async def read_file(self, path: str, encoding: str = "utf-8",
offset: int | None = None, limit: int | None = None) -> dict[str, Any]:
return await self._impl.read_file(path, encoding=encoding, offset=offset, limit=limit)
# ...forward other methods similarly...
```
Adapters encapsulate the messy compatibility logic, but are small and focused:
```python
class _NativeFSAdapter:
def __init__(self, fs: Any, shell: CuaShellComponent) -> None:
self._fs = fs
self._shell = shell
async def read_file(self, path: str, encoding: str = "utf-8",
offset: int | None = None, limit: int | None = None) -> dict[str, Any]:
content = await self._fs.read_file(path)
if isinstance(content, bytes):
content = content.decode(encoding, errors="replace")
return {
"success": True,
"path": path,
"content": _slice_content_by_lines(str(content), offset=offset, limit=limit),
}
# implement write_file/delete_file/list_dir using existing logic,
# but without `if fs is not None and hasattr(...)` checks
```
```python
class _ShellFSAdapter:
def __init__(self, shell: CuaShellComponent) -> None:
self._shell = shell
async def read_file(self, path: str, encoding: str = "utf-8",
offset: int | None = None, limit: int | None = None) -> dict[str, Any]:
result = await self._shell.exec(f"cat {path!r}")
if result.get("stderr"):
return {"success": False, "path": path, "error": result["stderr"]}
content = result.get("stdout", "")
return {
"success": True,
"path": path,
"content": _slice_content_by_lines(str(content), offset=offset, limit=limit),
}
# implement write_file/delete_file/list_dir with the existing shell fallbacks
```
This keeps behavior intact but removes repeated introspection and `hasattr` chains, making the main component easier to follow.
---
### 2. Encapsulate booter state in a non-nullable dataclass
`CuaBooter` currently carries `_sandbox`, `_sandbox_cm`, `_shell`, `_python`, `_fs`, `_gui` as nullable attributes, with runtime checks in each property. You can maintain the external API but reduce the internal nullable state by grouping boot-time state into a single dataclass.
```python
@dataclass(slots=True)
class _CuaRuntime:
sandbox_cm: Any
sandbox: Any
shell: CuaShellComponent
python: CuaPythonComponent
fs: CuaFileSystemComponent
gui: CuaGUIComponent | None
```
Then adjust `CuaBooter` to hold exactly one `_runtime: _CuaRuntime | None`:
```python
class CuaBooter(ComputerBooter):
def __init__(...):
...
self._runtime: _CuaRuntime | None = None
async def boot(self, session_id: str) -> None:
...
sandbox_cm = Sandbox.ephemeral(image_obj, **ephemeral_kwargs)
sandbox = await sandbox_cm.__aenter__()
runtime = _CuaRuntime(
sandbox_cm=sandbox_cm,
sandbox=sandbox,
shell=CuaShellComponent(sandbox),
python=CuaPythonComponent(sandbox),
fs=CuaFileSystemComponent(sandbox),
gui=CuaGUIComponent(sandbox),
)
self._runtime = runtime
logger.info("...")
async def shutdown(self) -> None:
if self._runtime is None:
return
await self._runtime.sandbox_cm.__aexit__(None, None, None)
self._runtime = None
@property
def shell(self) -> ShellComponent:
if self._runtime is None:
raise RuntimeError("CuaBooter is not initialized.")
return self._runtime.shell
# python/fs/gui properties similarly use self._runtime
```
This preserves current semantics (still raises if not booted) but collapses multiple nullable fields into one explicit “booted vs not booted” state, making the life-cycle easier to reason about and reducing scattered `None` checks.
</issue_to_address>Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.
|
@sourcery-ai review |
There was a problem hiding this comment.
Hey - I've found 2 issues, and left some high level feedback:
- In
ExecuteShellTool.call, theenv: dict = {}default argument is mutable and shared across invocations; consider changing this toenv: Optional[dict] = Noneand normalizing inside the method to avoid accidental cross-call state leakage. - The CUA shell background implementation always wraps commands with
python3 -c ...; if CUA images are not guaranteed to havepython3installed, you may want to either guard for that or document/validate this requirement to avoid confusing runtime failures.
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- In `ExecuteShellTool.call`, the `env: dict = {}` default argument is mutable and shared across invocations; consider changing this to `env: Optional[dict] = None` and normalizing inside the method to avoid accidental cross-call state leakage.
- The CUA shell background implementation always wraps commands with `python3 -c ...`; if CUA images are not guaranteed to have `python3` installed, you may want to either guard for that or document/validate this requirement to avoid confusing runtime failures.
## Individual Comments
### Comment 1
<location path="astrbot/core/computer/booters/cua.py" line_range="293-294" />
<code_context>
+ ) -> dict[str, Any]:
+ _ = mode
+ write_file = getattr(self._fs, "write_file", None)
+ if write_file is None:
+ await _write_base64_via_shell(self._shell, path, content.encode(encoding))
+ else:
+ await _maybe_await(write_file(path, content))
</code_context>
<issue_to_address>
**issue (bug_risk):** Errors from `_write_base64_via_shell` are ignored, so file write failures may be reported as successful.
Both `_NativeFSAdapter.write_file` and `_ShellFSAdapter.write_file` await `_write_base64_via_shell` but discard its result and always return `{"success": True, ...}`. If the shell command fails (e.g., missing `python3`, permission/path issues), callers still see a successful write. Please either (1) check the returned dict and propagate `success`/`stderr`, or (2) raise on failure so callers can handle it. The same consideration applies anywhere `_write_base64_via_shell` is used as a fallback, to keep filesystem failures visible for debugging.
</issue_to_address>
### Comment 2
<location path="astrbot/core/computer/booters/cua.py" line_range="245" />
<code_context>
+ }
+
+
+class _NativeFSAdapter:
+ def __init__(self, fs: Any, shell: CuaShellComponent) -> None:
+ self._fs = fs
</code_context>
<issue_to_address>
**issue (complexity):** Consider unifying the two filesystem adapter classes into a single adapter that chooses between native fs and shell at call time to reduce duplication and indirection.
You can keep all current functionality but reduce complexity noticeably by collapsing the duplicated FS adapters into a single implementation. Right now `_NativeFSAdapter` and `_ShellFSAdapter` have almost identical APIs and a lot of shared logic; you can replace them with a single, “fs-or-shell” implementation, reducing one layer of indirection without changing behavior.
### 1. Replace `_NativeFSAdapter` / `_ShellFSAdapter` with a single adapter
You can keep the behavior “prefer native fs if present, else use shell” but encode that in a single class:
```python
class _CuaFSAdapter:
def __init__(self, fs: Any | None, shell: CuaShellComponent) -> None:
self._fs = fs
self._shell = shell
async def create_file(
self,
path: str,
content: str = "",
mode: int = 0o644,
) -> dict[str, Any]:
await self.write_file(path, content)
return {"success": True, "path": path, "mode": mode}
async def read_file(
self,
path: str,
encoding: str = "utf-8",
offset: int | None = None,
limit: int | None = None,
) -> dict[str, Any]:
content: str | bytes
if self._fs is not None and hasattr(self._fs, "read_file"):
content = await _maybe_await(self._fs.read_file(path))
if isinstance(content, bytes):
content = content.decode(encoding, errors="replace")
else:
result = await self._shell.exec(f"cat {path!r}")
if result.get("stderr"):
return {"success": False, "path": path, "error": result["stderr"]}
content = result.get("stdout", "")
return {
"success": True,
"path": path,
"content": _slice_content_by_lines(
str(content), offset=offset, limit=limit
),
}
async def write_file(
self,
path: str,
content: str,
mode: str = "w",
encoding: str = "utf-8",
) -> dict[str, Any]:
_ = mode
if self._fs is not None and hasattr(self._fs, "write_file"):
await _maybe_await(self._fs.write_file(path, content))
else:
await _write_base64_via_shell(self._shell, path, content.encode(encoding))
return {"success": True, "path": path}
async def delete_file(self, path: str) -> dict[str, Any]:
delete = None
if self._fs is not None:
delete = getattr(self._fs, "delete", None) or getattr(
self._fs, "delete_file", None
)
if delete is not None:
await _maybe_await(delete(path))
else:
await self._shell.exec(f"rm -rf {path!r}")
return {"success": True, "path": path}
async def list_dir(
self,
path: str = ".",
show_hidden: bool = False,
) -> dict[str, Any]:
if self._fs is not None and hasattr(self._fs, "list_dir"):
entries = await _maybe_await(self._fs.list_dir(path))
return {"success": True, "path": path, "entries": entries}
return await _list_dir_via_shell(self._shell, path, show_hidden)
```
### 2. Simplify `CuaFileSystemComponent` to use the single adapter
Then the component becomes thinner and easier to reason about:
```python
class CuaFileSystemComponent(FileSystemComponent):
def __init__(self, sandbox: Any, os_type: str = "linux") -> None:
self._shell = CuaShellComponent(sandbox, os_type=os_type)
fs = getattr(sandbox, "filesystem", None)
self._impl = _CuaFSAdapter(fs, self._shell)
async def create_file(
self,
path: str,
content: str = "",
mode: int = 0o644,
) -> dict[str, Any]:
return await self._impl.create_file(path, content, mode)
async def read_file(
self,
path: str,
encoding: str = "utf-8",
offset: int | None = None,
limit: int | None = None,
) -> dict[str, Any]:
return await self._impl.read_file(path, encoding, offset, limit)
async def write_file(
self,
path: str,
content: str,
mode: str = "w",
encoding: str = "utf-8",
) -> dict[str, Any]:
return await self._impl.write_file(path, content, mode, encoding)
async def delete_file(self, path: str) -> dict[str, Any]:
return await self._impl.delete_file(path)
async def list_dir(
self,
path: str = ".",
show_hidden: bool = False,
) -> dict[str, Any]:
return await self._impl.list_dir(path, show_hidden)
```
This preserves:
- the “prefer native filesystem, fallback to shell” behavior,
- the same public API and return shapes,
- the `_write_base64_via_shell` and `_list_dir_via_shell` paths,
while removing two parallel adapter classes and one extra abstraction layer, making the file materially easier to follow.
</issue_to_address>Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.
|
@sourcery-ai review |
There was a problem hiding this comment.
Hey - I've found 2 issues, and left some high level feedback:
- CUA-specific defaults (image/os_type/ttl/telemetry/local/api_key) are currently duplicated between
CONFIG_METADATA_3/default provider settings andCuaBooter.__init__; consider centralizing these in a single config helper or constructor to avoid them drifting out of sync. - Several shell/file helper flows rely on
python3being present inside the sandbox (e.g.,_write_base64_via_shell, background launcher, CuaPythonComponent fallback), but only the background path surfaces a tailored error; it may be worth detectingpython3availability once and reusing a shared error/guard to give more consistent, early feedback across these code paths.
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- CUA-specific defaults (image/os_type/ttl/telemetry/local/api_key) are currently duplicated between `CONFIG_METADATA_3`/default provider settings and `CuaBooter.__init__`; consider centralizing these in a single config helper or constructor to avoid them drifting out of sync.
- Several shell/file helper flows rely on `python3` being present inside the sandbox (e.g., `_write_base64_via_shell`, background launcher, CuaPythonComponent fallback), but only the background path surfaces a tailored error; it may be worth detecting `python3` availability once and reusing a shared error/guard to give more consistent, early feedback across these code paths.
## Individual Comments
### Comment 1
<location path="astrbot/core/computer/booters/cua.py" line_range="346-355" />
<code_context>
+ return await _list_dir_via_shell(self._shell, path, show_hidden)
+
+
+async def _list_dir_via_shell(
+ shell: CuaShellComponent,
+ path: str,
+ show_hidden: bool,
+) -> dict[str, Any]:
+ flags = "-1A" if show_hidden else "-1"
+ result = await shell.exec(f"ls {flags} {path!r}")
+ return {
+ "success": not bool(result.get("stderr")),
+ "path": path,
+ "entries": _split_listing_entries(result.get("stdout", "")),
+ "error": result.get("stderr", ""),
+ }
</code_context>
<issue_to_address>
**issue (bug_risk):** Shell-based fs operations assume POSIX tools and will likely fail on Windows CUA images.
Both `_list_dir_via_shell` and `_CuaFSAdapter.delete_file` hardcode `ls` / `rm -rf`, so `list_dir` and `delete_file` will fail on CUA images with `os_type="windows"`. Since `CuaBooter` already tracks `os_type`, consider either branching to appropriate PowerShell/cmd commands for Windows, or explicitly rejecting non-POSIX images with a clear error, consistent with how `_os_type` is used for background shell execution.
</issue_to_address>
### Comment 2
<location path="astrbot/core/computer/booters/cua.py" line_range="263" />
<code_context>
+ return {"success": True, "path": path, **result}
+
+
+class _CuaFSAdapter:
+ def __init__(self, fs: Any | None, shell: CuaShellComponent) -> None:
+ self._fs = fs
</code_context>
<issue_to_address>
**issue (complexity):** Consider simplifying the filesystem and background-shell logic by inlining the adapter into `CuaFileSystemComponent` and extracting the background launcher script into a readable constant/helper.
You can trim some of the indirection and “generic” helpers without losing functionality. Two concrete places where complexity can be reduced:
---
### 1. Remove the `_CuaFSAdapter` double abstraction
`CuaFileSystemComponent` mostly forwards directly to `_CuaFSAdapter`. You can inline the adapter logic into `CuaFileSystemComponent` and keep only a minimal helper for the shell fallbacks.
Today:
```python
class CuaFileSystemComponent(FileSystemComponent):
def __init__(self, sandbox: Any, os_type: str = "linux") -> None:
self._shell = CuaShellComponent(sandbox, os_type=os_type)
fs = getattr(sandbox, "filesystem", None)
self._impl = _CuaFSAdapter(fs, self._shell)
async def create_file(...):
return await self._impl.create_file(...)
async def read_file(...):
return await self._impl.read_file(...)
...
```
You can drop `_CuaFSAdapter` and keep the same behavior by moving its methods into `CuaFileSystemComponent`:
```python
class CuaFileSystemComponent(FileSystemComponent):
def __init__(self, sandbox: Any, os_type: str = "linux") -> None:
self._shell = CuaShellComponent(sandbox, os_type=os_type)
self._fs = getattr(sandbox, "filesystem", None)
async def create_file(
self,
path: str,
content: str = "",
mode: int = 0o644,
) -> dict[str, Any]:
write_result = await self.write_file(path, content)
if not write_result.get("success"):
return {**write_result, "mode": mode}
return {"success": True, "path": path, "mode": mode}
async def read_file(
self,
path: str,
encoding: str = "utf-8",
offset: int | None = None,
limit: int | None = None,
) -> dict[str, Any]:
read_file = None if self._fs is None else getattr(self._fs, "read_file", None)
if read_file is None:
result = await self._shell.exec(f"cat {path!r}")
if result.get("stderr"):
return {"success": False, "path": path, "error": result["stderr"]}
content = result.get("stdout", "")
else:
content = await _maybe_await(read_file(path))
if isinstance(content, bytes):
content = content.decode(encoding, errors="replace")
return {
"success": True,
"path": path,
"content": _slice_content_by_lines(
str(content), offset=offset, limit=limit
),
}
async def write_file(
self,
path: str,
content: str,
mode: str = "w",
encoding: str = "utf-8",
) -> dict[str, Any]:
_ = mode
write_file = None if self._fs is None else getattr(self._fs, "write_file", None)
if write_file is None:
result = await _write_base64_via_shell(
self._shell, path, content.encode(encoding)
)
return _write_result(path, result)
await _maybe_await(write_file(path, content))
return {"success": True, "path": path}
async def delete_file(self, path: str) -> dict[str, Any]:
delete = None
if self._fs is not None:
delete = getattr(self._fs, "delete", None) or getattr(
self._fs, "delete_file", None
)
if delete is None:
await self._shell.exec(f"rm -rf {path!r}")
else:
await _maybe_await(delete(path))
return {"success": True, "path": path}
async def list_dir(
self,
path: str = ".",
show_hidden: bool = False,
) -> dict[str, Any]:
list_dir = None if self._fs is None else getattr(self._fs, "list_dir", None)
if list_dir is not None:
entries = await _maybe_await(list_dir(path))
return {"success": True, "path": path, "entries": entries}
return await _list_dir_via_shell(self._shell, path, show_hidden)
```
This removes an entire class and one level of dispatch while preserving the same shell vs. filesystem behavior.
---
### 2. Encapsulate the background launcher as a constant / helper
The inline string in `_build_cua_background_command` is hard to read and audit. You can move the Python launcher into a clearly named constant or function so that `exec` logic is easier to follow.
Today:
```python
def _build_cua_background_command(command: str) -> str:
launcher = (
"import subprocess,sys,time; "
'p=subprocess.Popen(["sh","-lc",sys.argv[1]], '
"stdin=subprocess.DEVNULL, stdout=subprocess.DEVNULL, "
"stderr=subprocess.DEVNULL, start_new_session=True); "
"sys.stdout.write(str(p.pid)+chr(10)); sys.stdout.flush(); "
"time.sleep(0.2); "
"code=p.poll(); "
"sys.exit(0 if code is None else code)"
)
return f"python3 -c {shlex.quote(launcher)} {shlex.quote(command)}"
```
Refactor to a multi-line constant and simple builder:
```python
_CUA_BG_LAUNCHER = """
import subprocess, sys, time
p = subprocess.Popen(
["sh", "-lc", sys.argv[1]],
stdin=subprocess.DEVNULL,
stdout=subprocess.DEVNULL,
stderr=subprocess.DEVNULL,
start_new_session=True,
)
sys.stdout.write(str(p.pid) + "\\n")
sys.stdout.flush()
time.sleep(0.2)
code = p.poll()
sys.exit(0 if code is None else code)
""".strip()
def _build_cua_background_command(command: str) -> str:
return f"python3 -c {shlex.quote(_CUA_BG_LAUNCHER)} {shlex.quote(command)}"
```
Behavior is identical, but the logic is now readable, easier to test in isolation, and the `exec` method carries less “mystery string” complexity.
---
These two changes directly address the concerns about extra adapter layers and dense embedded scripts, while keeping all current behavior and flexibility intact.
</issue_to_address>Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.
There was a problem hiding this comment.
Hey - I've found 5 issues
Prompt for AI Agents
Please address the comments from this code review:
## Individual Comments
### Comment 1
<location path="astrbot/core/computer/booters/cua.py" line_range="322-329" />
<code_context>
+ await _maybe_await(write_file(path, content))
+ return {"success": True, "path": path}
+
+ async def delete_file(self, path: str) -> dict[str, Any]:
+ delete = None
+ if self._fs is not None:
+ delete = getattr(self._fs, "delete", None) or getattr(
+ self._fs, "delete_file", None
+ )
+ if delete is None:
+ await self._shell.exec(f"rm -rf {path!r}")
+ else:
+ await _maybe_await(delete(path))
</code_context>
<issue_to_address>
**issue (bug_risk):** File deletion falls back to POSIX-specific `rm -rf`, which will fail on Windows CUA images.
In `_CuaFSAdapter.delete_file`, the shell fallback uses `rm -rf`, which is POSIX-only. For CUA configs with `os_type="windows"`, this will not work. Please either branch on `os_type` in `CuaShellComponent` to issue a Windows-appropriate delete command, or route all deletions through a filesystem abstraction that hides OS-specific details.
</issue_to_address>
### Comment 2
<location path="astrbot/core/computer/booters/cua.py" line_range="346-352" />
<code_context>
+ return await _list_dir_via_shell(self._shell, path, show_hidden)
+
+
+async def _list_dir_via_shell(
+ shell: CuaShellComponent,
+ path: str,
+ show_hidden: bool,
+) -> dict[str, Any]:
+ flags = "-1A" if show_hidden else "-1"
+ result = await shell.exec(f"ls {flags} {path!r}")
+ return {
+ "success": not bool(result.get("stderr")),
</code_context>
<issue_to_address>
**issue (bug_risk):** Directory listing uses `ls`, which is POSIX-only and not aligned with declared Windows/Android support.
`_list_dir_via_shell` always calls `ls`, so any fallback to shell on Windows/Android will fail. Please either branch on `cua_os_type` (e.g., `dir` on Windows) or avoid shell-based listing and rely on a portable CUA filesystem method where possible.
</issue_to_address>
### Comment 3
<location path="astrbot/core/computer/booters/cua.py" line_range="452-461" />
<code_context>
+ def __init__(self, sandbox: Any) -> None:
+ self._sandbox = sandbox
+
+ async def screenshot(self, path: str | None = None) -> dict[str, Any]:
+ raw = await self._sandbox.screenshot()
+ data = _screenshot_to_bytes(raw)
+ if path:
+ Path(path).parent.mkdir(parents=True, exist_ok=True)
+ Path(path).write_bytes(data)
+ return {
+ "success": True,
+ "path": path,
+ "mime_type": "image/png",
+ "base64": base64.b64encode(data).decode("ascii"),
+ }
</code_context>
<issue_to_address>
**suggestion (bug_risk):** Screenshot tool always reports `mime_type` as `image/png` regardless of the underlying format.
In `CuaGUIComponent.screenshot`, `mime_type` is always set to `image/png`. If the CUA SDK ever returns non-PNG data (e.g., JPEG) or already specifies a type, this will misrepresent the content to consumers. Please either derive the MIME type from the underlying data/SDK metadata or confirm and document that the SDK always returns PNG before hardcoding this value.
Suggested implementation:
```python
class CuaGUIComponent(GUIComponent):
def __init__(self, sandbox: Any) -> None:
self._sandbox = sandbox
async def screenshot(self, path: str | None = None) -> dict[str, Any]:
raw = await self._sandbox.screenshot()
data = _screenshot_to_bytes(raw)
mime_type: str | None = None
# Prefer MIME type from the SDK, if available
if isinstance(raw, dict):
mime_type = raw.get("mime_type") or raw.get("content_type")
elif hasattr(raw, "mime_type"):
mime_type = getattr(raw, "mime_type")
# Fallback to simple content sniffing if SDK did not specify a type
if not mime_type:
img_kind = imghdr.what(None, data)
if img_kind:
mime_type = f"image/{img_kind}"
else:
# Final fallback if we cannot determine the type
mime_type = "application/octet-stream"
if path:
p = Path(path)
p.parent.mkdir(parents=True, exist_ok=True)
p.write_bytes(data)
return {
"success": True,
"path": path,
"mime_type": mime_type,
"base64": base64.b64encode(data).decode("ascii"),
}
```
To fully integrate this change, you should also:
1. Ensure `imghdr` is imported at the top of `astrbot/core/computer/booters/cua.py`, for example:
`import imghdr`
2. Confirm that `Path`, `base64`, and `_screenshot_to_bytes` are already imported/defined in this module. If not, add:
- `from pathlib import Path`
- `import base64`
- Ensure `_screenshot_to_bytes` is implemented or imported from the appropriate utility module.
3. Optionally, add a brief docstring/comment on `CuaGUIComponent.screenshot` documenting that the MIME type is derived from the SDK when present, and otherwise inferred from the image data, with a final fallback to `application/octet-stream`.
</issue_to_address>
### Comment 4
<location path="astrbot/core/computer/booters/cua.py" line_range="605-608" />
<code_context>
+ kwargs["api_key"] = self.api_key
+ return kwargs
+
+ async def shutdown(self) -> None:
+ if self._runtime is not None:
+ await self._runtime.sandbox_cm.__aexit__(None, None, None)
+ self._runtime = None
+
+ @property
</code_context>
<issue_to_address>
**suggestion (bug_risk):** `shutdown` does not guarantee `_runtime` is cleared if `__aexit__` raises.
Because `_runtime` is only cleared after `__aexit__` completes, an exception from `__aexit__` leaves `_runtime` non-`None` while the sandbox is partially torn down. This can cause follow‑up calls to operate on an invalid runtime. Consider wrapping `__aexit__` in `try/finally` and setting `_runtime = None` in the `finally` block to ensure consistent state.
```suggestion
async def shutdown(self) -> None:
if self._runtime is not None:
runtime = self._runtime
try:
await runtime.sandbox_cm.__aexit__(None, None, None)
finally:
self._runtime = None
```
</issue_to_address>
### Comment 5
<location path="astrbot/core/computer/booters/cua.py" line_range="17" />
<code_context>
+from .shipyard_search_file_util import search_files_via_shell
+
+
+async def _maybe_await(value: Any) -> Any:
+ if inspect.isawaitable(value):
+ return await value
</code_context>
<issue_to_address>
**issue (complexity):** Consider simplifying the CUA booter by specializing process/result handling, inlining the filesystem adapter, and centralizing shell-based FS fallbacks to tighten abstractions and reduce indirection.
You can reduce complexity without changing behavior by tightening a few abstractions and removing one layer of indirection.
---
### 1. Specialize process result handling instead of over‑generic helpers
Right now each shell/python call flows through `_call_first` → `_normalize_process_result` → `_result_text` → `_maybe_model_dump`, which makes tracing behavior difficult.
You can replace these with a single, CUA‑specific normalizer and drop some generic helpers:
```python
@dataclass(slots=True)
class CuaProcessResult:
stdout: str
stderr: str
exit_code: int | None
success: bool
def _to_cua_process_result(raw: Any) -> CuaProcessResult:
# CUA SDK returns either a dict-like payload or an object with attributes
if isinstance(raw, dict):
payload = raw
elif hasattr(raw, "model_dump"):
payload = raw.model_dump()
elif hasattr(raw, "dict"):
payload = raw.dict()
else:
# fallback to attribute access; adjust to actual CUA types
payload = {
"stdout": getattr(raw, "stdout", ""),
"stderr": getattr(raw, "stderr", ""),
"exit_code": getattr(raw, "exit_code", None),
"success": getattr(raw, "success", None),
}
stdout = str(payload.get("stdout", payload.get("output", "")))
stderr = str(payload.get("stderr", payload.get("error", "")))
exit_code = payload.get("exit_code", payload.get("returncode"))
if exit_code is None:
exit_code = 0 if not stderr else 1
success = bool(payload.get("success", not stderr and exit_code in (0, None)))
return CuaProcessResult(stdout=stdout, stderr=stderr, exit_code=exit_code, success=success)
```
Then `CuaShellComponent.exec` and `CuaPythonComponent.exec` can be simplified:
```python
# before
result = await _call_first(self._sandbox.shell, ("run", "exec"), command, **kwargs)
proc = _normalize_process_result(result)
# after
result = await self._sandbox.shell.run(command, **kwargs) # or a single known method
proc = _to_cua_process_result(result)
```
And in `CuaPythonComponent`:
```python
if python is not None:
result = await python.run(code, timeout=timeout)
proc = _to_cua_process_result(result)
else:
shell = CuaShellComponent(self._sandbox, os_type=self._os_type)
result = await shell.exec(f"python3 - <<'PY'\n{code}\nPY", timeout=timeout)
proc = _to_cua_process_result(result)
```
This lets you delete `_result_text`, `_normalize_process_result`, and (if you standardize on specific method names) `_call_first`, while still handling all existing payload shapes.
If you still need method fallback for version compatibility, centralize it:
```python
def _resolve_shell_method(shell: Any) -> Any:
for name in ("run", "exec"):
method = getattr(shell, name, None)
if method is not None:
return method
raise RuntimeError("CUA shell has neither `run` nor `exec`.")
async def _run_shell(shell: Any, command: str, **kwargs: Any) -> Any:
method = _resolve_shell_method(shell)
return await _maybe_await(method(command, **kwargs))
```
and use `_run_shell` everywhere instead of open‑coded `_call_first`.
---
### 2. Remove the extra `_CuaFSAdapter` layer
`CuaFileSystemComponent` is mostly a thin proxy to `_CuaFSAdapter`, which mirrors the same methods. You can inline `_CuaFSAdapter` logic into `CuaFileSystemComponent` to remove one level of indirection and make the code path easier to follow.
For example, instead of:
```python
class CuaFileSystemComponent(FileSystemComponent):
def __init__(self, sandbox: Any, os_type: str = "linux") -> None:
self._shell = CuaShellComponent(sandbox, os_type=os_type)
fs = getattr(sandbox, "filesystem", None)
self._impl = _CuaFSAdapter(fs, self._shell)
async def read_file(...):
return await self._impl.read_file(path, encoding, offset, limit)
```
you can inline the adapter:
```python
class CuaFileSystemComponent(FileSystemComponent):
def __init__(self, sandbox: Any, os_type: str = "linux") -> None:
self._shell = CuaShellComponent(sandbox, os_type=os_type)
self._fs = getattr(sandbox, "filesystem", None)
async def read_file(
self,
path: str,
encoding: str = "utf-8",
offset: int | None = None,
limit: int | None = None,
) -> dict[str, Any]:
read_file = getattr(self._fs, "read_file", None) if self._fs is not None else None
if read_file is None:
result = await self._shell.exec(f"cat {path!r}")
if result.get("stderr"):
return {"success": False, "path": path, "error": result["stderr"]}
content = result.get("stdout", "")
else:
content = await _maybe_await(read_file(path))
if isinstance(content, bytes):
content = content.decode(encoding, errors="replace")
return {
"success": True,
"path": path,
"content": _slice_content_by_lines(str(content), offset=offset, limit=limit),
}
```
Similarly inline `create_file`, `write_file`, `delete_file`, and `list_dir`. This keeps the existing dual path (native FS + shell fallback) but removes an extra class and indirection per call.
If you anticipate reusing this FS adapter in other booters, you could alternatively:
- Move `_CuaFSAdapter` to a shared `cua_fs_adapter.py` module.
- Rename `CuaFileSystemComponent._impl` to something more explicit (e.g. `_fs_adapter`) to make the layering clearer.
---
### 3. Isolate shell‑based file fallbacks
Shell fallbacks (`_write_base64_via_shell`, `_list_dir_via_shell`, `base64` downloads) are currently mixed into both `CuaFileSystemComponent` and `CuaBooter.upload_file/download_file`. You can keep all behavior but move the shell‑specific logic behind a tiny helper to reduce repetition and cross‑cutting concerns.
For example, create `cua_shell_fs.py`:
```python
# cua_shell_fs.py
async def shell_write_file(shell: ShellComponent, path: str, data: bytes) -> dict[str, Any]:
return await _write_base64_via_shell(shell, path, data)
async def shell_read_dir(shell: ShellComponent, path: str, show_hidden: bool) -> dict[str, Any]:
flags = "-1A" if show_hidden else "-1"
result = await shell.exec(f"ls {flags} {path!r}")
return {
"success": not bool(result.get("stderr")),
"path": path,
"entries": _split_listing_entries(result.get("stdout", "")),
"error": result.get("stderr", ""),
}
async def shell_download_file(shell: ShellComponent, remote_path: str) -> bytes:
result = await shell.exec(f"base64 {remote_path!r}")
if result.get("stderr"):
raise RuntimeError(result["stderr"])
return base64.b64decode(result.get("stdout", ""))
```
Then use these helpers from both `CuaFileSystemComponent` and `CuaBooter.download_file`/`upload_file` instead of open‑coding the commands in multiple places. This keeps the shell fallback strategy but localizes shell command construction and base64 handling.
</issue_to_address>Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.
| async def screenshot(self, path: str | None = None) -> dict[str, Any]: | ||
| raw = await self._sandbox.screenshot() | ||
| data = _screenshot_to_bytes(raw) | ||
| if path: | ||
| Path(path).parent.mkdir(parents=True, exist_ok=True) | ||
| Path(path).write_bytes(data) | ||
| return { | ||
| "success": True, | ||
| "path": path, | ||
| "mime_type": "image/png", |
There was a problem hiding this comment.
suggestion (bug_risk): Screenshot tool always reports mime_type as image/png regardless of the underlying format.
In CuaGUIComponent.screenshot, mime_type is always set to image/png. If the CUA SDK ever returns non-PNG data (e.g., JPEG) or already specifies a type, this will misrepresent the content to consumers. Please either derive the MIME type from the underlying data/SDK metadata or confirm and document that the SDK always returns PNG before hardcoding this value.
Suggested implementation:
class CuaGUIComponent(GUIComponent):
def __init__(self, sandbox: Any) -> None:
self._sandbox = sandbox
async def screenshot(self, path: str | None = None) -> dict[str, Any]:
raw = await self._sandbox.screenshot()
data = _screenshot_to_bytes(raw)
mime_type: str | None = None
# Prefer MIME type from the SDK, if available
if isinstance(raw, dict):
mime_type = raw.get("mime_type") or raw.get("content_type")
elif hasattr(raw, "mime_type"):
mime_type = getattr(raw, "mime_type")
# Fallback to simple content sniffing if SDK did not specify a type
if not mime_type:
img_kind = imghdr.what(None, data)
if img_kind:
mime_type = f"image/{img_kind}"
else:
# Final fallback if we cannot determine the type
mime_type = "application/octet-stream"
if path:
p = Path(path)
p.parent.mkdir(parents=True, exist_ok=True)
p.write_bytes(data)
return {
"success": True,
"path": path,
"mime_type": mime_type,
"base64": base64.b64encode(data).decode("ascii"),
}To fully integrate this change, you should also:
- Ensure
imghdris imported at the top ofastrbot/core/computer/booters/cua.py, for example:
import imghdr - Confirm that
Path,base64, and_screenshot_to_bytesare already imported/defined in this module. If not, add:from pathlib import Pathimport base64- Ensure
_screenshot_to_bytesis implemented or imported from the appropriate utility module.
- Optionally, add a brief docstring/comment on
CuaGUIComponent.screenshotdocumenting that the MIME type is derived from the SDK when present, and otherwise inferred from the image data, with a final fallback toapplication/octet-stream.
|
@sourcery-ai review |
There was a problem hiding this comment.
Hey - I've found 2 issues, and left some high level feedback:
- The
_is_self_detached_commandheuristic inExecuteShellToolis quite narrow (onlynohup/trailing&), so you may want to expand or document its intended scope (e.g., handling cases likecmd & # comment, chained commands, or platform-specific detachers) to avoid misclassifying already-detached commands. - In
CuaFileSystemComponent.create_file, themodeargument is returned but not actually applied (no chmod or equivalent), so either enforce the mode on creation or clarify in the behaviour thatmodeis informational only to avoid misleading callers. - The GUI-related error messages in
CuaGUIComponent(and_require_component_method) are hard-coded to specific attribute names (e.g.,keyboard.press), even though multiple alternatives are tried; consider making the message reflect the actual methods probed or listing them to make debugging missing CUA SDK capabilities clearer.
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- The `_is_self_detached_command` heuristic in `ExecuteShellTool` is quite narrow (only `nohup`/trailing `&`), so you may want to expand or document its intended scope (e.g., handling cases like `cmd & # comment`, chained commands, or platform-specific detachers) to avoid misclassifying already-detached commands.
- In `CuaFileSystemComponent.create_file`, the `mode` argument is returned but not actually applied (no chmod or equivalent), so either enforce the mode on creation or clarify in the behaviour that `mode` is informational only to avoid misleading callers.
- The GUI-related error messages in `CuaGUIComponent` (and `_require_component_method`) are hard-coded to specific attribute names (e.g., `keyboard.press`), even though multiple alternatives are tried; consider making the message reflect the actual methods probed or listing them to make debugging missing CUA SDK capabilities clearer.
## Individual Comments
### Comment 1
<location path="astrbot/core/computer/booters/cua.py" line_range="219-228" />
<code_context>
+ ) -> dict[str, Any]:
+ read_file = None if self._fs is None else getattr(self._fs, "read_file", None)
+ if read_file is None:
+ if not _is_posix_os_type(self._os_type):
+ return _non_posix_filesystem_result(path, self._os_type)
+ result = await self._shell.exec(f"cat {path!r}")
+ if result.get("stderr"):
+ return {"success": False, "path": path, "error": result["stderr"]}
</code_context>
<issue_to_address>
**issue (bug_risk):** Use shell-safe quoting for paths instead of Python repr
In `read_file` (and similarly `delete_file` and `_list_dir_via_shell`), `path!r` is not shell-safe and will break for paths with spaces, quotes, or other special characters, and may behave differently across OSes. This can cause files to be missed or the wrong paths to be used.
Use `shlex.quote(path)` when interpolating into shell commands, e.g.:
```python
quoted = shlex.quote(path)
result = await self._shell.exec(f"cat {quoted}")
```
and likewise for the `rm -rf` and `ls` calls to avoid subtle path-handling bugs.
</issue_to_address>
### Comment 2
<location path="astrbot/core/computer/booters/cua.py" line_range="189" />
<code_context>
+ return getattr(component, method_name, None) is not None
+
+
+class CuaShellComponent(ShellComponent):
+ def __init__(self, sandbox: Any, os_type: str = "linux") -> None:
+ self._sandbox = sandbox
</code_context>
<issue_to_address>
**issue (complexity):** Consider resolving the `run`/`exec` polymorphism once in component constructors so the exec paths no longer perform dynamic dispatch on every call.
You can keep all behavior but significantly flatten the hot paths by resolving polymorphism once in `__init__` instead of on every call.
### 1. Resolve `run`/`exec` once per component (remove `_call_first` in hot paths)
Right now `CuaShellComponent.exec` and `CuaPythonComponent.exec` both pay the cost of `_call_first(..., ("run", "exec"), ...)` on every call. You can resolve the right method once and store it:
```python
class CuaShellComponent(ShellComponent):
def __init__(self, sandbox: Any, os_type: str = "linux") -> None:
self._sandbox = sandbox
self._os_type = os_type.lower()
shell = sandbox.shell
# Resolve once, fail loudly if neither exists
self._exec_raw = getattr(shell, "exec", None) or getattr(shell, "run", None)
if self._exec_raw is None:
raise RuntimeError(
"CUA sandbox shell must provide `.exec` or `.run`."
)
async def exec(
self,
command: str,
cwd: str | None = None,
env: dict[str, str] | None = None,
timeout: int | None = 30,
shell: bool = True,
background: bool = False,
) -> dict[str, Any]:
...
# instead of _call_first(self._sandbox.shell, ("run", "exec"), ...)
result = await _maybe_await(self._exec_raw(command, **kwargs))
proc = _normalize_process_result(result)
...
```
Similarly for `CuaPythonComponent`:
```python
class CuaPythonComponent(PythonComponent):
def __init__(self, sandbox: Any, os_type: str = "linux") -> None:
self._sandbox = sandbox
self._os_type = os_type
python = getattr(sandbox, "python", None)
if python is not None:
self._python_exec = (
getattr(python, "exec", None) or getattr(python, "run", None)
)
else:
self._python_exec = None # signals shell fallback
async def exec(...):
if self._python_exec is not None:
result = await _maybe_await(self._python_exec(code, timeout=timeout))
proc = _normalize_process_result(result)
else:
# shell fallback unchanged
shell = CuaShellComponent(self._sandbox, os_type=self._os_type)
...
```
This keeps all existing behavior but:
- Removes repeated `getattr`/`_call_first` logic from the hot paths.
- Makes the “normal” path for shell/python easy to follow.
- Localizes the polymorphism to the constructor where you can also improve error messages if needed.
Once this is in place, `_call_first` is only needed in a few non-hot or more dynamic cases (or can be dropped entirely if those sites are similarly resolved up front).
</issue_to_address>Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.
|
@sourcery-ai review |
There was a problem hiding this comment.
Hey - I've found 2 issues, and left some high level feedback:
- In
computer_client.get_booter, the CUA branch logsapi_keyas part of the config line; this leaks secrets to logs and should be changed to either omit the key entirely or log only redacted/boolean information about its presence. - The updated
AstrBotConfig.check_config_integritylogging now omits the config path entirely when inserting/removing defaults, which can make troubleshooting harder; consider logging the path/name without the value so that secrets stay hidden but operators still know which entries were modified.
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- In `computer_client.get_booter`, the CUA branch logs `api_key` as part of the config line; this leaks secrets to logs and should be changed to either omit the key entirely or log only redacted/boolean information about its presence.
- The updated `AstrBotConfig.check_config_integrity` logging now omits the config path entirely when inserting/removing defaults, which can make troubleshooting harder; consider logging the path/name without the value so that secrets stay hidden but operators still know which entries were modified.
## Individual Comments
### Comment 1
<location path="astrbot/core/computer/booters/cua.py" line_range="366" />
<code_context>
+ if write_file is None:
+ if not _is_posix_os_type(self._os_type):
+ return _non_posix_filesystem_result(path, self._os_type)
+ result = await _write_base64_via_shell(
+ self._shell, path, content.encode(encoding)
+ )
</code_context>
<issue_to_address>
**issue (bug_risk):** Uploading files via shell fallback may break for non-POSIX CUA images.
Here, if `upload_file` is missing, we always fall back to `_write_base64_via_shell`, which assumes a POSIX shell and `python3` in the sandbox. On non-POSIX images (e.g., Windows CUA), this will likely fail at runtime. It would be better to mirror the other filesystem methods by using `_is_posix_os_type` / `_non_posix_filesystem_result`, or otherwise add an explicit capability check and clear error instead of relying on this implicit POSIX-only path.
</issue_to_address>
### Comment 2
<location path="astrbot/core/computer/booters/cua.py" line_range="682" />
<code_context>
+ if sandbox is not None and hasattr(sandbox, "download_file"):
+ await sandbox.download_file(remote_path, local_path)
+ return
+ result = await self.shell.exec(f"base64 {remote_path!r}")
+ if result.get("stderr"):
+ raise RuntimeError(result["stderr"])
</code_context>
<issue_to_address>
**🚨 suggestion (security):** Use shell-quoting instead of `!r` when invoking `base64` for downloads.
Using `repr` to interpolate `remote_path` into a shell command is unsafe and brittle, especially for paths containing quotes or shell metacharacters. Please construct the command with proper shell quoting instead, e.g. `shlex.quote(remote_path)`, to align with the rest of the module and avoid injection issues.
Suggested implementation:
```python
result = await self.shell.exec(f"base64 {shlex.quote(remote_path)}")
```
Ensure that `shlex` is imported at the top of `astrbot/core/computer/booters/cua.py`, e.g.:
```python
import shlex
```
If the module already imports `shlex`, no further changes are needed.
</issue_to_address>Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.
|
@sourcery-ai review |
|
@sourcery-ai review |
There was a problem hiding this comment.
Hey - I've found 1 issue, and left some high level feedback:
- The CuaBooter.download_file shell fallback always uses
base64without the same POSIX / python3 capability checks you added elsewhere, so it may be worth aligning this path with the other filesystem/shell fallbacks (e.g., OS-type guard and clearer error messaging) to avoid confusing failures on non-POSIX images. - Several helpers encode similar logic for reporting missing
python3or shell-based filesystem fallbacks (e.g.,_python3_requirement_error, filesystem write/read fallbacks, background launcher); consider centralizing this behavior to reduce the chance of drift and keep error messages consistent across CUA components.
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- The CuaBooter.download_file shell fallback always uses `base64` without the same POSIX / python3 capability checks you added elsewhere, so it may be worth aligning this path with the other filesystem/shell fallbacks (e.g., OS-type guard and clearer error messaging) to avoid confusing failures on non-POSIX images.
- Several helpers encode similar logic for reporting missing `python3` or shell-based filesystem fallbacks (e.g., `_python3_requirement_error`, filesystem write/read fallbacks, background launcher); consider centralizing this behavior to reduce the chance of drift and keep error messages consistent across CUA components.
## Individual Comments
### Comment 1
<location path="tests/unit/test_cua_computer_use.py" line_range="124-129" />
<code_context>
+ assert sent_messages == []
+
+
+@pytest.mark.asyncio
+async def test_cua_tools_include_exception_type_for_blank_error(monkeypatch):
+ from astrbot.core.tools.computer_tools import cua as cua_tools
+ from astrbot.core.tools.computer_tools.cua import CuaMouseClickTool
+
+ class BlankError(Exception):
+ def __str__(self):
+ return ""
+
+ class FakeEvent:
+ unified_msg_origin = "umo"
+ role = "admin"
+
+ class FakeAstrContext:
+ event = FakeEvent()
+ context = FakeContext(
+ {"provider_settings": {"computer_use_require_admin": True}}
+ )
+
+ class FakeWrapper:
+ context = FakeAstrContext()
+
+ async def fail_gui_lookup(context):
+ raise BlankError()
+
+ monkeypatch.setattr(cua_tools, "_get_gui_component", fail_gui_lookup)
+
+ assert await CuaMouseClickTool().call(FakeWrapper(), x=1, y=2) == (
+ "Error clicking CUA desktop: BlankError"
+ )
</code_context>
<issue_to_address>
**suggestion (testing):** Add happy-path tests for CUA mouse and keyboard tools to ensure correct wiring to the GUI component.
Current tests emphasize error handling, but none cover successful execution of `CuaMouseClickTool` or `CuaKeyboardTypeTool` through `_get_gui_component` to the underlying GUI component. Please add happy-path tests (e.g., injecting a fake GUI with `click`/`type_text`, asserting `_get_gui_component` is used, arguments are forwarded correctly, and the JSON response matches the fake payload) to verify correct wiring and serialization.
```suggestion
class SyncPython:
def run(self, code: str, **kwargs):
return {"output": "sync", "error": ""}
@pytest.mark.asyncio
async def test_cua_mouse_click_tool_happy_path_forwards_args_and_serializes_json(
monkeypatch,
):
import json
from astrbot.core.tools.computer_tools import cua as cua_tools
from astrbot.core.tools.computer_tools.cua import CuaMouseClickTool
class FakeEvent:
unified_msg_origin = "umo"
role = "admin"
class FakeAstrContext:
event = FakeEvent()
context = FakeContext(
{"provider_settings": {"computer_use_require_admin": True}}
)
class FakeWrapper:
context = FakeAstrContext()
class FakeGui:
def __init__(self):
self.clicked_args = None
async def click(self, x, y, button=None):
self.clicked_args = (x, y, button)
return {
"status": "ok",
"x": x,
"y": y,
"button": button,
}
fake_gui = FakeGui()
get_gui_called = {"value": False}
async def fake_get_gui_component(context):
get_gui_called["value"] = True
# Ensure the context passed into the helper is the wrapper's context
assert context is FakeWrapper().context # smoke check for wiring
return fake_gui
# Ensure tools use our fake GUI component
monkeypatch.setattr(cua_tools, "_get_gui_component", fake_get_gui_component)
tool = CuaMouseClickTool()
# Call the tool and validate behavior
result = await tool.call(FakeWrapper(), x=10, y=20, button="left")
assert get_gui_called["value"] is True
assert fake_gui.clicked_args == (10, 20, "left")
assert json.loads(result) == {
"status": "ok",
"x": 10,
"y": 20,
"button": "left",
}
@pytest.mark.asyncio
async def test_cua_keyboard_type_tool_happy_path_forwards_args_and_serializes_json(
monkeypatch,
):
import json
from astrbot.core.tools.computer_tools import cua as cua_tools
from astrbot.core.tools.computer_tools.cua import CuaKeyboardTypeTool
class FakeEvent:
unified_msg_origin = "umo"
role = "admin"
class FakeAstrContext:
event = FakeEvent()
context = FakeContext(
{"provider_settings": {"computer_use_require_admin": True}}
)
class FakeWrapper:
context = FakeAstrContext()
class FakeGui:
def __init__(self):
self.typed_text_args = None
async def type_text(self, text):
self.typed_text_args = (text,)
return {
"status": "ok",
"text": text,
}
fake_gui = FakeGui()
get_gui_called = {"value": False}
async def fake_get_gui_component(context):
get_gui_called["value"] = True
# Ensure the context passed into the helper is the wrapper's context
assert context is FakeWrapper().context # smoke check for wiring
return fake_gui
# Ensure tools use our fake GUI component
monkeypatch.setattr(cua_tools, "_get_gui_component", fake_get_gui_component)
tool = CuaKeyboardTypeTool()
# Call the tool and validate behavior
text = "Hello CUA"
result = await tool.call(FakeWrapper(), text=text)
assert get_gui_called["value"] is True
assert fake_gui.typed_text_args == (text,)
assert json.loads(result) == {
"status": "ok",
"text": text,
}
def _agent_computer_use_items():
```
</issue_to_address>Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.
| class SyncPython: | ||
| def run(self, code: str, **kwargs): | ||
| return {"output": "sync", "error": ""} | ||
|
|
||
|
|
||
| def _agent_computer_use_items(): |
There was a problem hiding this comment.
suggestion (testing): Add happy-path tests for CUA mouse and keyboard tools to ensure correct wiring to the GUI component.
Current tests emphasize error handling, but none cover successful execution of CuaMouseClickTool or CuaKeyboardTypeTool through _get_gui_component to the underlying GUI component. Please add happy-path tests (e.g., injecting a fake GUI with click/type_text, asserting _get_gui_component is used, arguments are forwarded correctly, and the JSON response matches the fake payload) to verify correct wiring and serialization.
| class SyncPython: | |
| def run(self, code: str, **kwargs): | |
| return {"output": "sync", "error": ""} | |
| def _agent_computer_use_items(): | |
| class SyncPython: | |
| def run(self, code: str, **kwargs): | |
| return {"output": "sync", "error": ""} | |
| @pytest.mark.asyncio | |
| async def test_cua_mouse_click_tool_happy_path_forwards_args_and_serializes_json( | |
| monkeypatch, | |
| ): | |
| import json | |
| from astrbot.core.tools.computer_tools import cua as cua_tools | |
| from astrbot.core.tools.computer_tools.cua import CuaMouseClickTool | |
| class FakeEvent: | |
| unified_msg_origin = "umo" | |
| role = "admin" | |
| class FakeAstrContext: | |
| event = FakeEvent() | |
| context = FakeContext( | |
| {"provider_settings": {"computer_use_require_admin": True}} | |
| ) | |
| class FakeWrapper: | |
| context = FakeAstrContext() | |
| class FakeGui: | |
| def __init__(self): | |
| self.clicked_args = None | |
| async def click(self, x, y, button=None): | |
| self.clicked_args = (x, y, button) | |
| return { | |
| "status": "ok", | |
| "x": x, | |
| "y": y, | |
| "button": button, | |
| } | |
| fake_gui = FakeGui() | |
| get_gui_called = {"value": False} | |
| async def fake_get_gui_component(context): | |
| get_gui_called["value"] = True | |
| # Ensure the context passed into the helper is the wrapper's context | |
| assert context is FakeWrapper().context # smoke check for wiring | |
| return fake_gui | |
| # Ensure tools use our fake GUI component | |
| monkeypatch.setattr(cua_tools, "_get_gui_component", fake_get_gui_component) | |
| tool = CuaMouseClickTool() | |
| # Call the tool and validate behavior | |
| result = await tool.call(FakeWrapper(), x=10, y=20, button="left") | |
| assert get_gui_called["value"] is True | |
| assert fake_gui.clicked_args == (10, 20, "left") | |
| assert json.loads(result) == { | |
| "status": "ok", | |
| "x": 10, | |
| "y": 20, | |
| "button": "left", | |
| } | |
| @pytest.mark.asyncio | |
| async def test_cua_keyboard_type_tool_happy_path_forwards_args_and_serializes_json( | |
| monkeypatch, | |
| ): | |
| import json | |
| from astrbot.core.tools.computer_tools import cua as cua_tools | |
| from astrbot.core.tools.computer_tools.cua import CuaKeyboardTypeTool | |
| class FakeEvent: | |
| unified_msg_origin = "umo" | |
| role = "admin" | |
| class FakeAstrContext: | |
| event = FakeEvent() | |
| context = FakeContext( | |
| {"provider_settings": {"computer_use_require_admin": True}} | |
| ) | |
| class FakeWrapper: | |
| context = FakeAstrContext() | |
| class FakeGui: | |
| def __init__(self): | |
| self.typed_text_args = None | |
| async def type_text(self, text): | |
| self.typed_text_args = (text,) | |
| return { | |
| "status": "ok", | |
| "text": text, | |
| } | |
| fake_gui = FakeGui() | |
| get_gui_called = {"value": False} | |
| async def fake_get_gui_component(context): | |
| get_gui_called["value"] = True | |
| # Ensure the context passed into the helper is the wrapper's context | |
| assert context is FakeWrapper().context # smoke check for wiring | |
| return fake_gui | |
| # Ensure tools use our fake GUI component | |
| monkeypatch.setattr(cua_tools, "_get_gui_component", fake_get_gui_component) | |
| tool = CuaKeyboardTypeTool() | |
| # Call the tool and validate behavior | |
| text = "Hello CUA" | |
| result = await tool.call(FakeWrapper(), text=text) | |
| assert get_gui_called["value"] is True | |
| assert fake_gui.typed_text_args == (text,) | |
| assert json.loads(result) == { | |
| "status": "ok", | |
| "text": text, | |
| } | |
| def _agent_computer_use_items(): |
✅ Addressed in 89aad65: Happy-path tests have been added for the CUA mouse and keyboard tools, verifying forwarding to the GUI component and correct JSON responses.
There was a problem hiding this comment.
Hey - I've found 3 issues, and left some high level feedback:
- In
CuaBooter.download_file, the fallback path always shells out tobase64without checkingos_type, unlike the other filesystem fallbacks which guard on POSIX; consider reusing_is_posix_os_typeand returning a structured error for non-POSIX images to keep behavior consistent and avoid broken downloads on Windows-like images. - The CUA-related tests define many repeated
FakeContext/FakeEvent/FakeWrapper/FakeBooterscaffolds inline; extracting these into reusable fixtures or helper factories would make the tests easier to maintain and reduce duplication.
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- In `CuaBooter.download_file`, the fallback path always shells out to `base64` without checking `os_type`, unlike the other filesystem fallbacks which guard on POSIX; consider reusing `_is_posix_os_type` and returning a structured error for non-POSIX images to keep behavior consistent and avoid broken downloads on Windows-like images.
- The CUA-related tests define many repeated `FakeContext`/`FakeEvent`/`FakeWrapper`/`FakeBooter` scaffolds inline; extracting these into reusable fixtures or helper factories would make the tests easier to maintain and reduce duplication.
## Individual Comments
### Comment 1
<location path="astrbot/core/computer/booters/cua.py" line_range="679-688" />
<code_context>
+ **result,
+ }
+
+ async def download_file(self, remote_path: str, local_path: str) -> None:
+ sandbox = None if self._runtime is None else self._runtime.sandbox
+ if sandbox is not None and hasattr(sandbox, "download_file"):
+ await sandbox.download_file(remote_path, local_path)
+ return
+ result = await self.shell.exec(f"base64 {shlex.quote(remote_path)}")
+ if result.get("stderr"):
+ raise RuntimeError(result["stderr"])
+ Path(local_path).parent.mkdir(parents=True, exist_ok=True)
+ Path(local_path).write_bytes(base64.b64decode(result.get("stdout", "")))
+
+ async def available(self) -> bool:
</code_context>
<issue_to_address>
**issue (bug_risk):** CUA download_file assumes POSIX `base64` availability even for non-POSIX images.
Unlike the other filesystem fallbacks, this method unconditionally shells out to `base64`, which will likely fail on Windows or other non-POSIX images and surface as a confusing runtime error. Please either guard this path with `_is_posix_os_type(self.os_type)` and return a `_non_posix_filesystem_result`-style error for non-POSIX, or invoke `python3 - <<` with a small base64 decode helper (and clear error messaging if `python3` is missing) so behavior is consistent across OS types.
</issue_to_address>
### Comment 2
<location path="tests/unit/test_func_tool_manager.py" line_range="52-61" />
<code_context>
+ return CONFIG_METADATA_3["ai_group"]["metadata"]["agent_computer_use"]["items"]
+
+
+@pytest.mark.asyncio
+async def test_get_booter_creates_cua_booter(monkeypatch):
+ from astrbot.core.computer import computer_client
</code_context>
<issue_to_address>
**suggestion (testing):** Add a test case to ensure user-provided env dicts are not mutated by ExecuteShellTool
The existing test only covers the `env=None` case. Since `call` now does `env=dict(env or {})`, please add a test that passes a shared `env` (e.g. `{"FOO": "bar"}`) and asserts that the dict passed into `FakeShell.exec` includes both the original entries and the mutation (e.g. `MUTATED_BY_FAKE_SHELL`), while the original `env` object remains unchanged after the call. This guards against `ExecuteShellTool` mutating caller-owned env dicts.
</issue_to_address>
### Comment 3
<location path="astrbot/core/computer/booters/cua.py" line_range="465" />
<code_context>
+ }
+
+
+class CuaGUIComponent(GUIComponent):
+ def __init__(self, sandbox: Any) -> None:
+ self._sandbox = sandbox
</code_context>
<issue_to_address>
**issue (complexity):** Consider resolving and caching GUI component methods in CuaGUIComponent.__init__ so the click/type/press_key methods become straight-line calls instead of doing reflective lookups on every invocation.
You can reduce a noticeable layer of indirection in the GUI path by resolving component methods once in `__init__` instead of doing late-bound lookups on every call.
Today each GUI call goes through `_require_component_method` + `getattr` chains:
```python
async def click(self, x: int, y: int, button: str = "left") -> dict[str, Any]:
click = _require_component_method(self._sandbox, "mouse", "click")
result = await _maybe_await(click(x, y, button=button))
payload = _maybe_model_dump(result)
return {"success": bool(payload.get("success", True)), **payload}
```
That makes the call path harder to follow and repeats the same reflection logic.
You can keep the same functionality but fail-fast and simplify call sites by resolving and caching these methods in `CuaGUIComponent.__init__`:
```python
class CuaGUIComponent(GUIComponent):
def __init__(self, sandbox: Any) -> None:
self._sandbox = sandbox
mouse = getattr(sandbox, "mouse", None)
keyboard = getattr(sandbox, "keyboard", None)
self._click = getattr(mouse, "click", None) if mouse is not None else None
self._type_text = getattr(keyboard, "type", None) if keyboard is not None else None
# preserve the current fallback behaviour for press_key
self._press_key = None
if keyboard is not None:
for name in ("press", "key_press", "press_key"):
method = getattr(keyboard, name, None)
if method is not None:
self._press_key = method
break
```
Then the methods become straight-line and easier to reason about:
```python
async def click(self, x: int, y: int, button: str = "left") -> dict[str, Any]:
if self._click is None:
raise RuntimeError(
"CUA sandbox does not provide mouse.click; "
"please check the installed CUA SDK version and sandbox backend."
)
result = await _maybe_await(self._click(x, y, button=button))
payload = _maybe_model_dump(result)
return {"success": bool(payload.get("success", True)), **payload}
async def type_text(self, text: str) -> dict[str, Any]:
if self._type_text is None:
raise RuntimeError(
"CUA sandbox does not provide keyboard.type; "
"please check the installed CUA SDK version and sandbox backend."
)
result = await _maybe_await(self._type_text(text))
payload = _maybe_model_dump(result)
return {"success": bool(payload.get("success", True)), **payload}
async def press_key(self, key: str) -> dict[str, Any]:
if self._press_key is None:
raise RuntimeError(
"CUA sandbox does not provide keyboard.press/key_press/press_key; "
"please check the installed CUA SDK version and sandbox backend."
)
result = await _maybe_await(self._press_key(key))
payload = _maybe_model_dump(result)
return {"success": bool(payload.get("success", True)), **payload}
```
This keeps all existing behaviour (same fallbacks and error semantics) but:
- Removes repeated `_require_component_method` calls and method scans on every invocation.
- Makes each GUI operation’s implementation locally obvious.
- Aligns with the “resolve at boot/init time, not at each call” simplification direction highlighted by the other reviewer.
</issue_to_address>Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.
| @pytest.mark.asyncio | ||
| async def test_execute_shell_defaults_to_background(monkeypatch): | ||
| from astrbot.core.tools.computer_tools import shell as shell_tools | ||
|
|
||
| calls = [] | ||
|
|
||
| class FakeShell: | ||
| async def exec(self, command, cwd=None, background=False, env=None): | ||
| calls.append({"command": command, "background": background}) | ||
| return {"success": True, "stdout": "", "stderr": "", "exit_code": 0} |
There was a problem hiding this comment.
suggestion (testing): Add a test case to ensure user-provided env dicts are not mutated by ExecuteShellTool
The existing test only covers the env=None case. Since call now does env=dict(env or {}), please add a test that passes a shared env (e.g. {"FOO": "bar"}) and asserts that the dict passed into FakeShell.exec includes both the original entries and the mutation (e.g. MUTATED_BY_FAKE_SHELL), while the original env object remains unchanged after the call. This guards against ExecuteShellTool mutating caller-owned env dicts.
|
@sourcery-ai review |
There was a problem hiding this comment.
Hey - I've found 1 issue, and left some high level feedback:
- The changes in
AstrBotConfig.check_config_integritynow emit very generic log messages (e.g., no key/path in the "配置项不存在" / "未知配置项" logs), which may make config debugging harder; consider including a redacted or partial path (or a non-sensitive identifier) so that operators can still identify which setting was inserted/reordered without exposing secrets.
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- The changes in `AstrBotConfig.check_config_integrity` now emit very generic log messages (e.g., no key/path in the "配置项不存在" / "未知配置项" logs), which may make config debugging harder; consider including a redacted or partial path (or a non-sensitive identifier) so that operators can still identify which setting was inserted/reordered without exposing secrets.
## Individual Comments
### Comment 1
<location path="astrbot/core/computer/booters/cua.py" line_range="65" />
<code_context>
+ )
+
+
+@dataclass(slots=True)
+class ProcessResult:
+ stdout: str
</code_context>
<issue_to_address>
**issue (complexity):** Consider simplifying the CUA booter by standardizing per-component result shapes and resolving GUI methods explicitly instead of using generic normalization helpers and reflective utilities.
You can reduce a noticeable amount of indirection in this file by **standardizing the internal schema per component** and replacing some of the “generic normalization” helpers with small, explicit adapters.
Below are two focused refactors that keep all existing behavior but make the code easier to follow:
---
### 1. Narrow `ProcessResult` normalization to known CUA shapes
Right now `_normalize_process_result` + `_maybe_model_dump` + `_result_text` attempt to handle arbitrary objects/dicts/strings. If the CUA SDK only actually returns a small set of shapes (e.g. current SDK plus maybe one legacy variant), you can encode that explicitly instead of handling “any dict-like”.
That lets you delete `_maybe_model_dump` and `_result_text` entirely and make `_normalize_process_result` much more obvious:
```python
@dataclass(slots=True)
class ProcessResult:
stdout: str
stderr: str
exit_code: int | None
success: bool
def _normalize_process_result(raw: Any) -> ProcessResult:
# 1. Already in internal shape
if isinstance(raw, ProcessResult):
return raw
# 2. Known SDK shapes
# Example shape A: {"stdout": ..., "stderr": ..., "exit_code": ..., "success": ...}
if isinstance(raw, dict):
stdout = str(raw.get("stdout") or raw.get("output") or "")
stderr = str(raw.get("stderr") or raw.get("error") or "")
exit_code = raw.get("exit_code")
if exit_code is None:
exit_code = raw.get("returncode")
if exit_code is None:
exit_code = 0 if not stderr else 1
success = raw.get("success")
if success is None:
success = not stderr and exit_code in (0, None)
return ProcessResult(
stdout=stdout,
stderr=stderr,
exit_code=exit_code,
success=bool(success),
)
# 3. String fallback (CUA shell sometimes returns plain stdout)
if isinstance(raw, str):
return ProcessResult(stdout=raw, stderr="", exit_code=0, success=True)
# 4. Pydantic-style objects (only if you actually need to support this)
if hasattr(raw, "model_dump"):
return _normalize_process_result(raw.model_dump())
if hasattr(raw, "dict"):
return _normalize_process_result(raw.dict())
# 5. Last-resort fallback
return ProcessResult(stdout=str(raw), stderr="", exit_code=None, success=True)
```
Usage in `CuaShellComponent` and `CuaPythonComponent` stays the same:
```python
result = await _maybe_await(self._exec_raw(command, **kwargs))
proc = _normalize_process_result(result)
```
Benefits:
- All logic is in one place; you can see exactly which shapes are supported.
- You can easily add/remove branches when CUA SDK changes, instead of trying to handle arbitrary “dict-like” payloads.
- `_maybe_model_dump` and `_result_text` are no longer needed.
---
### 2. Simplify GUI component method resolution
`_resolve_component_method`, `_missing_component_method_error`, and `_has_component_method` add a lot of generic plumbing to support multiple method names. If the supported CUA SDK variants are known, you can resolve and freeze the exact methods once in `__init__`, and drop the generic helpers.
```python
class CuaGUIComponent(GUIComponent):
def __init__(self, sandbox: Any) -> None:
self._sandbox = sandbox
mouse = getattr(sandbox, "mouse", None)
keyboard = getattr(sandbox, "keyboard", None)
# Resolve once and keep explicit attributes
self._click = getattr(mouse, "click", None) if mouse else None
# Support a small, explicit set of variants
self._type_text = None
if keyboard is not None:
self._type_text = getattr(keyboard, "type", None)
if self._type_text is None:
self._type_text = getattr(keyboard, "type_text", None)
self._press_key = (
getattr(keyboard, "press", None)
or getattr(keyboard, "key_press", None)
or getattr(keyboard, "press_key", None)
)
else:
self._press_key = None
async def click(self, x: int, y: int, button: str = "left") -> dict[str, Any]:
if self._click is None:
raise RuntimeError("CUA sandbox does not provide mouse.click")
result = await _maybe_await(self._click(x, y, button=button))
payload = result if isinstance(result, dict) else {}
return {"success": bool(payload.get("success", True)), **payload}
async def type_text(self, text: str) -> dict[str, Any]:
if self._type_text is None:
raise RuntimeError("CUA sandbox does not provide keyboard.type/_type_text")
result = await _maybe_await(self._type_text(text))
payload = result if isinstance(result, dict) else {}
return {"success": bool(payload.get("success", True)), **payload}
async def press_key(self, key: str) -> dict[str, Any]:
if self._press_key is None:
raise RuntimeError(
"CUA sandbox does not provide keyboard.press/key_press/press_key"
)
result = await _maybe_await(self._press_key(key))
payload = result if isinstance(result, dict) else {}
return {"success": bool(payload.get("success", True)), **payload}
```
With this refactor:
- `_resolve_component_method`, `_missing_component_method_error`, and `_has_component_method` can be removed or inlined where strictly needed.
- The capabilities detection in `CuaBooter.capabilities` can directly inspect `self._runtime.gui._click` / `_type_text` instead of walking the sandbox again.
---
These two changes alone:
- Preserve your support for multiple CUA SDK layouts.
- Remove several generic helpers and reflective paths.
- Make the booter behavior easier to read and test because the accepted shapes and method variants are explicitly encoded instead of “any dict/model with some keys”.
</issue_to_address>Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.
| ) | ||
|
|
||
|
|
||
| @dataclass(slots=True) |
There was a problem hiding this comment.
issue (complexity): Consider simplifying the CUA booter by standardizing per-component result shapes and resolving GUI methods explicitly instead of using generic normalization helpers and reflective utilities.
You can reduce a noticeable amount of indirection in this file by standardizing the internal schema per component and replacing some of the “generic normalization” helpers with small, explicit adapters.
Below are two focused refactors that keep all existing behavior but make the code easier to follow:
1. Narrow ProcessResult normalization to known CUA shapes
Right now _normalize_process_result + _maybe_model_dump + _result_text attempt to handle arbitrary objects/dicts/strings. If the CUA SDK only actually returns a small set of shapes (e.g. current SDK plus maybe one legacy variant), you can encode that explicitly instead of handling “any dict-like”.
That lets you delete _maybe_model_dump and _result_text entirely and make _normalize_process_result much more obvious:
@dataclass(slots=True)
class ProcessResult:
stdout: str
stderr: str
exit_code: int | None
success: bool
def _normalize_process_result(raw: Any) -> ProcessResult:
# 1. Already in internal shape
if isinstance(raw, ProcessResult):
return raw
# 2. Known SDK shapes
# Example shape A: {"stdout": ..., "stderr": ..., "exit_code": ..., "success": ...}
if isinstance(raw, dict):
stdout = str(raw.get("stdout") or raw.get("output") or "")
stderr = str(raw.get("stderr") or raw.get("error") or "")
exit_code = raw.get("exit_code")
if exit_code is None:
exit_code = raw.get("returncode")
if exit_code is None:
exit_code = 0 if not stderr else 1
success = raw.get("success")
if success is None:
success = not stderr and exit_code in (0, None)
return ProcessResult(
stdout=stdout,
stderr=stderr,
exit_code=exit_code,
success=bool(success),
)
# 3. String fallback (CUA shell sometimes returns plain stdout)
if isinstance(raw, str):
return ProcessResult(stdout=raw, stderr="", exit_code=0, success=True)
# 4. Pydantic-style objects (only if you actually need to support this)
if hasattr(raw, "model_dump"):
return _normalize_process_result(raw.model_dump())
if hasattr(raw, "dict"):
return _normalize_process_result(raw.dict())
# 5. Last-resort fallback
return ProcessResult(stdout=str(raw), stderr="", exit_code=None, success=True)Usage in CuaShellComponent and CuaPythonComponent stays the same:
result = await _maybe_await(self._exec_raw(command, **kwargs))
proc = _normalize_process_result(result)Benefits:
- All logic is in one place; you can see exactly which shapes are supported.
- You can easily add/remove branches when CUA SDK changes, instead of trying to handle arbitrary “dict-like” payloads.
_maybe_model_dumpand_result_textare no longer needed.
2. Simplify GUI component method resolution
_resolve_component_method, _missing_component_method_error, and _has_component_method add a lot of generic plumbing to support multiple method names. If the supported CUA SDK variants are known, you can resolve and freeze the exact methods once in __init__, and drop the generic helpers.
class CuaGUIComponent(GUIComponent):
def __init__(self, sandbox: Any) -> None:
self._sandbox = sandbox
mouse = getattr(sandbox, "mouse", None)
keyboard = getattr(sandbox, "keyboard", None)
# Resolve once and keep explicit attributes
self._click = getattr(mouse, "click", None) if mouse else None
# Support a small, explicit set of variants
self._type_text = None
if keyboard is not None:
self._type_text = getattr(keyboard, "type", None)
if self._type_text is None:
self._type_text = getattr(keyboard, "type_text", None)
self._press_key = (
getattr(keyboard, "press", None)
or getattr(keyboard, "key_press", None)
or getattr(keyboard, "press_key", None)
)
else:
self._press_key = None
async def click(self, x: int, y: int, button: str = "left") -> dict[str, Any]:
if self._click is None:
raise RuntimeError("CUA sandbox does not provide mouse.click")
result = await _maybe_await(self._click(x, y, button=button))
payload = result if isinstance(result, dict) else {}
return {"success": bool(payload.get("success", True)), **payload}
async def type_text(self, text: str) -> dict[str, Any]:
if self._type_text is None:
raise RuntimeError("CUA sandbox does not provide keyboard.type/_type_text")
result = await _maybe_await(self._type_text(text))
payload = result if isinstance(result, dict) else {}
return {"success": bool(payload.get("success", True)), **payload}
async def press_key(self, key: str) -> dict[str, Any]:
if self._press_key is None:
raise RuntimeError(
"CUA sandbox does not provide keyboard.press/key_press/press_key"
)
result = await _maybe_await(self._press_key(key))
payload = result if isinstance(result, dict) else {}
return {"success": bool(payload.get("success", True)), **payload}With this refactor:
_resolve_component_method,_missing_component_method_error, and_has_component_methodcan be removed or inlined where strictly needed.- The capabilities detection in
CuaBooter.capabilitiescan directly inspectself._runtime.gui._click/_type_textinstead of walking the sandbox again.
These two changes alone:
- Preserve your support for multiple CUA SDK layouts.
- Remove several generic helpers and reflective paths.
- Make the booter behavior easier to read and test because the accepted shapes and method variants are explicitly encoded instead of “any dict/model with some keys”.
Summary
Test Plan
Notes
cuain the AstrBot environment.Summary by Sourcery
Add CUA-based sandbox computer-use support with GUI tooling, refine shell execution behavior, and harden configuration and logging around sandbox usage and defaults.
New Features:
Bug Fixes:
Enhancements:
Tests: