
Add ClawBench (arXiv:2604.08523) to Benchmark #28

Open
reacher-z wants to merge 1 commit into trycua:main from reacher-z:add-clawbench


@reacher-z


Adds ClawBench to the Benchmark section.

ClawBench evaluates browser and computer-use agents on live production websites (Uber Eats, Indeed, Craigslist, etc.) using two-stage scoring: deterministic HTTP-request interception, followed by an LLM judge on the intercepted payload.
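
For reviewers unfamiliar with the idea, here is a rough sketch of what two-stage scoring could look like. It is not ClawBench's actual code: `InterceptedRequest`, `stage_one`, `score_task`, `stub_judge`, and the `example.com` URLs are all hypothetical names made up for illustration, and the LLM judge is replaced by a fixed rule so the snippet runs offline.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class InterceptedRequest:
    """Hypothetical record of one HTTP request captured during an agent run."""
    method: str
    url: str
    payload: dict


def stage_one(requests: list[InterceptedRequest],
              expected_method: str,
              url_fragment: str) -> Optional[InterceptedRequest]:
    """Deterministic stage: find the first request matching method and URL fragment."""
    for req in requests:
        if req.method == expected_method and url_fragment in req.url:
            return req
    return None


def score_task(requests: list[InterceptedRequest],
               expected_method: str,
               url_fragment: str,
               judge: Callable[[dict], bool]) -> bool:
    """Pass only if stage one finds a request AND the judge accepts its payload."""
    matched = stage_one(requests, expected_method, url_fragment)
    if matched is None:
        return False  # no matching request, so the judge is never consulted
    return judge(matched.payload)


def stub_judge(payload: dict) -> bool:
    # Stand-in for an LLM judge (assumption): a fixed rule instead of a model call.
    return payload.get("item") == "margherita pizza" and payload.get("quantity") == 1


captured = [
    InterceptedRequest("GET", "https://example.com/menu", {}),
    InterceptedRequest("POST", "https://example.com/api/checkout",
                       {"item": "margherita pizza", "quantity": 1}),
]
result = score_task(captured, "POST", "/api/checkout", stub_judge)
```

In this sketch the deterministic check gates the judge, so a run that never issues the expected request fails without any model call. Whether ClawBench orders or combines the stages this way is an assumption here.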

Sits next to OSWorld / AndroidWorld / AppWorld / WebVoyager / Spider2-V (already listed).

Affiliation: I'm one of the maintainers.

