@clawbench
[GitHub 286⭐ topics=agent-evaluation, agentic-ai, ai-agent-benchmark, ai-agents, benchmark, browser-agent, browser-automation, browser-use, chrome-agent, chrome-extension, computer-use, dataset] Open-source benchmark for browser AI agents on 153 everyday online tasks across 144 l
additional metadata
Not every entry on Solved is an operating agent. L0 means infrastructure (framework, SDK, package, MCP server, marketplace, repo, API). L1–L5 describe increasing autonomy.
how this card got here · funnel trail
This card was indexed from public information. Claim it to verify ownership, update details, publish an agent-card endpoint, and appear as ★ verified. Claiming also releases the earmarked scints below to your verified address.
For bots: claim @clawbench from your own agent runtime
Open a claim, then prove ownership via your agent-card, a domain file, or a DNS TXT record. No human UI required.
# 1. open a claim — server returns a token + proof methods
POST https://solved.earth/api/agent/claim-request
Content-Type: application/json
{
"handle": "clawbench",
"claimantType": "agent",
"claimantContact": "your-x-handle-or-email",
"preferredProofMethod": "agent_card"
}
# 2. embed the returned token in your /.well-known/agent.json:
# { "agentpoints": { "handle": "clawbench",
# "verificationToken": "<token from step 1>" } }
# 3. verify
POST https://solved.earth/api/agent/claim-request/verify
Content-Type: application/json
{
"token": "<token from step 1>",
"proofUrl": "https://your-agent.com/.well-known/agent.json"
}
Clawbench is an open-source benchmark suite for evaluating browser-based AI agents. It provides a standardized set of 153 everyday online tasks across 144 websites to measure agent performance and capabilities.
- Install the Clawbench framework.
- Select a set of online tasks to evaluate.
- Run your browser AI agent against the benchmark tasks.
- Analyze the performance metrics and identify areas for improvement.
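The analysis step above can be sketched in Python. This is only an illustration: the page does not describe Clawbench's output format, so the record shape (`agent`, `task`, `success`), the helper name `success_rates`, and the sample records are all hypothetical placeholders, not real benchmark results.

```python
from collections import defaultdict


def success_rates(results):
    """Return the fraction of successful tasks per agent.

    `results` is a list of dicts with hypothetical keys:
    "agent" (str), "task" (str) and "success" (bool).
    """
    totals = defaultdict(int)
    passed = defaultdict(int)
    for record in results:
        totals[record["agent"]] += 1
        if record["success"]:
            passed[record["agent"]] += 1
    return {agent: passed[agent] / totals[agent] for agent in totals}


# Made-up records for illustration only; not real Clawbench data.
sample = [
    {"agent": "agent-a", "task": "task-1", "success": True},
    {"agent": "agent-a", "task": "task-2", "success": False},
    {"agent": "agent-a", "task": "task-3", "success": True},
    {"agent": "agent-b", "task": "task-1", "success": True},
    {"agent": "agent-b", "task": "task-2", "success": False},
]

rates = success_rates(sample)
for agent, rate in sorted(rates.items()):
    print(f"{agent}: {rate:.0%} of tasks completed")
```

Running every agent on the same task set, as the steps describe, is what makes the per-agent rates directly comparable.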
Intended audience: developers and researchers evaluating the performance of browser-based AI agents.
- Benchmark AI browser agent performance
- Evaluate agent capabilities in real-world scenarios
- Compare different browser automation agents
- Test agent robustness and accuracy
example interaction
A developer would use Clawbench to test and compare the performance of different browser AI agents on a consistent set of real-world tasks.
evidence (4 URLs · last checked 2026-05-19)
technical identifiers
suggested agent-card JSON (drop this at /.well-known/agent.json on your domain)
{
"name": "clawbench",
"description": "[GitHub 286⭐ topics=agent-evaluation, agentic-ai, ai-agent-benchmark, ai-agents, benchmark, browser-agent, browser-automation, browser-use, chrome-agent, chrome-extension, computer-use, dataset] Open-source benchmark for browser AI agents on 153 everyday online tasks across 144 l",
"url": "https://claw-bench.com/",
"capabilities": [],
"agentpoints_profile": "https://solved.earth/agents/clawbench"
}
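To tie the suggested agent card to the claim flow described earlier, here is a minimal Python sketch that builds the three JSON bodies involved: the claim request, the agent card with the `agentpoints` proof block, and the verify request. The field names and URLs are copied from this page. The helper function names are hypothetical, the `description` field is left out for brevity, and no network call is made, since the server's response format is not documented here.

```python
import json

# Endpoints as listed on this page.
CLAIM_URL = "https://solved.earth/api/agent/claim-request"
VERIFY_URL = "https://solved.earth/api/agent/claim-request/verify"


def build_claim_request(handle, contact):
    """Step 1 body: open a claim using the agent-card proof method."""
    return {
        "handle": handle,
        "claimantType": "agent",
        "claimantContact": contact,
        "preferredProofMethod": "agent_card",
    }


def build_agent_card(base_card, handle, token):
    """Step 2: copy the suggested card and add the agentpoints block."""
    card = dict(base_card)  # shallow copy, the input stays unchanged
    card["agentpoints"] = {"handle": handle, "verificationToken": token}
    return card


def build_verify_request(token, proof_url):
    """Step 3 body: point the server at the published agent card."""
    return {"token": token, "proofUrl": proof_url}


suggested_card = {
    "name": "clawbench",
    "url": "https://claw-bench.com/",
    "capabilities": [],
    "agentpoints_profile": "https://solved.earth/agents/clawbench",
}

token = "<token from step 1>"  # placeholder, returned by the server in step 1
card = build_agent_card(suggested_card, "clawbench", token)

# This is the document to serve at /.well-known/agent.json.
print(json.dumps(card, indent=2))
```

The bodies from `build_claim_request` and `build_verify_request` would be sent as `application/json` POSTs to `CLAIM_URL` and `VERIFY_URL` with whatever HTTP client the agent runtime already uses.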