| 01 | Agent S3 w/ bBoN | unverified | 63.5 | 2025 | Paper, Code |
| 02 | GLM-5V-Turbo | unverified | 62.3 | 2026 | Paper, Code |
| 03 | CoAct-1: 60.76% success rate on OSWorld (SOTA as of Feb 2026). Salesforce, arxiv:2508.03923, Aug 2025. Combines GUI actions with programmatic code execution. | verified | 60.76 | 2026 | Source |
| 04 | JEDI-7B with o3 planner | unverified | 51 | 2025 | Paper, Code |
| 05 | UI-TARS-2: 47.5% success rate on OSWorld. ByteDance Seed, arxiv:2509.02544, Sep 2025. Trained with multi-turn RL. | verified | 47.5 | 2026 | Source |
| 06 | GTA1 (7B): 45.2% success rate on OSWorld. Salesforce AI Research, arxiv:2507.05791, Jul 2025. ICLR 2026 paper. | verified | 45.2 | 2026 | Source |
| 07 | UI-TARS-1.5: 42.5% success rate on OSWorld (100 steps). ByteDance, released Apr 2025. | verified | 42.5 | 2026 | Source |
| 08 | Agent S2 (Gemini 2.5): Agent S2 with Gemini 2.5, 41.4% success rate on OSWorld (50 steps). From the OSWorld-Human paper, arxiv:2506.16042, Jun 2025. | verified | 41.4 | 2026 | Source |
| 09 | Holo2-8B | unverified | 39.9 | 2026 | Paper |
| 10 | Qwen3-VL-235B-A22B-Thinking | unverified | 38.1 | 2025 | Paper, Code |
| 11 | OpenAI CUA (o1): OpenAI Computer-Using Agent (CUA/Operator), 38.1% success rate on OSWorld. Announced Jan 2025. | verified | 38.1 | 2026 | Source |
| 12 | Holo2-4B | unverified | 37.7 | 2026 | Paper |
| 13 | Holo2-30B-A3B | unverified | 37.4 | 2026 | Paper |
| 14 | Agent S2 (Claude 3.7): Agent S2 with Claude 3.7 Sonnet, 34.5% success rate on OSWorld (50 steps). Simular AI, arxiv:2504.00906, Apr 2025. | verified | 34.5 | 2026 | Source |
| 15 | Agent S2 w/ Claude-3.7-Sonnet | unverified | 34.5 | 2025 | Paper, Code |
| 16 | Qwen3-VL-8B-Instruct | unverified | 33.9 | 2025 | Paper, Code |
| 17 | Agent S2 w/ Claude-3.5-Sonnet | unverified | 33.7 | 2025 | Paper, Code |
| 18 | Qwen3-VL-235B-A22B-Instruct | unverified | 31.6 | 2025 | Paper, Code |
| 19 | Claude 3.7 Sonnet: 28% success rate on OSWorld (100 steps); top of the leaderboard at release (Feb 2025). From the OSWorld-Human paper, arxiv:2506.16042. | verified | 28 | 2026 | Source |
| 20 | UI-TARS-72B: 24.6% success rate on OSWorld (50 steps). ByteDance, arxiv:2501.12326, Jan 2025. | verified | 24.6 | 2026 | Source |
| 21 | Claude Computer Use: Claude 3.5 Sonnet computer use on OSWorld with extended steps. Anthropic announcement, Oct 2024. | verified | 22 | 2026 | Source |
| 22 | Agent S w/ GPT-4o | unverified | 20.58 | 2024 | Paper, Code |
| 23 | Agent S w/ Claude-3.5 | unverified | 20.48 | 2024 | Paper, Code |
| 24 | UFO (GPT-4V): Windows-focused agent built on GPT-4V, evaluated on an OSWorld subset. arxiv:2402.07939. | verified | 9.40 | 2024 | Paper, Source |
| 25 | Qwen2.5-VL-72B | unverified | 8.83 | 2025 | Paper, Code |
| 26 | Kimi-VL-A3B-Instruct | unverified | 8.22 | 2025 | Paper, Code |
| 27 | GPT-4 Turbo (2024): GPT-4V (screenshot-only) on OSWorld, Table 3 of arxiv:2404.07972. Screenshot-based GUI agent. | verified | 6.50 | 2024 | Paper |