Codesota · Benchmark · Terminal-Bench 2.0

Terminal-Bench 2.0.

A Stanford x Laude benchmark for AI agents operating in terminal environments. Terminal-Bench 2.0 evaluates terminal mastery across software engineering, machine learning, security, data science, system administration, file operations, and related operational workflows. The official site lists 89 high-quality tasks and a 124-entry live leaderboard.

§ 01 · SOTA history

Year over year.

Not enough data to show trend.
§ 02 · Leaderboard

Results by metric.

accuracy

Higher is better

Trust tiers for accuracy: verified, paper, vendor, community, unverified
Ranks are those of the official Terminal-Bench 2.0 leaderboard. Each system couples an agent scaffold with an underlying model, listed as scaffold / model.

| Rank | Model | Trust | Score | Year |
|------|-------|-------|-------|------|
| 01 | Codex / GPT-5.5 | verified | 82 | 2026 |
| 02 | ForgeCode / GPT-5.4 | verified | 81.8 | 2026 |
| 03 | TongAgents / Gemini 3.1 Pro | verified | 80.2 | 2026 |
| 04 | ForgeCode / Claude Opus 4.6 | verified | 79.8 | 2026 |
| 05 | SageAgent / GPT-5.3-Codex | verified | 78.4 | 2026 |
| 06 | ForgeCode / Gemini 3.1 Pro | verified | 78.4 | 2026 |
| 07 | Droid / GPT-5.3-Codex | verified | 77.3 | 2026 |
| 08 | Capy / Claude Opus 4.6 | verified | 75.3 | 2026 |
| 09 | Simple Codex / GPT-5.3-Codex | verified | 75.1 | 2026 |
| 10 | Terminus-KIRA / Gemini 3.1 Pro | verified | 74.8 | 2026 |
| 11 | Terminus-KIRA / Claude Opus 4.6 | verified | 74.7 | 2026 |
| 12 | Mux / GPT-5.3-Codex | verified | 74.6 | 2026 |
| 13 | MAYA-V2 / Claude 4.6 Opus | verified | 72.1 | 2026 |
| 14 | TongAgents / Claude Opus 4.6 | verified | 71.9 | 2026 |
| 15 | Junie CLI / Multiple | verified | 71 | 2026 |
| 16 | CodeBrain-1 / GPT-5.3-Codex | verified | 70.3 | 2026 |
| 17 | Droid / Claude Opus 4.6 | verified | 69.9 | 2026 |
| 18 | Ante / Gemini 3 Pro | verified | 69.4 | 2026 |
| 19 | IndusAGI Coding Agent / GPT-5.3-Codex | verified | 69.1 | 2026 |
| 20 | Crux / Claude Opus 4.6 | verified | 66.9 | 2026 |
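Because every entry pairs a scaffold with a model, it can help to read the two apart. The sketch below is illustrative only and is not part of Codesota or the official leaderboard tooling. It copies the ten highest-scoring systems from the rows above and assumes the `scaffold / model` naming holds for each. The helper names (`split_system`, `best_per_model`, `scaffold_spread`) are hypothetical.

```python
from collections import defaultdict

# (system, score) pairs copied from the ten highest-scoring rows above.
ROWS = [
    ("Codex / GPT-5.5", 82.0),
    ("ForgeCode / GPT-5.4", 81.8),
    ("TongAgents / Gemini 3.1 Pro", 80.2),
    ("ForgeCode / Claude Opus 4.6", 79.8),
    ("SageAgent / GPT-5.3-Codex", 78.4),
    ("ForgeCode / Gemini 3.1 Pro", 78.4),
    ("Droid / GPT-5.3-Codex", 77.3),
    ("Capy / Claude Opus 4.6", 75.3),
    ("Simple Codex / GPT-5.3-Codex", 75.1),
    ("Terminus-KIRA / Gemini 3.1 Pro", 74.8),
]


def split_system(system):
    """Split 'scaffold / model' into its two parts (assumed naming scheme)."""
    scaffold, model = system.split(" / ", 1)
    return scaffold, model


def best_per_model(rows):
    """Return {model: (scaffold, score)} for the top-scoring scaffold per model."""
    best = {}
    for system, score in rows:
        scaffold, model = split_system(system)
        if model not in best or score > best[model][1]:
            best[model] = (scaffold, score)
    return best


def scaffold_spread(rows):
    """Return {model: max - min score} for models that appear with 2+ scaffolds."""
    scores = defaultdict(list)
    for system, score in rows:
        _, model = split_system(system)
        scores[model].append(score)
    return {m: round(max(s) - min(s), 1) for m, s in scores.items() if len(s) > 1}


if __name__ == "__main__":
    for model, (scaffold, score) in best_per_model(ROWS).items():
        print(f"{model}: best scaffold {scaffold} ({score})")
    print(scaffold_spread(ROWS))
```

Within this subset, the spread between scaffolds on the same model is several points, so a leaderboard position reflects the scaffold as well as the model.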
§ 03 · Lineage

Terminal-Bench 2.0 in context.

This benchmark (1)
Terminal-Bench 2.0 · active · 2026-04
None yet; this is the current frontier.
§ 04 · Submit a result

Add to the leaderboard.
