Codesota · Tasks · Task agents

Task agents.

AI agents are autonomous software systems that pursue goals and complete tasks on behalf of users. They perceive their environment, make decisions, and take actions with little or no step-by-step human intervention. To do this they combine reasoning, memory, planning, and learning, typically building on large language models (LLMs) and external tools to interpret information and carry out multi-step workflows in many domains.
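The perceive, decide, act cycle described above can be sketched as a loop. This is a toy illustration under stated assumptions, not how any particular agent framework works: `ToyAgent`, `run_episode`, and the integer-counter environment are hypothetical, and the `policy` callable stands in for the LLM call a real agent would make.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

# Hypothetical policy signature: (observation, memory) -> action.
# In a real agent this is where an LLM call would go.
Policy = Callable[[int, List[Tuple[int, str]]], str]


@dataclass
class ToyAgent:
    policy: Policy
    memory: List[Tuple[int, str]] = field(default_factory=list)

    def step(self, observation: int) -> str:
        action = self.policy(observation, self.memory)  # decide from observation + memory
        self.memory.append((observation, action))      # remember what was seen and done
        return action


def run_episode(agent: ToyAgent, state: int, goal: int, max_steps: int = 20) -> int:
    """Perceive-decide-act loop on a toy environment: an integer counter
    that the agent must drive to `goal` by emitting "inc" or "dec"."""
    for _ in range(max_steps):
        if state == goal:                      # goal reached: stop without human input
            break
        action = agent.step(state)             # perceive the current state, choose an action
        state += 1 if action == "inc" else -1  # act: the environment changes
    return state


if __name__ == "__main__":
    goal = 5
    agent = ToyAgent(policy=lambda obs, mem: "inc" if obs < goal else "dec")
    final = run_episode(agent, state=0, goal=goal)
    print(final, len(agent.memory))  # 5 5
```

The `max_steps` budget is the toy counterpart of the step or cost limits that keep an autonomous loop from running indefinitely.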

§ 02 · Canonical benchmark

The reference dataset.

Collider-Bench

A benchmark for autonomous coding and scientific agents that reproduce Large Hadron Collider (LHC) analyses. The public CodeSOTA score is Acc_tau at tau = 0.33: the percentage of simulation tasks whose relative L2 error is below 0.33, derived from Table 2 and Eq. 4 of arXiv:2605.13950.

Primary metric: acc-tau-0-33
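A minimal sketch of how the score could be computed from per-task errors. It assumes the relative L2 error is ‖ŷ − y‖₂ / ‖y‖₂ for a predicted vector ŷ and reference vector y, that "below 0.33" means a strict inequality, and that a task with no usable output is scored as infinite error; Eq. 4 of arXiv:2605.13950 gives the exact definition. The function names are hypothetical.

```python
import math
from typing import Sequence


def relative_l2_error(pred: Sequence[float], ref: Sequence[float]) -> float:
    """Assumed form: ||pred - ref||_2 / ||ref||_2."""
    if len(pred) != len(ref):
        raise ValueError("pred and ref must have the same length")
    ref_norm = math.sqrt(sum(r * r for r in ref))
    if ref_norm == 0.0:
        raise ValueError("reference vector has zero norm")
    diff_norm = math.sqrt(sum((p - r) ** 2 for p, r in zip(pred, ref)))
    return diff_norm / ref_norm


def acc_tau(errors: Sequence[float], tau: float = 0.33) -> float:
    """Percentage of tasks whose relative L2 error is strictly below tau."""
    if not errors:
        raise ValueError("need at least one task")
    passed = sum(1 for e in errors if e < tau)
    return 100.0 * passed / len(errors)


if __name__ == "__main__":
    errors = [
        relative_l2_error([1.0, 2.0], [1.0, 2.0]),  # exact match -> 0.0
        relative_l2_error([1.5, 2.0], [1.0, 2.0]),  # 0.5 / sqrt(5), about 0.224
        0.5,                                        # above the threshold
        math.inf,                                   # failed task (assumed convention)
    ]
    print(acc_tau(errors))  # 50.0
```

Under this reading, a leaderboard value of 30.0 means that 30% of the simulation tasks met the tau = 0.33 threshold.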

§ 03 · Top 10

Leading models.

Leading models on Collider-Bench.

#	Model	acc-tau-0-33 (%)	Year	Source
1	Codex CLI (GPT-5.5)	30.0	2026	paper
2	Claude Code (Opus 4.7)	20.0	2026	paper
3	Claude Code (Sonnet 4.6)	10.0	2026	paper
4	Claude Code (Haiku 4.5)	0.0	2026	paper
5	Codex CLI (GPT-5.4-mini)	0.0	2026	paper
6	ForgeCode (DeepSeek-V4)	0.0	2026	paper