AI agents are autonomous software systems that pursue goals and complete tasks on behalf of users: they perceive their environment, make decisions, and take actions without constant human intervention. They draw on reasoning, memory, planning, and learning, often using large language models (LLMs) and other AI tools to interpret information and carry out complex workflows across many industries.
Collider-Bench is a benchmark for autonomous coding and scientific agents that reproduce Large Hadron Collider analyses. The public CodeSOTA score is Acc_tau at tau=0.33: the percentage of simulation tasks whose relative-L2 error is below 0.33, derived from Table 2 and Eq. 4 of arXiv:2605.13950.
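As a rough illustration of how a threshold-accuracy score of this kind can be computed, the sketch below takes a list of per-task relative-L2 errors and returns the percentage that fall strictly below tau. It is not the benchmark's official scoring code. The function names are hypothetical, and the error definition used here (the L2 norm of the difference divided by the L2 norm of the reference) is an assumption; the exact form is given by Eq. 4 of the paper, which is not reproduced on this page.

```python
import math
from typing import Sequence


def relative_l2_error(predicted: Sequence[float], reference: Sequence[float]) -> float:
    """Relative L2 error, assumed here to be ||predicted - reference||_2 / ||reference||_2."""
    if len(predicted) != len(reference):
        raise ValueError("predicted and reference must have the same length")
    ref_norm = math.sqrt(sum(r * r for r in reference))
    if ref_norm == 0.0:
        raise ValueError("reference has zero norm; relative error is undefined")
    diff_norm = math.sqrt(sum((p - r) ** 2 for p, r in zip(predicted, reference)))
    return diff_norm / ref_norm


def acc_tau(errors: Sequence[float], tau: float = 0.33) -> float:
    """Percentage of tasks whose error is strictly below tau."""
    if not errors:
        raise ValueError("errors must not be empty")
    passed = sum(1 for e in errors if e < tau)
    return 100.0 * passed / len(errors)


if __name__ == "__main__":
    # Hypothetical per-task errors, for illustration only.
    example_errors = [0.10, 0.50, 0.32, 0.33]
    print(acc_tau(example_errors))
```

Because the comparison is strict, a task whose error is exactly 0.33 does not count as passed in this sketch; whether the benchmark treats the boundary the same way is not stated in the description above.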
Leading models on Collider-Bench.
9 datasets tracked for this task.