Codesota · Models · GPT-4 Turbo (2024) · OpenAI · 5 results · 5 benchmarks
Model card

GPT-4 Turbo (2024).

OpenAI · proprietary · Unknown params · GPT-4 Turbo (gpt-4-turbo-2024-04-09)

GPT-4 Turbo evaluated on METR autonomy tasks and HCAST.

§ 01 · Benchmarks

Every benchmark GPT-4 Turbo (2024) has a recorded score for.

| #  | Benchmark         | Area · Task                       | Metric               | Value | Rank   | Date       |
|----|-------------------|-----------------------------------|----------------------|-------|--------|------------|
| 01 | METR Time Horizon | Agentic AI · Time Horizon         | task-horizon-minutes | 2.0%  | #5/5   | 2025-04-01 |
| 02 | RE-Bench          | Agentic AI · RE-Bench             | normalized-score     | 0.1%  | #5/5   | 2024-11-22 |
| 03 | HCAST             | Agentic AI · HCAST                | success-rate         | 12.0% | #6/6   | 2023-12-19 |
| 04 | WebArena          | Agentic AI · Web & Desktop Agents | success-rate         | 14.9% | #6/6   | 2023-07-26 |
| 05 | OSWorld           | Agentic AI · Web & Desktop Agents | success-rate         | 6.5%  | #13/13 | 2024-04-11 |

The Rank column shows this model's position among all models scored on the same benchmark and metric; the number after the slash is how many models were scored. #1 in red means current SOTA. Rows are sorted by rank, then by newest result.
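
The ordering rule described above (rank first, then newest result) can be sketched as follows. This is a minimal illustration using the ranks and dates from the table; the `rows` structure and the tuple layout are assumptions made for the example, not Codesota's actual data model.

```python
from datetime import date

# (benchmark, rank, result date), copied from the table above.
# The tuple layout is hypothetical; only the values come from the page.
rows = [
    ("OSWorld", 13, date(2024, 4, 11)),
    ("WebArena", 6, date(2023, 7, 26)),
    ("HCAST", 6, date(2023, 12, 19)),
    ("RE-Bench", 5, date(2024, 11, 22)),
    ("METR Time Horizon", 5, date(2025, 4, 1)),
]

# Ascending rank first; within a rank tie, the newest result comes first.
ordered = sorted(rows, key=lambda r: (r[1], -r[2].toordinal()))
names = [name for name, _, _ in ordered]

for name, rank, day in ordered:
    print(f"#{rank:<3} {day.isoformat()}  {name}")
```

Under this rule the rows come out in the same order as the table: METR Time Horizon, RE-Bench, HCAST, WebArena, OSWorld.
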
§ 02 · Strengths by area

Where GPT-4 Turbo (2024) actually performs.

Agentic AI · 5 benchmarks · avg rank #7.0
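
The average rank can be checked against the benchmark table. The sketch below assumes "avg rank" is the plain arithmetic mean of the five per-benchmark ranks; the page does not state the formula, but the mean reproduces the displayed figure.

```python
# Ranks taken from the benchmark table in § 01.
# Assumption: "avg rank" is the unweighted mean of these ranks.
ranks = {
    "METR Time Horizon": 5,
    "RE-Bench": 5,
    "HCAST": 6,
    "WebArena": 6,
    "OSWorld": 13,
}

avg_rank = sum(ranks.values()) / len(ranks)  # 35 / 5
label = f"avg rank #{avg_rank:.1f}"
print(label)
```
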
§ 03 · Papers

5 papers with results for GPT-4 Turbo (2024).

  1. 2025-04-01· Agentic AI· 1 result

    METR: Measuring Autonomy in AI Systems (2025 Update)

  2. 2024-11-22· Agentic AI· 1 result

    RE-Bench: Evaluating Frontier AI R&D Capabilities of Language Model Agents Against Human Experts

  3. 2024-04-11· Agentic AI· 1 result

    OSWorld: Benchmarking Multimodal Agents for Open-Ended Tasks in Real Computer Environments

  4. 2023-12-19· Agentic AI· 1 result

    HCAST: Human-Calibrated Autonomy Software Tasks

  5. 2023-07-26· Agentic AI· 1 result

    WebArena: A Realistic Web Environment for Building Autonomous Agents

§ 04 · Related models

Other OpenAI models scored on Codesota.

GPT-4o · Undisclosed params · 35 results · 9 SOTA
o3 · 16 results · 5 SOTA
o4-mini · 13 results · 3 SOTA
o3 (high) · 2 results · 1 SOTA
o4-mini (high) · 1 result · 1 SOTA
o1 · 11 results
GPT-5 · 8 results
o1-preview · Undisclosed params · 8 results

§ 05 · Sources & freshness

Where these numbers come from.

arxiv · 3 results
official-leaderboard · 2 results

5 of 5 rows marked verified · first result 2023-07-26, latest 2025-04-01.