Codesota · Models · GPT-5 · OpenAI · 10 results · 9 benchmarks
Model card

GPT-5.

OpenAI · proprietary
§ 02 · Benchmarks

Every benchmark GPT-5 has a recorded score for.

| # | Benchmark | Area · Task | Metric | Value | Rank | Date | Source |
|---|---|---|---|---|---|---|---|
| 01 | GSM8K | Reasoning · Mathematical Reasoning | accuracy | 99.2% | #3/48 | 2025-08-01 | source |
| 02 | LiveCodeBench | Computer Code · Code Generation | pass@1 | 85.0% | #3/30 | | source |
| 03 | LiveCodeBench Pro | Computer Code · Code Generation | elo | 2176.00 | #3/10 | | source |
| 04 | HumanEval | Computer Code · Code Generation | pass@1 | 95.1% | #4/42 | 2025-12-01 | source |
| 05 | GPQA Diamond | Reasoning · Multi-step Reasoning | accuracy | 89.0% | #7/74 | | source |
| 06 | MMLU | Reasoning · Commonsense Reasoning | accuracy | 90.8% | #8/64 | 2025-09-01 | source |
| 07 | SWE-Bench Verified | Computer Code · Code Generation | resolve-rate | 74.9% | #13/39 | | source |
| 08 | SWE-bench Verified | Agentic AI · SWE-bench | resolve-rate | 74.9% | #20/81 | | source |
| 09 | HLE | Reasoning · Multi-step Reasoning | accuracy | 25.3% | #22/74 | | source |
| 10 | HLE | Reasoning · Multi-step Reasoning | accuracy | 25.3% | #23/74 | | unverified |
The Rank column shows this model's position among all other models scored on the same benchmark and metric; the number after the slash counts the competitors. #1 in red marks the current SOTA. Rows are sorted by rank, then by newest result.
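The rank notation and the sort order described above can be sketched in a few lines of Python. This is only an illustration, not Codesota's own code: the names `Rank`, `parse_rank`, and `sort_rows` are hypothetical, and the choice to place undated rows after dated rows of the same rank is an assumption.

```python
import re
from typing import NamedTuple, Optional

class Rank(NamedTuple):
    position: int     # this model's position, e.g. 3 in "#3/48"
    competitors: int  # the number after the slash, e.g. 48 in "#3/48"

def parse_rank(text: str) -> Rank:
    """Parse a rank string of the form '#<position>/<competitors>'."""
    match = re.fullmatch(r"#(\d+)/(\d+)", text.strip())
    if match is None:
        raise ValueError(f"not a rank string: {text!r}")
    return Rank(int(match.group(1)), int(match.group(2)))

Row = tuple[str, str, Optional[str]]  # (benchmark, rank string, ISO date or None)

def sort_rows(rows: list[Row]) -> list[Row]:
    """Sort by rank position, then newest date first.

    Assumption: rows without a date come after dated rows of equal rank.
    Relies on Python's sort being stable across the two passes.
    """
    # ISO dates (YYYY-MM-DD) sort correctly as plain strings.
    newest_first = sorted(rows, key=lambda r: r[2] or "", reverse=True)
    return sorted(newest_first, key=lambda r: parse_rank(r[1]).position)

# A few rows taken from the table on this page.
rows: list[Row] = [
    ("HLE", "#22/74", None),
    ("HumanEval", "#4/42", "2025-12-01"),
    ("LiveCodeBench", "#3/30", None),
    ("GSM8K", "#3/48", "2025-08-01"),
]
ordered = sort_rows(rows)
```

Sorting in two passes keeps the key functions simple: the first pass orders by date, and the second pass reorders by rank without disturbing the date order among rows that share a rank.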
§ 03 · Strengths by area

Where GPT-5 actually performs.

Computer Code · 4 benchmarks · avg rank #5.8
Reasoning · 4 benchmarks · avg rank #12.6
Agentic AI · 1 benchmark · avg rank #20.0
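The per-area figures appear to follow from the benchmark table: the benchmark count looks like the number of distinct benchmark names in an area, and the average rank looks like the mean rank position over every result in that area. The Python sketch below checks that reading against the ranks listed on this page. The aggregation rule is an inference from the numbers, not something the page states, and `summarize` is a hypothetical helper.

```python
from collections import defaultdict

# (area, benchmark, rank position) for the ten results listed on this page.
results = [
    ("Computer Code", "LiveCodeBench", 3),
    ("Computer Code", "LiveCodeBench Pro", 3),
    ("Computer Code", "HumanEval", 4),
    ("Computer Code", "SWE-Bench Verified", 13),
    ("Reasoning", "GSM8K", 3),
    ("Reasoning", "GPQA Diamond", 7),
    ("Reasoning", "MMLU", 8),
    ("Reasoning", "HLE", 22),
    ("Reasoning", "HLE", 23),
    ("Agentic AI", "SWE-bench Verified", 20),
]

def summarize(rows):
    """Return {area: (distinct benchmark count, mean rank over all results)}."""
    ranks = defaultdict(list)
    benchmarks = defaultdict(set)
    for area, benchmark, position in rows:
        ranks[area].append(position)
        benchmarks[area].add(benchmark)
    return {
        area: (len(benchmarks[area]), sum(ranks[area]) / len(ranks[area]))
        for area in ranks
    }

summary = summarize(results)
```

Under this reading the means are 5.75 for Computer Code, 12.6 for Reasoning, and 20.0 for Agentic AI, which agree with the displayed values once rounded to one decimal. Reasoning has five results but only four distinct benchmarks because HLE appears twice.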
§ 05 · Related models

Other OpenAI models scored on Codesota.

GPT-4o · Undisclosed params · 38 results · 9 SOTA
o3 · 17 results · 5 SOTA
o4-mini · 14 results · 2 SOTA
o3 (high) · 2 results · 1 SOTA
Codex / GPT-5.5 · 1 result · 1 SOTA
Codex CLI (GPT-5.5) · 1 result · 1 SOTA
o4-mini (high) · 1 result · 1 SOTA
o1 · 12 results
§ 06 · Sources & freshness

Where these numbers come from.

editorial · 2 results
gsm8k-shadow-page-timeline · 1 result
artificial-analysis · 1 result
livecodebench-pro-official · 1 result
shadow-page-humaneval · 1 result
openai-gpt-5-launch · 1 result
codesota-shadow-mmlu · 1 result
openai-blog · 1 result
scale-hle-official · 1 result
4 of 10 rows marked verified · first result 2025-08-01, latest 2025-12-01.