GPT-4 · OpenAI · 13 results · 6 benchmarks
Model card

GPT-4.

OpenAI · proprietary · Transformer (LLM)

GPT-4 Technical Report. OpenAI, Mar 2023.

§ 02 · Benchmarks

Every benchmark GPT-4 has a recorded score for.

| # | Benchmark | Area · Task | Metric | Value | Rank | Date | Source |
|---|---|---|---|---|---|---|---|
| 01 | WMT'23 | Natural Language Processing · Machine Translation | comet | 84.1% | #1/4 | | source |
| 02 | WikiTableQuestions | Natural Language Processing · Table Question Answering | accuracy | 75.3% | #1/3 | | source |
| 03 | XNLI | Natural Language Processing · Zero-Shot Classification | accuracy | 87.4% | #1/3 | | source |
| 04 | SWE-bench | Computer Code · Code Generation | resolve-rate-agentic | 12.5% | #24/25 | 2024-03-01 | unverified |
| 05 | GSM8K | Reasoning · Mathematical Reasoning | accuracy | 92.0% | #31/48 | 2023-03-01 | source |
| 06 | GSM8K | Reasoning · Mathematical Reasoning | accuracy | 92.0% | #31/48 | 2023-03-01 | source |
| 07 | PLCC | Natural Language Processing · Polish Cultural Competency | culture-and-tradition | 63.0% | #81/165 | | source |
| 08 | PLCC | Natural Language Processing · Polish Cultural Competency | grammar | 58.0% | #86/165 | | source |
| 09 | PLCC | Natural Language Processing · Polish Cultural Competency | history | 72.0% | #90/165 | | source |
| 10 | PLCC | Natural Language Processing · Polish Cultural Competency | art-and-entertainment | 49.0% | #91/165 | | source |
| 11 | PLCC | Natural Language Processing · Polish Cultural Competency | average | 59.5% | #91/165 | | source |
| 12 | PLCC | Natural Language Processing · Polish Cultural Competency | vocabulary | 48.0% | #95/165 | | source |
| 13 | PLCC | Natural Language Processing · Polish Cultural Competency | geography | 67.0% | #99/165 | | source |

The Rank column shows this model's position among all models scored on the same benchmark and metric, with the number of competitors after the slash. A rank of #1 (shown in red on the page) means current SOTA. Rows are sorted by rank, then by newest result.

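
The ordering described above (rank first, then newest result) can be sketched in a few lines of Python. This is only an illustration, not Codesota's actual code: the `Row` type, `parse_rank`, and `sort_key` are hypothetical names, and placing undated rows last within a rank is an assumption the page does not state.

```python
from datetime import date
from typing import NamedTuple, Optional


class Row(NamedTuple):
    """One table row (hypothetical shape, reduced to the fields used here)."""
    benchmark: str
    metric: str
    rank: str                      # as displayed, e.g. "#24/25"
    result_date: Optional[date]    # None when the table shows no date


def parse_rank(rank: str) -> tuple[int, int]:
    """Split '#24/25' into (position, number after the slash)."""
    position, after_slash = rank.lstrip("#").split("/")
    return int(position), int(after_slash)


def sort_key(row: Row) -> tuple[int, int]:
    """Lower rank first; within a rank, newer dates first.

    Assumption: rows without a date sort after dated rows of the same rank.
    """
    position, _ = parse_rank(row.rank)
    recency = -row.result_date.toordinal() if row.result_date else 0
    return (position, recency)


# Three rows copied from the table above.
rows = [
    Row("GSM8K", "accuracy", "#31/48", date(2023, 3, 1)),
    Row("SWE-bench", "resolve-rate-agentic", "#24/25", date(2024, 3, 1)),
    Row("WMT'23", "comet", "#1/4", None),
]

sorted_rows = sorted(rows, key=sort_key)
for row in sorted_rows:
    print(row.benchmark, row.rank, row.result_date)
```

Sorting on a tuple key keeps the two criteria in one pass: the rank position decides first, and the negated date ordinal only breaks ties between rows with the same position.
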
§ 03 · Strengths by area

Where GPT-4 actually performs.

Natural Language Processing · 1 benchmark · avg rank #1.0
Computer Code · 1 benchmark · avg rank #24.0
Reasoning · 1 benchmark · avg rank #31.0
Natural Language Processing · 3 benchmarks · avg rank #70.6

§ 05 · Related models

Other OpenAI models scored on Codesota.

GPT-4o · Undisclosed params · 38 results · 9 SOTA
o3 · 17 results · 5 SOTA
o4-mini · 14 results · 2 SOTA
o3 (high) · 2 results · 1 SOTA
Codex / GPT-5.5 · 1 result · 1 SOTA
Codex CLI (GPT-5.5) · 1 result · 1 SOTA
o4-mini (high) · 1 result · 1 SOTA
o1 · 12 results

§ 06 · Sources & freshness

Where these numbers come from.

sdadas/PLCC · 7 results
arxiv · 3 results
swe-agent · 1 result
gsm8k-shadow-page-timeline · 1 result
gsm8k-shadow-page · 1 result

10 of 13 rows marked verified · first result 2023-03-01, latest 2024-03-01.