GPT-4 · OpenAI · 13 results · 6 benchmarks
Model card

GPT-4.

OpenAI · proprietary · Transformer (LLM)

GPT-4 Technical Report. OpenAI, Mar 2023.

§ 01 · Benchmarks

Every benchmark GPT-4 has a recorded score for.

| # | Benchmark | Area · Task | Metric | Value | Rank | Date | Source |
|---|---|---|---|---|---|---|---|
| 01 | WMT'23 | Natural Language Processing · Machine Translation | comet | 84.1% | #1/4 | | source ↗ |
| 02 | WikiTableQuestions | Natural Language Processing · Table Question Answering | accuracy | 75.3% | #1/3 | | source ↗ |
| 03 | XNLI | Natural Language Processing · Zero-Shot Classification | accuracy | 87.4% | #1/3 | | source ↗ |
| 04 | SWE-Bench | Computer Code · Code Generation | resolve-rate-agentic | 12.5% | #24/25 | 2024-03-01 | unverified |
| 05 | GSM8K | Reasoning · Mathematical Reasoning | accuracy | 92.0% | #24/32 | 2023-03-01 | source ↗ |
| 06 | GSM8K | Reasoning · Mathematical Reasoning | accuracy | 92.0% | #24/32 | 2023-03-01 | source ↗ |
| 07 | PLCC | Natural Language Processing · Polish Cultural Competency | culture-and-tradition | 63.0% | #81/165 | | source ↗ |
| 08 | PLCC | Natural Language Processing · Polish Cultural Competency | grammar | 58.0% | #86/165 | | source ↗ |
| 09 | PLCC | Natural Language Processing · Polish Cultural Competency | history | 72.0% | #90/165 | | source ↗ |
| 10 | PLCC | Natural Language Processing · Polish Cultural Competency | art-and-entertainment | 49.0% | #91/165 | | source ↗ |
| 11 | PLCC | Natural Language Processing · Polish Cultural Competency | average | 59.5% | #91/165 | | source ↗ |
| 12 | PLCC | Natural Language Processing · Polish Cultural Competency | vocabulary | 48.0% | #95/165 | | source ↗ |
| 13 | PLCC | Natural Language Processing · Polish Cultural Competency | geography | 67.0% | #99/165 | | source ↗ |

The Rank column shows this model's position against all other models scored on the same benchmark and metric (competitors after the slash). A rank of #1, shown in red on the page, marks the current SOTA. Rows are sorted by rank, then by newest result.

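
The rank notation and sort order described above can be sketched in a few lines. This is an illustrative sketch, not Codesota's code: the names `Rank`, `parse_rank`, `is_sota`, and `sort_rows` are hypothetical, and treating an empty date as older than any dated result is an assumption.

```python
from typing import NamedTuple


class Rank(NamedTuple):
    position: int  # this model's place on the benchmark + metric
    field: int     # the number after the slash


def parse_rank(cell: str) -> Rank:
    """Parse a rank cell such as '#24/25' into its two integers."""
    position, field = cell.lstrip("#").split("/")
    return Rank(int(position), int(field))


def is_sota(cell: str) -> bool:
    """A rank of #1 marks the current SOTA."""
    return parse_rank(cell).position == 1


def sort_rows(rows):
    """Sort by rank ascending, then newest result first.

    Two stable sorts: the secondary key (date, descending) is applied
    first, then the primary key (rank position, ascending).
    Assumption: rows without a date ("") sort after dated rows.
    """
    by_date = sorted(rows, key=lambda r: r[2], reverse=True)
    return sorted(by_date, key=lambda r: parse_rank(r[1]).position)


# (benchmark, rank cell, ISO date or "" when none is listed)
rows = [
    ("GSM8K", "#24/32", "2023-03-01"),
    ("WMT'23", "#1/4", ""),
    ("SWE-Bench", "#24/25", "2024-03-01"),
]

ordered = [name for name, _, _ in sort_rows(rows)]
```

With these three rows, the SWE-Bench result (2024-03-01) lands ahead of the GSM8K result (2023-03-01) at the same rank of #24, which matches the order in the table.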
§ 02 · Strengths by area

Where GPT-4 actually performs.

Computer Code · 1 benchmark · avg rank #24.0
Reasoning · 1 benchmark · avg rank #24.0
Natural Language Processing · 4 benchmarks · avg rank #63.6
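
The per-area figures can be reproduced from the benchmark table. The sketch below assumes the average is taken over every result row rather than once per benchmark. That is an inference from the numbers (the ten Natural Language Processing rows sum to 636, giving 63.6), not a documented Codesota method, and the `summarize` helper is hypothetical.

```python
from collections import defaultdict

# (area, benchmark, rank position) copied from the benchmark table
results = [
    ("Natural Language Processing", "WMT'23", 1),
    ("Natural Language Processing", "WikiTableQuestions", 1),
    ("Natural Language Processing", "XNLI", 1),
    ("Computer Code", "SWE-Bench", 24),
    ("Reasoning", "GSM8K", 24),
    ("Reasoning", "GSM8K", 24),
    ("Natural Language Processing", "PLCC", 81),
    ("Natural Language Processing", "PLCC", 86),
    ("Natural Language Processing", "PLCC", 90),
    ("Natural Language Processing", "PLCC", 91),
    ("Natural Language Processing", "PLCC", 91),
    ("Natural Language Processing", "PLCC", 95),
    ("Natural Language Processing", "PLCC", 99),
]


def summarize(results):
    """Return {area: (distinct benchmark count, mean rank over all rows)}."""
    ranks = defaultdict(list)
    benchmarks = defaultdict(set)
    for area, benchmark, position in results:
        ranks[area].append(position)
        benchmarks[area].add(benchmark)
    return {
        area: (len(benchmarks[area]), round(sum(r) / len(r), 1))
        for area, r in ranks.items()
    }


summary = summarize(results)
for area, (count, avg) in summary.items():
    print(f"{area}: {count} benchmark(s), avg rank #{avg}")
```

Counting benchmarks as distinct names while averaging over rows gives 4 benchmarks and an average rank of 63.6 for Natural Language Processing, consistent with the figures listed above.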
§ 04 · Related models

Other OpenAI models scored on Codesota.

GPT-4o · Undisclosed params · 35 results · 9 SOTA
o3 · 16 results · 5 SOTA
o4-mini · 13 results · 3 SOTA
o3 (high) · 2 results · 1 SOTA
o4-mini (high) · 1 result · 1 SOTA
o1 · 11 results
GPT-5 · 8 results
o1-preview · Undisclosed params · 8 results
§ 05 · Sources & freshness

Where these numbers come from.

sdadas/PLCC · 7 results
arxiv · 3 results
swe-agent · 1 result
gsm8k-shadow-page · 1 result
gsm8k-shadow-page-timeline · 1 result

10 of 13 rows marked verified · first result 2023-03-01, latest 2024-03-01.