Model card

GPT-5.4

OpenAI · API · 11 results · 3 benchmarks

Imported from https://raw.githubusercontent.com/GAIR-NLP/AcademiClaw/main/README.md

§ 02 · Benchmarks

Every benchmark GPT-5.4 has a recorded score for.

| #  | Benchmark          | Area · Task                                       | Metric                   | Value  | Rank  | Date       |
|----|--------------------|---------------------------------------------------|--------------------------|--------|-------|------------|
| 01 | AcademiClaw        | Agentic AI · Task agents                          | safety-score             | 87.5%  | #2/6  | 2026-05-04 |
| 02 | React Native Evals | Mobile Development · React Native Code Generation | navigation-satisfaction  | 95.6%  | #2/10 |            |
| 03 | React Native Evals | Mobile Development · React Native Code Generation | async-state-satisfaction | 85.4%  | #2/10 |            |
| 04 | AcademiClaw        | Agentic AI · Task agents                          | avg-score                | 65.6%  | #3/6  | 2026-05-04 |
| 05 | React Native Evals | Mobile Development · React Native Code Generation | requirement-satisfaction | 82.6%  | #3/10 |            |
| 06 | React Native Evals | Mobile Development · React Native Code Generation | animation-satisfaction   | 68.9%  | #3/10 |            |
| 07 | AcademiClaw        | Agentic AI · Task agents                          | pass                     | 42.5%  | #4/6  | 2026-05-04 |
| 08 | AcademiClaw        | Agentic AI · Task agents                          | avg-time-sec             | 240.00 | #5/5  | 2026-05-04 |
| 09 | AcademiClaw        | Agentic AI · Task agents                          | avg-tokens-per-task-k    | 525.00 | #6/6  | 2026-05-04 |
| 10 | AcademiClaw        | Agentic AI · Task agents                          | tool-calls-per-task      | 19.0%  | #6/6  | 2026-05-04 |
| 11 | HLE                | Reasoning · Multi-step Reasoning                  | accuracy                 | 36.2%  | #9/74 |            |
The Rank column shows this model's position against all other models scored on the same benchmark and metric, with the number of competitors after the slash. A #1 in red marks the current SOTA. Rows are sorted by rank, then by newest result.
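The rank notation and sort order described above can be sketched in a few lines of Python. This is an illustrative reading of the page, not Codesota's own code: the `Result` type, `parse_rank`, and `sort_key` are hypothetical names, and placing undated rows after dated rows of the same rank is an assumption inferred from the table order.

```python
from datetime import date
from typing import NamedTuple, Optional


class Result(NamedTuple):
    metric: str
    rank: str            # as displayed, e.g. "#2/6"
    day: Optional[date]  # None when the row shows no date


def parse_rank(rank: str) -> tuple[int, int]:
    """Split '#2/6' into (position, number after the slash)."""
    position, field = rank.lstrip("#").split("/")
    return int(position), int(field)


def sort_key(r: Result) -> tuple[int, int]:
    position, _ = parse_rank(r.rank)
    # Negating the ordinal puts newer dates first.
    # Undated rows get 0, so they sort after dated rows (assumption).
    recency = -r.day.toordinal() if r.day else 0
    return position, recency


# A few rows copied from the table above, deliberately out of order.
rows = [
    Result("accuracy", "#9/74", None),
    Result("navigation-satisfaction", "#2/10", None),
    Result("avg-score", "#3/6", date(2026, 5, 4)),
    Result("safety-score", "#2/6", date(2026, 5, 4)),
]

ordered = sorted(rows, key=sort_key)
for r in ordered:
    print(r.rank, r.metric)
```

Under these assumptions the sample rows come back in the same relative order as the table: safety-score, navigation-satisfaction, avg-score, accuracy.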
§ 03 · Strengths by area

Where GPT-5.4 actually performs.

Mobile Development · 1 benchmark · avg rank #2.5
Agentic AI · 1 benchmark · avg rank #4.3
Reasoning · 1 benchmark · avg rank #9.0
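The average ranks above are consistent with a plain arithmetic mean of the rank positions in the benchmark table, grouped by area. The sketch below checks that reading; the page does not state how the averages are computed, so the method and the variable names are assumptions.

```python
from collections import defaultdict
from statistics import mean

# (area, rank position) pairs copied from the benchmark table, rows 01 to 11.
ranks = [
    ("Agentic AI", 2), ("Mobile Development", 2), ("Mobile Development", 2),
    ("Agentic AI", 3), ("Mobile Development", 3), ("Mobile Development", 3),
    ("Agentic AI", 4), ("Agentic AI", 5), ("Agentic AI", 6), ("Agentic AI", 6),
    ("Reasoning", 9),
]

# Group rank positions by area.
by_area = defaultdict(list)
for area, position in ranks:
    by_area[area].append(position)

# Assumed method: arithmetic mean per area, rounded to one decimal place.
avg_rank = {area: round(mean(p), 1) for area, p in by_area.items()}

for area, value in sorted(avg_rank.items(), key=lambda kv: kv[1]):
    print(f"{area}: avg rank #{value:.1f}")
# Mobile Development: avg rank #2.5
# Agentic AI: avg rank #4.3
# Reasoning: avg rank #9.0
```

Because the mean weights every metric equally, the six AcademiClaw rows (including the two last-place efficiency metrics) pull the Agentic AI average down more than any single headline score would suggest.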
§ 04 · Papers

1 paper with results for GPT-5.4.

  1. 2026-05-04 · Agentic AI · 6 results

    AcademiClaw: When Students Set Challenges for AI Agents

    Junjie Yu, Pengrui Lu, Weiye Si, Hongliang Lu et al.
§ 05 · Related models

Other OpenAI models scored on Codesota.

GPT-4o · Undisclosed params · 38 results · 9 SOTA
o3 · 17 results · 5 SOTA
o4-mini · 14 results · 2 SOTA
o3 (high) · 2 results · 1 SOTA
Codex / GPT-5.5 · 1 result · 1 SOTA
Codex CLI (GPT-5.5) · 1 result · 1 SOTA
o4-mini (high) · 1 result · 1 SOTA
o1 · 12 results
§ 06 · Sources & freshness

Where these numbers come from.

paper · 6 results
Callstack Incubator · 4 results
scale-hle-official · 1 result
11 of 11 rows marked verified.