Codesota · Models · Claude 3 Opus · Anthropic · 14 results · 8 benchmarks
Model card

Claude 3 Opus.

Anthropic · API

The most capable model in the Claude 3 family, released in March 2024. Supports image input. Source: Anthropic's Claude 3 family announcement.

§ 02 · Benchmarks

Every benchmark Claude 3 Opus has a recorded score for.

| # | Benchmark | Area · Task | Metric | Value | Rank | Date | Source |
|---|---|---|---|---|---|---|---|
| 01 | BIG-Bench Hard | Reasoning · Multi-step Reasoning | accuracy | 86.8% | #6/11 | | source ↗ |
| 02 | MMMU | Multimodal · Visual Question Answering | accuracy | 59.4% | #23/30 | 2024-03-04 | source ↗ |
| 03 | GSM8K | Reasoning · Mathematical Reasoning | accuracy | 95.0% | #24/48 | 2024-03-01 | source ↗ |
| 04 | HumanEval | Computer Code · Code Generation | pass@1 | 84.9% | #31/42 | | source ↗ |
| 05 | PLCC | Natural Language Processing · Polish Cultural Competency | art-and-entertainment | 73.0% | #31/165 | | source ↗ |
| 06 | MMLU | Reasoning · Commonsense Reasoning | accuracy | 86.8% | #36/64 | | source ↗ |
| 07 | PLCC | Natural Language Processing · Polish Cultural Competency | history | 86.0% | #36/165 | | source ↗ |
| 08 | MATH | Reasoning · Mathematical Reasoning | accuracy | 60.1% | #40/46 | | source ↗ |
| 09 | PLCC | Natural Language Processing · Polish Cultural Competency | culture-and-tradition | 76.0% | #46/165 | | source ↗ |
| 10 | PLCC | Natural Language Processing · Polish Cultural Competency | average | 73.8% | #49/165 | | source ↗ |
| 11 | PLCC | Natural Language Processing · Polish Cultural Competency | geography | 80.0% | #58/165 | | source ↗ |
| 12 | PLCC | Natural Language Processing · Polish Cultural Competency | vocabulary | 62.0% | #60/165 | | source ↗ |
| 13 | PLCC | Natural Language Processing · Polish Cultural Competency | grammar | 66.0% | #61/165 | | source ↗ |
| 14 | GPQA Diamond | Reasoning · Multi-step Reasoning | accuracy | 50.4% | #63/74 | | source ↗ |

Rank column shows this model’s position vs all other models scored on the same benchmark + metric (competitors after the slash). #1 in red means current SOTA. Sorted by rank, then newest result.
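
The rank cells and the sort order described in the note can be handled mechanically. The following is a minimal sketch, not Codesota's own code: the `Row` type, `parse_rank`, and `sort_rows` are hypothetical names introduced here, and placing undated rows last within a tie is an assumption.

```python
import re
from typing import NamedTuple


class Row(NamedTuple):
    benchmark: str
    rank_cell: str  # e.g. "#6/11": position, then the number after the slash
    date: str       # ISO date, or "" when the table leaves the cell blank


def parse_rank(cell: str) -> tuple[int, int]:
    """Split a rank cell such as '#6/11' into (position, number after the slash)."""
    match = re.fullmatch(r"#(\d+)/(\d+)", cell.strip())
    if match is None:
        raise ValueError(f"unrecognised rank cell: {cell!r}")
    return int(match.group(1)), int(match.group(2))


def sort_rows(rows: list[Row]) -> list[Row]:
    """Sort by rank ascending, then by newest result (assumed: blank dates last)."""
    # Two stable passes: the secondary key (date, newest first) goes first,
    # then the primary key (rank position) preserves that order within ties.
    by_date = sorted(rows, key=lambda r: r.date, reverse=True)
    return sorted(by_date, key=lambda r: parse_rank(r.rank_cell)[0])


rows = [
    Row("GSM8K", "#24/48", "2024-03-01"),
    Row("BIG-Bench Hard", "#6/11", ""),
    Row("MMMU", "#23/30", "2024-03-04"),
]
ordered = [r.benchmark for r in sort_rows(rows)]
```

ISO dates compare correctly as plain strings, which is why no date parsing is needed for the tie-break.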
§ 03 · Strengths by area

Where Claude 3 Opus actually performs.

Multimodal · 1 benchmark · avg rank #23.0
Computer Code · 1 benchmark · avg rank #31.0
Reasoning · 5 benchmarks · avg rank #33.8
Natural Language Processing · 1 benchmark · avg rank #48.7
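
The per-area averages can be reproduced from the benchmark table in § 02. The sketch below assumes the average is taken over every result row in an area rather than once per benchmark. That assumption is inferred from the numbers (the seven PLCC rows average to 48.7) and is not stated by the source. The function name `summarise_by_area` is hypothetical.

```python
from collections import defaultdict

# (area, benchmark, rank position) for each of the 14 rows in the § 02 table.
ROWS = [
    ("Reasoning", "BIG-Bench Hard", 6),
    ("Multimodal", "MMMU", 23),
    ("Reasoning", "GSM8K", 24),
    ("Computer Code", "HumanEval", 31),
    ("Natural Language Processing", "PLCC", 31),
    ("Reasoning", "MMLU", 36),
    ("Natural Language Processing", "PLCC", 36),
    ("Reasoning", "MATH", 40),
    ("Natural Language Processing", "PLCC", 46),
    ("Natural Language Processing", "PLCC", 49),
    ("Natural Language Processing", "PLCC", 58),
    ("Natural Language Processing", "PLCC", 60),
    ("Natural Language Processing", "PLCC", 61),
    ("Reasoning", "GPQA Diamond", 63),
]


def summarise_by_area(rows):
    """Return {area: (distinct benchmark count, mean rank over all rows)}."""
    ranks = defaultdict(list)
    benchmarks = defaultdict(set)
    for area, benchmark, rank in rows:
        ranks[area].append(rank)
        benchmarks[area].add(benchmark)
    return {
        area: (len(benchmarks[area]), round(sum(r) / len(r), 1))
        for area, r in ranks.items()
    }


summary = summarise_by_area(ROWS)
# summary["Reasoning"] -> (5, 33.8)
# summary["Natural Language Processing"] -> (1, 48.7)
```

Counting distinct benchmarks separately from rows is what lets Natural Language Processing show 1 benchmark while averaging seven ranks.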
§ 04 · Papers

1 paper with results for Claude 3 Opus.

  1. 2024-03-04 · Multimodal · 1 result

    Claude 3 Model Family (Haiku, Sonnet, Opus)

§ 05 · Related models

Other Anthropic models scored on Codesota.

Claude Opus 4 · Undisclosed params · 14 results · 2 SOTA
Claude Sonnet 5 · Undisclosed params · 2 results · 2 SOTA
Claude Sonnet 4 · 11 results · 1 SOTA
Claude Opus 4.5 · 4 results · 1 SOTA
Claude Opus 4.7 · 2 results · 1 SOTA
Claude Mythos Preview · 1 result · 1 SOTA
Claude 3.5 Sonnet · Undisclosed params · 28 results
Claude Opus 4.5 · Undisclosed params · 12 results
§ 06 · Sources & freshness

Where these numbers come from.

sdadas/PLCC · 7 results
openai-simple-evals · 4 results
llm-stats-bbh · 1 result
arxiv · 1 result
gsm8k-shadow-page · 1 result
9 of 14 rows marked verified. First result 2024-03-01, latest 2024-03-04.