Gemini 1.5 Pro · Google · 21 results · 17 benchmarks

Model card

Gemini 1.5 Pro.

Google · API · Multimodal LLM · Proprietary · 3 current SOTA

1M-token context window. Released February 2024.

§ 02 · Benchmarks

Every benchmark Gemini 1.5 Pro has a recorded score for.

#	Benchmark	Area · Task	Metric	Value	Rank	Date	Source
01	CC-OCR	Computer Vision · General OCR Capabilities	multilingual-f1	79.0%	#1/8	—	source ↗
02	CC-OCR	Computer Vision · General OCR Capabilities	document-parsing	62.4%	#1/6	—	source ↗
03	CC-OCR	Computer Vision · General OCR Capabilities	multi-scene-f1	83.3%	#1/9	—	source ↗
04	BIG-Bench Hard	Reasoning · Multi-step Reasoning	accuracy	89.2%	#2/11	—	source ↗
05	CC-OCR	Computer Vision · General OCR Capabilities	kie-f1	67.3%	#2/5	—	source ↗
06	HellaSwag	Reasoning · Commonsense Reasoning	accuracy	92.5%	#2/17	—	source ↗
07	CNN/DailyMail	Natural Language Processing · Text Summarization	rouge-1	45.8%	#3/6	2024-02-15	source ↗
08	CNN/DailyMail	Natural Language Processing · Text Summarization	rouge-l	43.0%	#3/7	2024-02-15	source ↗
09	VQA v2.0	Multimodal · Visual Question Answering	accuracy	86.5%	#3/16	2024-02-15	source ↗
10	SQuAD v2.0	Natural Language Processing · Question Answering	f1	90.5%	#4/26	2024-02-15	source ↗
11	MME-VideoOCR	Computer Vision · General OCR Capabilities	total-accuracy	64.9%	#5/6	—	source ↗
12	ARC-Challenge	Reasoning · Commonsense Reasoning	accuracy	94.8%	#9/10	—	source ↗
13	TextVQA	Multimodal · Visual Question Answering	accuracy	82.2%	#12/23	2024-02-15	source ↗
14	MMBench	Multimodal · Visual Question Answering	accuracy	73.9%	#19/20	2024-02-15	source ↗
15	MMMU	Multimodal · Visual Question Answering	accuracy	62.2%	#21/30	2024-02-15	source ↗
16	GSM8K	Reasoning · Mathematical Reasoning	accuracy	91.7%	#34/48	—	source ↗
17	HumanEval	Computer Code · Code Generation	pass@1	71.9%	#38/42	—	source ↗
18	MATH	Reasoning · Mathematical Reasoning	accuracy	67.7%	#38/46	—	source ↗
19	MMLU	Reasoning · Commonsense Reasoning	accuracy	85.9%	#42/64	—	source ↗
20	HLE	Reasoning · Multi-step Reasoning	accuracy	4.6%	#67/74	—	source ↗
21	GPQA Diamond	Reasoning · Multi-step Reasoning	accuracy	46.2%	#69/74	—	source ↗

The Rank column shows this model's position among all models scored on the same benchmark and metric; the number after the slash counts the competitors. A rank of #1 (highlighted in red) marks the current SOTA. Rows are sorted by rank, then by newest result.
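The rank notation and sort order described above can be sketched in a few lines of Python. This is a minimal illustration, not Codesota's actual implementation: `Row`, `parse_rank`, and `sort_key` are hypothetical names, and placing undated rows after dated ones within the same rank is an assumption, since the page does not say how missing dates are ordered. The sample rows are copied from the table.

```python
from datetime import date
from typing import NamedTuple, Optional


class Row(NamedTuple):
    benchmark: str
    metric: str
    rank: str            # as shown in the table, e.g. "#2/11"
    day: Optional[date]  # None where the table shows no date


def parse_rank(rank: str) -> tuple[int, int]:
    """Split a rank string such as '#2/11' into (2, 11)."""
    position, total = rank.lstrip("#").split("/")
    return int(position), int(total)


def sort_key(row: Row) -> tuple[int, int]:
    position, _ = parse_rank(row.rank)
    # Newest first within a rank; undated rows sort last (assumption).
    recency = -row.day.toordinal() if row.day else 0
    return position, recency


rows = [
    Row("MMLU", "accuracy", "#42/64", None),
    Row("VQA v2.0", "accuracy", "#3/16", date(2024, 2, 15)),
    Row("CC-OCR", "multilingual-f1", "#1/8", None),
    Row("HellaSwag", "accuracy", "#2/17", None),
]

ordered = sorted(rows, key=sort_key)
for row in ordered:
    print(row.rank, row.benchmark, row.metric)
```

A row counts as current SOTA when `parse_rank(row.rank)[0] == 1`.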

§ 03 · Strengths by area

Where Gemini 1.5 Pro actually performs.

Computer Vision

avg rank #2.0 · 3 SOTA

Natural Language Processing
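The Computer Vision summary can be checked against the benchmark table. The sketch below assumes the average rank is the plain mean of the rank positions of the five Computer Vision rows (four CC-OCR metrics plus MME-VideoOCR); the page does not state how it computes this figure, so treat the method as an assumption that happens to reproduce the displayed value.

```python
from statistics import mean

# Rank positions of the five Computer Vision rows in the table:
# CC-OCR multilingual-f1, document-parsing, multi-scene-f1, kie-f1,
# and MME-VideoOCR total-accuracy.
cv_ranks = [1, 1, 1, 2, 5]

avg_rank = mean(cv_ranks)                        # assumed: unweighted mean
sota_count = sum(1 for r in cv_ranks if r == 1)  # rank #1 means current SOTA

print(f"avg rank #{avg_rank:.1f} · {sota_count} SOTA")
```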

§ 04 · Papers

1 paper with results for Gemini 1.5 Pro.

2024-02-15 · Natural Language Processing · 7 results
Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context

§ 05 · Related models

Other Google models scored on Codesota.

Undisclosed params · 12 results · 1 SOTA

ViT-H/14

632M params · 2 results · 1 SOTA

CoCa (finetuned)

2.1B params · 1 result · 1 SOTA

Gemini 2.0 Flash

1 result · 1 SOTA

Noise2Music

Unknown params · 1 result · 1 SOTA

Gemini 3 Flash

Undisclosed params · 6 results

§ 06 · Sources & freshness

Where these numbers come from.

arxiv · alphaxiv-leaderboard · google-blog · openai-simple-evals · llm-stats-bbh · scale-hle-official

9 of 21 rows marked verified.
