Model card

Apertus-70B-Instruct.

7 results · 7 benchmarks
§ 02 · Benchmarks

Every benchmark Apertus-70B-Instruct has a recorded score for.

| #  | Benchmark      | Area · Task                                      | Metric   | Value | Rank   |
|----|----------------|--------------------------------------------------|----------|-------|--------|
| 01 | DROP           | Natural Language Processing · Question Answering | f1       | 50.8% | #4/5   |
| 02 | MBPP+          | Computer Code · Code Generation                  | pass-1   | 47.0% | #7/9   |
| 03 | BIG-Bench Hard | Reasoning · Multi-step Reasoning                 | accuracy | 64.2% | #9/11  |
| 04 | HellaSwag      | Reasoning · Commonsense Reasoning                | accuracy | 78.1% | #11/17 |
| 05 | GSM8K          | Reasoning · Mathematical Reasoning               | accuracy | 77.6% | #39/48 |
| 06 | MATH           | Reasoning · Mathematical Reasoning               | accuracy | 30.8% | #43/46 |
| 07 | MMLU           | Reasoning · Commonsense Reasoning                | accuracy | 69.6% | #54/64 |
The Rank column shows this model's position among all models scored on the same benchmark and metric; the number after the slash counts the models compared. A rank of #1 marks the current SOTA. Rows are sorted by rank, then by newest result.
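The rank cells follow a fixed "#position/count" pattern, so they can be split mechanically. The sketch below is illustrative only: `parse_rank` and `is_sota` are hypothetical helper names, not part of any Codesota API, and the count after the slash is simply passed through as shown on the page.

```python
def parse_rank(cell: str) -> tuple[int, int]:
    """Split a rank cell such as '#4/5' into (position, compared).

    'compared' is the number printed after the slash; this sketch does not
    assume anything further about how the site computes it.
    """
    position, compared = cell.lstrip("#").split("/")
    return int(position), int(compared)


def is_sota(cell: str) -> bool:
    """A position of 1 is what the page labels as current SOTA."""
    return parse_rank(cell)[0] == 1


# Rank cells copied from the benchmark table for Apertus-70B-Instruct.
RANK_CELLS = {
    "DROP": "#4/5",
    "MBPP+": "#7/9",
    "BIG-Bench Hard": "#9/11",
    "HellaSwag": "#11/17",
    "GSM8K": "#39/48",
    "MATH": "#43/46",
    "MMLU": "#54/64",
}

PARSED = {name: parse_rank(cell) for name, cell in RANK_CELLS.items()}

if __name__ == "__main__":
    for name, (position, compared) in PARSED.items():
        print(f"{name}: position {position}, {compared} compared")
```

None of the seven rows has position 1, so `is_sota` returns False for every cell listed here.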
§ 03 · Strengths by area

Where Apertus-70B-Instruct actually performs.

Natural Language Processing: 1 benchmark, avg rank #4.0
Computer Code: 1 benchmark, avg rank #7.0
Reasoning: 5 benchmarks, avg rank #31.2
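The per-area averages are consistent with a plain arithmetic mean of the rank positions in the benchmark table. The sketch below reproduces them under that assumption; `ROWS` and `average_rank_by_area` are hypothetical names used for illustration, and the page does not state how it actually computes the figure.

```python
from collections import defaultdict

# (benchmark, area, rank position) taken from the benchmark table.
ROWS = [
    ("DROP", "Natural Language Processing", 4),
    ("MBPP+", "Computer Code", 7),
    ("BIG-Bench Hard", "Reasoning", 9),
    ("HellaSwag", "Reasoning", 11),
    ("GSM8K", "Reasoning", 39),
    ("MATH", "Reasoning", 43),
    ("MMLU", "Reasoning", 54),
]


def average_rank_by_area(rows):
    """Group rank positions by area and return the unweighted mean per area."""
    grouped = defaultdict(list)
    for _benchmark, area, rank in rows:
        grouped[area].append(rank)
    return {area: sum(ranks) / len(ranks) for area, ranks in grouped.items()}


AVERAGES = average_rank_by_area(ROWS)

if __name__ == "__main__":
    for area, avg in AVERAGES.items():
        print(f"{area}: avg rank #{avg:.1f}")
```

For Reasoning this gives (9 + 11 + 39 + 43 + 54) / 5 = 31.2, matching the listed value. The mean ignores how many models were compared on each benchmark, so a #9 out of 11 and a #39 out of 48 are weighted the same.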
§ 04 · Papers

1 paper with results for Apertus-70B-Instruct.

  1. Apertus: Democratizing Open and Compliant LLMs for Global Language Environments (2025-09-17) · 7 results

§ 06 · Sources & freshness

Where these numbers come from.

pwc-dump: 7 results
0 of 7 rows marked verified.