Codesota · Models · GLM-5 · Zhipu AI · 19 results · 9 benchmarks
Model card

GLM-5.

Zhipu AI · open-source · 130B params · 1 current SOTA
§ 02 · Benchmarks

Every benchmark GLM-5 has a recorded score for.

| # | Benchmark | Area · Task | Metric | Value | Rank | Date |
|---|---|---|---|---|---|---|
| 01 | Tau2-Bench | Agentic AI · Tool Use | accuracy | 89.7% | #1/11 | |
| 02 | React Native Evals | Mobile Development · React Native Code Generation | animation-satisfaction | 66.0% | #4/10 | |
| 03 | SWE-Bench Verified | Computer Code · Code Generation | accuracy | 77.8% | #6/22 | |
| 04 | SWE-bench | Computer Code · Code Generation | resolve-rate-agentic | 77.8% | #7/25 | 2026-01-01 |
| 05 | BrowseComp | Natural Language Processing · Question Answering | accuracy | 62.0% | #8/16 | |
| 06 | React Native Evals | Mobile Development · React Native Code Generation | requirement-satisfaction | 74.2% | #8/10 | |
| 07 | React Native Evals | Mobile Development · React Native Code Generation | navigation-satisfaction | 86.7% | #8/10 | |
| 08 | SWE-bench | Computer Code · Code Generation | resolve-rate | 77.8% | #9/32 | 2026-01-01 |
| 09 | React Native Evals | Mobile Development · React Native Code Generation | async-state-satisfaction | 73.8% | #9/10 | |
| 10 | SWE-bench Verified | Agentic AI · SWE-bench | resolve-rate | 77.8% | #11/81 | |
| 11 | GPQA Diamond | Reasoning · Multi-step Reasoning | accuracy | 86.0% | #16/74 | |
| 12 | HLE | Reasoning · Multi-step Reasoning | accuracy | 30.5% | #16/74 | |
| 13 | PLCC | Natural Language Processing · Polish Cultural Competency | grammar | 82.0% | #16/165 | |
| 14 | PLCC | Natural Language Processing · Polish Cultural Competency | geography | 91.0% | #21/165 | |
| 15 | PLCC | Natural Language Processing · Polish Cultural Competency | history | 88.0% | #28/165 | |
| 16 | PLCC | Natural Language Processing · Polish Cultural Competency | average | 80.0% | #33/165 | |
| 17 | PLCC | Natural Language Processing · Polish Cultural Competency | culture-and-tradition | 81.0% | #37/165 | |
| 18 | PLCC | Natural Language Processing · Polish Cultural Competency | vocabulary | 72.0% | #39/165 | |
| 19 | PLCC | Natural Language Processing · Polish Cultural Competency | art-and-entertainment | 66.0% | #47/165 | |
Rank column shows this model’s position vs all other models scored on the same benchmark + metric (competitors after the slash). #1 in red means current SOTA. Sorted by rank, then newest result.
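The rank notation described above can be split into its two numbers mechanically. The following is a minimal sketch, not Codesota's own code: the `Rank` type and the `parse_rank` and `is_sota` helpers are hypothetical names introduced here for illustration, and the only assumption about the format is the `#position/number` shape visible in the table.

```python
import re
from typing import NamedTuple


class Rank(NamedTuple):
    position: int  # the model's place on this benchmark + metric
    field: int     # the number shown after the slash


# Matches cells such as "#7/25"; hypothetical helper, not a Codesota API.
_RANK_RE = re.compile(r"#(\d+)/(\d+)")


def parse_rank(cell: str) -> Rank:
    """Split a rank cell like '#7/25' into its position and field size."""
    match = _RANK_RE.fullmatch(cell.strip())
    if match is None:
        raise ValueError(f"not a rank cell: {cell!r}")
    return Rank(int(match.group(1)), int(match.group(2)))


def is_sota(rank: Rank) -> bool:
    """A row counts as current SOTA when the model sits in position 1."""
    return rank.position == 1
```

Under these assumptions, `parse_rank("#1/11")` yields position 1 and marks the Tau2-Bench row as the single SOTA entry, while `parse_rank("#47/165")` does not.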
§ 03 · Strengths by area

Where GLM-5 actually performs.

Agentic AI · 2 benchmarks · avg rank #6.0 · 1 SOTA
Mobile Development · 1 benchmark · avg rank #7.3
Computer Code · 2 benchmarks · avg rank #7.3
Reasoning · 2 benchmarks · avg rank #16.0
Natural Language Processing · 2 benchmarks · avg rank #28.6
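The per-area averages above are consistent with a plain mean of the rank positions of every row in that area, rounded half-up to one decimal. The sketch below reproduces them under that assumption; the page does not state its formula, and `average_rank_by_area` is a hypothetical helper name. `Decimal` is used because Python's built-in `round` would turn 7.25 into 7.2 rather than the 7.3 shown for Mobile Development.

```python
from collections import defaultdict
from decimal import Decimal, ROUND_HALF_UP

# (area, rank position) pairs copied from the benchmark table, rows 01-19.
ROWS = [
    ("Agentic AI", 1),
    ("Mobile Development", 4),
    ("Computer Code", 6),
    ("Computer Code", 7),
    ("Natural Language Processing", 8),
    ("Mobile Development", 8),
    ("Mobile Development", 8),
    ("Computer Code", 9),
    ("Mobile Development", 9),
    ("Agentic AI", 11),
    ("Reasoning", 16),
    ("Reasoning", 16),
    ("Natural Language Processing", 16),
    ("Natural Language Processing", 21),
    ("Natural Language Processing", 28),
    ("Natural Language Processing", 33),
    ("Natural Language Processing", 37),
    ("Natural Language Processing", 39),
    ("Natural Language Processing", 47),
]


def average_rank_by_area(rows):
    """Mean rank per area, rounded half-up to one decimal place."""
    ranks = defaultdict(list)
    for area, rank in rows:
        ranks[area].append(rank)
    averages = {}
    for area, values in ranks.items():
        mean = Decimal(sum(values)) / Decimal(len(values))
        averages[area] = mean.quantize(Decimal("0.1"), rounding=ROUND_HALF_UP)
    return averages


averages = average_rank_by_area(ROWS)
for area, value in averages.items():
    print(f"{area}: avg rank #{value}")
```

With the 19 rows as listed, this gives 6.0, 7.3, 7.3, 16.0 and 28.6, matching the figures shown for each area.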
§ 04 · Papers

2 papers with results for GLM-5.

  1. 2026-02-17 · 5 results

    GLM-5: from Vibe Coding to Agentic Engineering

  2. 2023-10-10 · Computer Code · 1 result

    SWE-bench: Can Language Models Resolve Real-World GitHub Issues?

    Carlos E. Jimenez, John Yang, Alexander Wettig, Shunyu Yao et al.
§ 05 · Related models

Other Zhipu AI models scored on Codesota.

GLM-4.5 · 6 results
GLM-4.5-Air · 5 results
GLM-OCR · 3 results
GLM-4.5 · 2 results
GLM-4.5-Air · 2 results
GLM-4.7 · 2 results
GLM-4.6 · 1 result
GLM-4.7 · 1 result
§ 06 · Sources & freshness

Where these numbers come from.

sdadas/PLCC · 7 results
pwc-dump · 5 results
Callstack Incubator · 4 results
zhipu-agent · 1 result
swebench-leaderboard · 1 result
editorial · 1 result
14 of 19 rows marked verified.