GPT-3.5-turbo · OpenAI · 17 results · 3 benchmarks
Model card

GPT-3.5-turbo.

OpenAI · proprietary
§ 02 · Benchmarks

Every benchmark GPT-3.5-turbo has a recorded score for.

| # | Benchmark | Area · Task | Metric | Value | Rank |
|---|---|---|---|---|---|
| 01 | Polish MT-Bench | Natural Language Processing · Polish Conversation Quality | humanities | 9.8% | #9/50 |
| 02 | Polish MT-Bench | Natural Language Processing · Polish Conversation Quality | writing | 9.1% | #14/50 |
| 03 | Polish MT-Bench | Natural Language Processing · Polish Conversation Quality | math | 6.8% | #14/50 |
| 04 | Polish MT-Bench | Natural Language Processing · Polish Conversation Quality | coding | 6.0% | #16/50 |
| 05 | Polish MT-Bench | Natural Language Processing · Polish Conversation Quality | stem | 9.3% | #17/50 |
| 06 | Polish MT-Bench | Natural Language Processing · Polish Conversation Quality | roleplay | 8.7% | #20/50 |
| 07 | Polish MT-Bench | Natural Language Processing · Polish Conversation Quality | pl-score | 7.7% | #21/50 |
| 08 | Polish MT-Bench | Natural Language Processing · Polish Conversation Quality | extraction | 8.2% | #26/50 |
| 09 | Polish MT-Bench | Natural Language Processing · Polish Conversation Quality | reasoning | 5.2% | #27/50 |
| 10 | Polish EQ-Bench | Natural Language Processing · Polish Emotional Intelligence | eq-score | 57.7% | #37/101 |
| 11 | PLCC | Natural Language Processing · Polish Cultural Competency | art-and-entertainment | 39.0% | #117/165 |
| 12 | PLCC | Natural Language Processing · Polish Cultural Competency | geography | 55.0% | #121/165 |
| 13 | PLCC | Natural Language Processing · Polish Cultural Competency | average | 43.3% | #128/165 |
| 14 | PLCC | Natural Language Processing · Polish Cultural Competency | vocabulary | 36.0% | #132/165 |
| 15 | PLCC | Natural Language Processing · Polish Cultural Competency | culture-and-tradition | 38.0% | #133/165 |
| 16 | PLCC | Natural Language Processing · Polish Cultural Competency | history | 51.0% | #137/165 |
| 17 | PLCC | Natural Language Processing · Polish Cultural Competency | grammar | 41.0% | #144/165 |
The Rank column shows this model's position among all models scored on the same benchmark and metric, with the number of competitors after the slash. A rank of #1, shown in red, marks the current SOTA. Rows are sorted by rank, then by newest result.
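The rank notation can also be read programmatically. The following is a minimal sketch, assuming every rank follows the `#position/competitors` pattern seen in the table; the `parse_rank` helper and the row tuples are illustrative and not part of Codesota.

```python
import re

def parse_rank(text):
    """Split a rank string such as '#9/50' into (position, competitors)."""
    match = re.fullmatch(r"#(\d+)/(\d+)", text.strip())
    if match is None:
        raise ValueError(f"unrecognised rank string: {text!r}")
    return int(match.group(1)), int(match.group(2))

# A few (benchmark, metric, rank) rows copied from the table.
rows = [
    ("PLCC", "grammar", "#144/165"),
    ("Polish MT-Bench", "humanities", "#9/50"),
    ("Polish EQ-Bench", "eq-score", "#37/101"),
]

# Sort by rank position, as the table does. The "newest result" tie-break
# is omitted here because the table records no dates.
rows_sorted = sorted(rows, key=lambda row: parse_rank(row[2])[0])

for benchmark, metric, rank in rows_sorted:
    position, competitors = parse_rank(rank)
    print(f"{benchmark:16} {metric:12} position={position} competitors={competitors}")
```

Sorting on the parsed position rather than on the raw string avoids lexicographic surprises such as `#117` sorting before `#14`.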
§ 03 · Strengths by area

Where GPT-3.5-turbo actually performs.

Natural Language Processing · 3 benchmarks · avg rank #65.5
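The page does not say how the average rank is computed. One reading that reproduces the figure is the unweighted mean of the rank positions across all 17 rows of the benchmark table. The sketch below assumes that reading; it is not a documented Codesota formula.

```python
from statistics import mean

# Rank positions of the 17 rows in the benchmark table, in table order.
ranks = [
    9, 14, 14, 16, 17, 20, 21, 26, 27,   # Polish MT-Bench (9 metrics)
    37,                                  # Polish EQ-Bench (1 metric)
    117, 121, 128, 132, 133, 137, 144,   # PLCC (7 metrics)
]

# Assumption: a plain mean over rows, with no weighting by field size.
avg_rank = mean(ranks)
print(f"avg rank #{avg_rank:.1f}")  # prints "avg rank #65.5"
```

Under this reading the figure mixes fields of different sizes (50, 101, and 165 models), so the seven PLCC rows pull the average well above the MT-Bench positions.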
§ 05 · Related models

Other OpenAI models scored on Codesota.

GPT-4o · Undisclosed params · 38 results · 9 SOTA
o3 · 17 results · 5 SOTA
o4-mini · 14 results · 2 SOTA
o3 (high) · 2 results · 1 SOTA
Codex / GPT-5.5 · 1 result · 1 SOTA
Codex CLI (GPT-5.5) · 1 result · 1 SOTA
o4-mini (high) · 1 result · 1 SOTA
o1 · 12 results
§ 06 · Sources & freshness

Where these numbers come from.

SpeakLeash/MT-Bench-PL · 9 results
sdadas/PLCC · 7 results
SpeakLeash/Polish-EQ-Bench · 1 result
17 of 17 rows marked verified.