Codesota · Benchmark · APPS

APPS.

APPS contains 10,000 coding problems collected from Codewars, AtCoder, Kattis, and Codeforces, spanning three difficulty levels: introductory, interview, and competition.

§ 01 · Leaderboard

Results by metric.

Only three models currently have results on this benchmark.

pass@5

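Pass@5 is the evaluation metric reported for APPS on this page: the percentage of problems for which at least one of five generated programs passes all of the problem's test cases. Codesota tracks published scores on this metric so readers can compare state-of-the-art results across sources and model families.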

Higher is better
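Pass@k is commonly computed with the unbiased estimator introduced with Codex (Chen et al., 2021): sample n ≥ k programs per problem, count the c programs that pass every test case, and average 1 - C(n-c, k) / C(n, k) over problems. The sketch below assumes the per-problem (n, c) counts are already available; the function names and the example counts are illustrative, and this page does not state how many samples the listed results drew.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased estimate of pass@k for a single problem.

    n: number of programs sampled for the problem
    c: number of those programs that passed every test case
    k: attempt budget (k=5 for the pass@5 column)
    """
    if n < k:
        raise ValueError("need at least k samples per problem")
    if n - c < k:
        # Every size-k subset of the samples contains a passing program.
        return 1.0
    # 1 - P(all k drawn samples fail), drawing without replacement.
    return 1.0 - comb(n - c, k) / comb(n, k)


def mean_pass_at_k(counts: list[tuple[int, int]], k: int) -> float:
    """Average pass@k over problems, expressed as a percentage."""
    scores = [pass_at_k(n, c, k) for n, c in counts]
    return 100.0 * sum(scores) / len(scores)


if __name__ == "__main__":
    # Hypothetical (samples drawn, samples passing) counts for three problems.
    counts = [(10, 0), (10, 1), (10, 5)]
    print(f"pass@5 = {mean_pass_at_k(counts, k=5):.2f}%")
```

Averaging the per-problem estimate, rather than literally drawing five samples once, gives a lower-variance score from the same generations.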

Trust tiers for pass@5: verified, paper, vendor, community, unverified.

Muted rows were not state of the art when published: an earlier or same-year result had already scored higher.
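One reading of the muting rule: a row is muted when some result from the same or an earlier year has a strictly higher score. The sketch below encodes that reading; the `Result` type and `muted_rows` helper are hypothetical, not part of Codesota, and ties are assumed not to mute a row.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Result:
    model: str
    score: float  # pass@5 in percent; higher is better
    year: int


def muted_rows(results: list[Result]) -> set[str]:
    """Return models beaten by a strictly higher score from the same or an earlier year."""
    muted = set()
    for r in results:
        for other in results:
            # A row never beats itself because its score is not strictly higher.
            if other.year <= r.year and other.score > r.score:
                muted.add(r.model)
                break
    return muted


rows = [
    Result("CodeLlama-34B", 32.81, 2023),
    Result("CodeLlama-13B", 23.74, 2023),
    Result("CodeLlama-7B", 10.76, 2023),
]
print(sorted(muted_rows(rows)))
```

Under this reading only the CodeLlama-34B row stays unmuted, since the 13B and 7B results come from the same year and score lower.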

Rank  Model          Trust     pass@5 (%)  Year
01    CodeLlama-34B  verified  32.81       2023
02    CodeLlama-13B  verified  23.74       2023
03    CodeLlama-7B   verified  10.76       2023

Notes (source: Table 3 of the CodeLlama paper, Meta AI, 2023; all scores on the APPS test set with 2-shot evaluation):
CodeLlama-34B: pass@5 of 32.81%; nucleus sampling with p = 0.95.
CodeLlama-13B: pass@5 of 23.74%.
CodeLlama-7B: pass@5 of 10.76%.
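Nucleus (top-p) sampling, named in the CodeLlama-34B note, draws each token from the smallest set of highest-probability tokens whose cumulative probability reaches p, after renormalizing within that set. The sketch below applies it to a toy next-token distribution; the function names and probabilities are illustrative only, and a real decoder would typically apply the same filter to model logits after temperature scaling.

```python
import random
from typing import Optional


def top_p_filter(probs: dict[str, float], p: float = 0.95) -> dict[str, float]:
    """Keep the smallest set of top tokens whose cumulative probability >= p, renormalized."""
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    kept = []
    total = 0.0
    for token, prob in ranked:
        kept.append((token, prob))
        total += prob
        if total >= p:
            break
    return {token: prob / total for token, prob in kept}


def nucleus_sample(probs: dict[str, float], p: float = 0.95,
                   rng: Optional[random.Random] = None) -> str:
    """Draw one token from the top-p nucleus of the distribution."""
    rng = rng or random.Random()
    nucleus = top_p_filter(probs, p)
    tokens = list(nucleus)
    weights = [nucleus[t] for t in tokens]
    return rng.choices(tokens, weights=weights, k=1)[0]


# Toy distribution over candidate next tokens (hypothetical values).
toy = {"return": 0.6, "print": 0.3, "pass": 0.07, "yield": 0.03}
print(top_p_filter(toy, p=0.95))
print(nucleus_sample(toy, p=0.95, rng=random.Random(0)))
```

With p = 0.95 the low-probability tail ("yield" here) is never sampled, which trims unlikely tokens while keeping enough diversity for multiple attempts to differ.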