Codesota · Benchmark · APPS

APPS.

10,000 coding problems from Codewars, AtCoder, Kattis, and CodeForces. Ranges from introductory to competition level.

§ 01 · SOTA history

Year over year.

Not enough data to show trend.
§ 02 · Leaderboard

Results by metric.

Only 3 models have results on this benchmark.
Help build the community leaderboard: submit your model results.

pass@5

Higher is better
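
The page does not say how pass@5 is computed. A minimal sketch, assuming the commonly used unbiased pass@k estimator (for a problem with n sampled solutions of which c pass all tests, pass@k = 1 - C(n-c, k) / C(n, k), averaged over problems). The function names `pass_at_k` and `mean_pass_at_k` are hypothetical and are not taken from the leaderboard or from any cited paper's code.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate pass@k for one problem (assumed unbiased estimator).

    n: number of solutions sampled for the problem
    c: number of those solutions that pass every test
    k: sample budget being scored (k = 5 for pass@5)
    """
    if not 0 <= c <= n:
        raise ValueError("c must satisfy 0 <= c <= n")
    if not 1 <= k <= n:
        raise ValueError("k must satisfy 1 <= k <= n")
    # Fewer than k failing samples: every size-k subset contains a pass.
    if n - c < k:
        return 1.0
    # 1 minus the probability that a random size-k subset is all failures.
    return 1.0 - comb(n - c, k) / comb(n, k)


def mean_pass_at_k(results: list[tuple[int, int]], k: int) -> float:
    """Average pass@k over problems; results holds (n, c) per problem."""
    if not results:
        raise ValueError("results must not be empty")
    return sum(pass_at_k(n, c, k) for n, c in results) / len(results)
```

With exactly k samples per problem (n = k), the estimate for a problem is 1.0 if any sample passes and 0.0 otherwise, so the benchmark score reduces to the fraction of problems solved by at least one of the k attempts. Multiply by 100 to compare with the percentage scores listed on this page.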

Trust tiers for pass@5: verified, paper, vendor, community, unverified

Rank | Model | Trust | Score (pass@5, %) | Year | Source | Notes
01 | CodeLlama-34B | verified | 32.81 | 2023 | CodeLlama paper, Table 3 | Meta AI; test set; 2-shot evaluation; nucleus sampling p=0.95
02 | CodeLlama-13B | verified | 23.74 | 2023 | CodeLlama paper, Table 3 | Meta AI; test set; 2-shot evaluation
03 | CodeLlama-7B | verified | 10.76 | 2023 | CodeLlama paper, Table 3 | Meta AI; test set; 2-shot evaluation
§ 04 · Submit a result

Add to the leaderboard.
