Codesota · RL Environments · legacy enterprise (COBOL), floor effect
§ Ranked #12 by discriminative power

COBOLBench.

An environment for legacy enterprise (COBOL), showing a floor effect. Across the 5 models with public results, the best and worst scores differ by only 2 percentage points, and no model clears the floor, so the environment ranks only degrees of failure.

§ Public model scores

Who wins COBOLBench.

Best public result per model entry, normalized 0..1 and shown below as percentages. The spread between the top and bottom rows determines whether this environment is worth a training run.

#   Model            pass@4
01  GPT-5.5          11%
02  Opus-4.7         9%
03  Gemini-3.1-Pro   9%
04  DeepSeek-V4-Pro  9%
05  Kimi-K2.6        9%

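
The 2% spread quoted for this environment matches the gap between the top and bottom rows of the table. The sketch below reproduces that arithmetic under the assumption that spread means best minus worst pass@4, in percentage points. The page does not define the term, and the helper names `spread` and `distinct_levels` are hypothetical.

```python
# Public pass@4 scores copied from the table above, in percent.
scores = {
    "GPT-5.5": 11,
    "Opus-4.7": 9,
    "Gemini-3.1-Pro": 9,
    "DeepSeek-V4-Pro": 9,
    "Kimi-K2.6": 9,
}


def spread(points):
    """Best minus worst score, in percentage points (assumed definition)."""
    values = list(points.values())
    return max(values) - min(values)


def distinct_levels(points):
    """Number of different score values, i.e. how many groups the models fall into."""
    return len(set(points.values()))


print(spread(scores))           # 2
print(distinct_levels(scores))  # 2
```

With only two distinct score levels, four of the five models are tied, which is why the table can order the top entry but cannot separate the rest.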
§ Nearby in the ranking

#   Environment    Domain                                    Spread  Discriminative
10  CompileBench   compile/cross-compile real OSS            27%     0.14
11  SWE-Bench-Pro  software engineering (audited)            11%     0.11
12  COBOLBench     legacy enterprise (COBOL), floor effect   2%      0.02

§ Work with us

Need one that still separates models?

When the public environment for your capability saturates, you can’t tell your models apart and you can’t train past it. We build private, contamination-resistant, verifiable-reward environments and evals on a hold-out set — designed to discriminate where the public ones no longer do.