Benchmark for code summarization (docstring generation) across 6 programming languages: Python, Java, JavaScript, PHP, Ruby, Go. Over 2M (code, docstring) pairs. Primary metric is BLEU-4.
14 results indexed across 2 metrics. Shaded row marks current SOTA; ties broken by submission date.
| # | Model | Org | Submitted | Paper / code | BLEU-4 |
|---|---|---|---|---|---|
| 01 | GPT-4o (API) | OpenAI | Mar 2026 | arXiv | 25.30 |
| 02 | Qwen2.5-Coder 32B (Open) | Alibaba | Sep 2024 | Qwen2.5-Coder Technical Report · code | 23.40 |
| 03 | DeepSeek-Coder-V2-Instruct (Open) | DeepSeek | Jun 2024 | DeepSeek-Coder-V2: Breaking the Barrier of Closed-Source… · code | 22.80 |
| 04 | CodeT5+ 2B (Open) | Salesforce | May 2023 | CodeT5+: Open Code Large Language Models for Code Unders… · code | 21.36 |
| 05 | CodeT5+ (Open) | Salesforce | May 2023 | CodeT5+: Open Code Large Language Models for Code Unders… · code | 20.01 |
| 06 | UniXcoder (Open) | Microsoft | Mar 2022 | UniXcoder: Unified Cross-Modal Pre-Training for Code Rep… · code | 19.06 |
| 07 | CodeBERT (Open) | Microsoft | Feb 2020 | CodeBERT: A Pre-Trained Model for Programming and Natura… · code | 17.65 |
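
The ranking rule stated above (highest score first, ties broken by submission date) can be sketched as a two-key sort. This is only an illustration. The page does not say which direction the tie-break runs, so the sketch assumes the earlier submission wins. The `rank` helper and the tuple layout are hypothetical, and the day of the month is a placeholder because the table lists only month and year.

```python
from datetime import date

# (model, submitted, BLEU-4) taken from the table above.
# Day-of-month is a placeholder: the table only gives month and year.
rows = [
    ("CodeT5+ 2B", date(2023, 5, 1), 21.36),
    ("GPT-4o", date(2026, 3, 1), 25.30),
    ("Qwen2.5-Coder 32B", date(2024, 9, 1), 23.40),
]


def rank(entries):
    """Sort by score descending; on equal scores, earlier submission first.

    The 'earlier wins' direction is an assumption, not stated by the page.
    """
    return sorted(entries, key=lambda r: (-r[2], r[1]))


ranked = rank(rows)
sota = ranked[0]  # the row that would be shaded as current SOTA
```
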
| # | Model | Org | Submitted | Paper / code | Smoothed BLEU-4 |
|---|---|---|---|---|---|
| 01 | CodeBERT (MLM+RTD) | — | Feb 2020 | CodeBERT: A Pre-Trained Model for Programming and Natura… · code | 15.99 |
| 02 | CodeBERT (MLM) | — | Feb 2020 | CodeBERT: A Pre-Trained Model for Programming and Natura… · code | 15.55 |
| 03 | pre-train w/ code only | — | Feb 2020 | CodeBERT: A Pre-Trained Model for Programming and Natura… · code | 15.15 |
| 04 | CodeBERT (RTD) | — | Feb 2020 | CodeBERT: A Pre-Trained Model for Programming and Natura… · code | 15.03 |
| 05 | RoBERTa | — | Feb 2020 | CodeBERT: A Pre-Trained Model for Programming and Natura… · code | 14.52 |
| 06 | Transformer | — | Feb 2020 | CodeBERT: A Pre-Trained Model for Programming and Natura… · code | 14.31 |
| 07 | seq2seq | — | Feb 2020 | CodeBERT: A Pre-Trained Model for Programming and Natura… · code | 13.36 |
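
For readers unfamiliar with the metric, here is a minimal sketch of a sentence-level smoothed BLEU-4 score. It is not the benchmark's official scorer. It assumes whitespace tokenization, a single reference per example, and add-one smoothing on the 2-gram to 4-gram precisions. The function names are hypothetical, and scores from this sketch should not be compared directly with the numbers in the tables.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count all contiguous n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def smoothed_bleu4(reference, hypothesis):
    """Sentence-level BLEU-4 on a 0-100 scale with add-one smoothing for n >= 2.

    Assumptions: whitespace tokenization and exactly one reference string.
    """
    ref, hyp = reference.split(), hypothesis.split()
    if not hyp:
        return 0.0

    log_precision_sum = 0.0
    for n in range(1, 5):
        hyp_counts = ngrams(hyp, n)
        ref_counts = ngrams(ref, n)
        # Clipped matches: each hypothesis n-gram counts at most as often
        # as it appears in the reference.
        matched = sum(min(c, ref_counts[g]) for g, c in hyp_counts.items())
        total = sum(hyp_counts.values())
        add = 1 if n > 1 else 0  # smooth only the higher-order precisions
        if matched + add == 0:
            return 0.0  # no unigram overlap at all
        log_precision_sum += math.log((matched + add) / (total + add))

    # Log of the brevity penalty: 0 when the hypothesis is at least as long
    # as the reference, negative otherwise.
    log_bp = min(0.0, 1.0 - len(ref) / len(hyp))
    return 100.0 * math.exp(log_precision_sum / 4 + log_bp)


score = smoothed_bleu4(
    "return the sum of two numbers",
    "return the sum of both numbers",
)
```

Smoothing matters for docstrings because they are short: without it, a single missing 4-gram match drives the whole sentence score to zero.
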
Every paper below corresponds to at least one row in the leaderboard above. Click through for the arXiv preprint and, when available, the reference implementation.
Submit a checkpoint and a reproduction script. We will run it, publish the score, and, if it takes the top spot, annotate the step on the progress chart with your name.