Code Completion | 2023 | en
Cross-File Code Completion Evaluation
A multilingual benchmark (Python, TypeScript, Java, C#) for code completion tasks that require understanding context from other files in the same repository. It contains 1000 examples per language, drawn from GitHub repos. The primary metric is Exact Match (EM).
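As a rough illustration of the primary metric, the sketch below computes an exact-match percentage over prediction/reference pairs. The function name `exact_match_score` and the normalization (ignoring only leading and trailing whitespace) are assumptions made for this example; the benchmark's own scoring script may normalize differently.

```python
def exact_match_score(predictions, references):
    """Return the percentage of predictions identical to their reference.

    Assumption: only leading/trailing whitespace is ignored when comparing.
    """
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not references:
        return 0.0
    hits = sum(p.strip() == r.strip() for p, r in zip(predictions, references))
    return 100.0 * hits / len(references)


# Hypothetical completions: two of the three match their reference.
preds = ["return self.client.get(url)", "x = load_config(path)", "foo()"]
refs = ["return self.client.get(url)", "x = load_config(path) ", "bar()"]
score = exact_match_score(preds, refs)
print(f"{score:.1f}")  # prints 66.7
```

Exact match gives no partial credit, so a completion that differs from the reference by a single token counts as a miss.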
Current State of the Art
Claude Sonnet 4
Anthropic
44.5
exact-match
exact-match Progress Over Time
Showing 4 breakthroughs from Oct 2023 to Mar 2026
Key Milestones
Oct 2023
GPT-4o
CrossCodeEval paper Table 3. Python EM with BM25 cross-file context retrieval.
38.2
Jun 2024
DeepSeek-Coder-V2-Instruct
DeepSeek-Coder-V2 paper. Python EM with BM25 retrieval.
41.3
+8.1%
Sep 2024
Qwen2.5-Coder-32B-Instruct
Qwen2.5-Coder paper. Python EM with BM25 retrieval.
43.7
+5.8%
Mar 2026
Claude Sonnet 4 (Current SOTA)
Anthropic model card. Python EM with BM25 retrieval.
44.5
+1.8%
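Each milestone above reports Python EM "with BM25 retrieval". As a minimal sketch of what that retrieval step can look like, the code below ranks candidate chunks from other files against the in-file context using Okapi BM25. The tokenizer, the parameter values `k1=1.5` and `b=0.75`, the function names, and the sample chunks are all assumptions for illustration, not the setup used in any of the cited reports.

```python
import math
import re
from collections import Counter


def tokenize(code):
    # Lowercase identifier-like tokens; a deliberately simple tokenizer.
    return re.findall(r"[A-Za-z_][A-Za-z0-9_]*", code.lower())


def bm25_rank(query, chunks, k1=1.5, b=0.75):
    """Return chunk indices sorted by descending Okapi BM25 score."""
    if not chunks:
        return []
    docs = [tokenize(c) for c in chunks]
    n_docs = len(docs)
    avg_len = (sum(len(d) for d in docs) / n_docs) or 1.0
    # Document frequency: number of chunks that contain each term.
    df = Counter(term for d in docs for term in set(d))
    query_terms = set(tokenize(query))
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query_terms:
            if term not in tf:
                continue
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(d) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return sorted(range(n_docs), key=lambda i: scores[i], reverse=True)


# Hypothetical chunks taken from other files in the same repository.
chunks = [
    "def load_config(path): return json.load(open(path))",
    "class HttpClient:\n    def get(self, url): ...",
    "def save_report(report, path): ...",
]
# The unfinished line the model is asked to complete.
query = "client = HttpClient()\nresponse = client.get("
ranked = bm25_rank(query, chunks)
print(ranked[0])  # prints 1: the HttpClient chunk ranks first
```

The top-ranked chunks would then be prepended to the prompt, so the model sees the definition of `HttpClient.get` before completing the call.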
Total Improvement
16.5%
Time Span
2y 6m
Breakthroughs
4
Current SOTA
44.5
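The percentages listed with each milestone appear to be relative gains over the previous milestone's score, and the total appears to be the relative gain from the first score to the last. A short check under that assumption, using the scores listed above (the helper name `relative_gain` is introduced here for illustration):

```python
milestones = [
    ("Oct 2023", "GPT-4o", 38.2),
    ("Jun 2024", "DeepSeek-Coder-V2-Instruct", 41.3),
    ("Sep 2024", "Qwen2.5-Coder-32B-Instruct", 43.7),
    ("Mar 2026", "Claude Sonnet 4", 44.5),
]


def relative_gain(old, new):
    # Relative improvement of `new` over `old`, in percent.
    return 100.0 * (new - old) / old


scores = [score for _, _, score in milestones]
step_gains = [relative_gain(a, b) for a, b in zip(scores, scores[1:])]
total_gain = relative_gain(scores[0], scores[-1])

for (date, model, _), gain in zip(milestones[1:], step_gains):
    print(f"{date} {model}: +{gain:.1f}%")
print(f"Total improvement: {total_gain:.1f}%")
```

Under this reading the script reproduces the figures shown: +8.1%, +5.8%, +1.8%, and 16.5% overall. Note that these are relative changes; in absolute terms the gain from first to last is 6.3 EM points.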
Top Models Performance Comparison
Top 6 models ranked by exact-match
Best Score
44.5
Top Model
Claude Sonnet 4
Models Compared
6
Score Range
12.4
exact-match (Primary)
| # | Model | Access | Organization | Score | Paper / Code | Date |
|---|---|---|---|---|---|---|
| 1 | Claude Sonnet 4 | API | Anthropic | 44.5 | | Mar 2026 |
| 2 | Qwen2.5-Coder-32B-Instruct | Open Source | Alibaba | 43.7 | | Sep 2024 |
| 3 | DeepSeek-Coder-V2-Instruct | Open Source | DeepSeek | 41.3 | | Jun 2024 |
| 4 | GPT-4o | API | OpenAI | 38.2 | | Oct 2023 |
| 5 | Codestral 22B | | Mistral | 35.6 | Codestral: Hello, World! | May 2024 |
| 6 | StarCoder2-15B | Open Source | BigCode | 32.1 | | Feb 2024 |
Related Papers (4)
Qwen2.5-Coder Technical Report
Sep 2024 | Models: Qwen2.5-Coder-32B-Instruct
DeepSeek-Coder-V2: Breaking the Barrier of Closed-Source Models in Code Intelligence
Jun 2024 | Models: DeepSeek-Coder-V2-Instruct
StarCoder2 and The Stack v2: The Next Generation
Feb 2024 | Models: StarCoder2-15B