Code Completion

Cross-File Code Completion Evaluation

A multilingual benchmark (Python, TypeScript, Java, C#) for code completion that requires understanding context from other files in the repository. It contains 1,000 examples per language, drawn from GitHub repositories. The primary metric is exact match.

Samples: 4,000
Metrics: exact-match, edit-similarity
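The two reported metrics can be sketched as follows. This is a minimal illustration, not the benchmark's official scoring code; the official harness may normalize whitespace or tokenize differently:

```python
def exact_match(prediction: str, reference: str) -> bool:
    """Exact match after trimming surrounding whitespace."""
    return prediction.strip() == reference.strip()

def levenshtein(a: str, b: str) -> int:
    """Character-level edit distance (insert/delete/substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(cur[j - 1] + 1,               # insertion
                           prev[j] + 1,                  # deletion
                           prev[j - 1] + (ca != cb)))    # substitution
        prev = cur
    return prev[-1]

def edit_similarity(prediction: str, reference: str) -> float:
    """1 minus normalized edit distance; 1.0 means identical strings."""
    if not prediction and not reference:
        return 1.0
    return 1.0 - levenshtein(prediction, reference) / max(len(prediction), len(reference))

print(exact_match("return x + 1", "return x + 1"))       # True
print(round(edit_similarity("kitten", "sitting"), 3))    # 0.571
```

Exact match is strict: a single differing character scores 0, which is why edit similarity is reported alongside it as a softer signal.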
Current State of the Art

Claude Sonnet 4 (Anthropic): 44.5 exact-match

exact-match Progress Over Time

Showing 4 breakthroughs from Oct 2023 to Mar 2026


Key Milestones

Oct 2023: GPT-4o, 38.2
CrossCodeEval paper Table 3. Python EM with BM25 cross-file context retrieval.

Jun 2024: DeepSeek-Coder-V2-Instruct, 41.3 (+8.1%)
DeepSeek-Coder-V2 paper. Python EM with BM25 retrieval.

Sep 2024: Qwen2.5-Coder-32B-Instruct, 43.7 (+5.8%)
Qwen2.5-Coder paper. Python EM with BM25 retrieval.

Mar 2026: Claude Sonnet 4 (Current SOTA), 44.5 (+1.8%)
Anthropic model card. Python EM with BM25 retrieval.
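The percentage gains in the milestone list can be verified from the scores themselves, assuming each figure is the relative improvement over the previous milestone:

```python
# Milestone Python EM scores as listed above.
milestones = [("GPT-4o", 38.2),
              ("DeepSeek-Coder-V2-Instruct", 41.3),
              ("Qwen2.5-Coder-32B-Instruct", 43.7),
              ("Claude Sonnet 4", 44.5)]

scores = [s for _, s in milestones]

# Relative gain of each milestone over its predecessor, in percent.
step_gains = [round((b / a - 1) * 100, 1) for a, b in zip(scores, scores[1:])]

# Relative gain of the current SOTA over the first milestone.
total_gain = round((scores[-1] / scores[0] - 1) * 100, 1)

print(step_gains)  # [8.1, 5.8, 1.8]
print(total_gain)  # 16.5
```

The computed values match the listed deltas and the 16.5% total improvement, confirming these are relative (not absolute) gains.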
Total Improvement: 16.5%
Time Span: 2y 6m
Breakthroughs: 4
Current SOTA: 44.5
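Every milestone above reports Python exact match with BM25 cross-file context retrieval: candidate chunks from other files in the repository are ranked by BM25 against the code near the cursor, and the top chunks are placed in the prompt. A minimal, self-contained sketch of that ranking step follows; the tokenizer, chunking, and parameters here are illustrative assumptions, not the official evaluation pipeline:

```python
import math
import re
from collections import Counter

def tokenize(code: str) -> list[str]:
    # Simple identifier-based tokenizer (an assumption; real pipelines vary).
    return re.findall(r"[A-Za-z_]\w*", code.lower())

class BM25:
    """Okapi BM25 over a fixed set of cross-file code chunks."""

    def __init__(self, docs: list[str], k1: float = 1.2, b: float = 0.75):
        self.k1, self.b = k1, b
        self.doc_tokens = [tokenize(d) for d in docs]
        self.doc_len = [len(t) for t in self.doc_tokens]
        self.avgdl = sum(self.doc_len) / len(self.doc_len)
        self.tf = [Counter(t) for t in self.doc_tokens]
        df = Counter()
        for tokens in self.doc_tokens:
            df.update(set(tokens))
        n_docs = len(docs)
        # Smoothed IDF (+1 inside the log keeps scores non-negative).
        self.idf = {w: math.log((n_docs - n + 0.5) / (n + 0.5) + 1)
                    for w, n in df.items()}

    def score(self, query: str, i: int) -> float:
        s = 0.0
        norm = 1 - self.b + self.b * self.doc_len[i] / self.avgdl
        for w in tokenize(query):
            f = self.tf[i].get(w, 0)
            if f:
                s += self.idf[w] * f * (self.k1 + 1) / (f + self.k1 * norm)
        return s

    def top_k(self, query: str, k: int = 1) -> list[int]:
        order = sorted(range(len(self.tf)),
                       key=lambda i: self.score(query, i), reverse=True)
        return order[:k]

# Hypothetical chunks from other files; the query is code near the cursor.
chunks = ["def parse_config(path):\n    return load(path)",
          "class HttpClient:\n    def get(self, url): ...",
          "def parse_json(text):\n    return json.loads(text)"]
print(BM25(chunks).top_k("config = parse_config(settings_path)"))  # [0]
```

Because the query shares the rare token `parse_config` only with the first chunk, BM25's IDF weighting ranks that chunk first; retrieved chunks would then be prepended to the completion prompt.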

Top Models Performance Comparison

Top 6 models ranked by exact-match

1. Claude Sonnet 4: 44.5 (100.0% of best)
2. Qwen2.5-Coder-32B-Instruct: 43.7 (98.2%)
3. DeepSeek-Coder-V2-Instruct: 41.3 (92.8%)
4. GPT-4o: 38.2 (85.8%)
5. Codestral 22B: 35.6 (80.0%)
6. StarCoder2-15B: 32.1 (72.1%)
Best Score: 44.5
Top Model: Claude Sonnet 4
Models Compared: 6
Score Range: 12.4

exact-match (Primary)

1. Claude Sonnet 4 (API), Anthropic: 44.5, Mar 2026
2. Qwen2.5-Coder-32B-Instruct (Open Source), Alibaba: 43.7, Sep 2024
3. DeepSeek-Coder-V2-Instruct (Open Source), DeepSeek: 41.3, Jun 2024
4. GPT-4o (API), OpenAI: 38.2, Oct 2023
5. Codestral 22B, Mistral: 35.6, May 2024 (paper: "Codestral: Hello, World!")
6. StarCoder2-15B (Open Source), BigCode: 32.1, Feb 2024

Related Papers: 4