Optical Character Recognition | 2020 | en

CodeSearchNet

Benchmark for code summarization (docstring generation) across 6 programming languages: Python, Java, JavaScript, PHP, Ruby, Go. Over 2M (code, docstring) pairs. Primary metric is BLEU-4.
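To make the primary metric concrete, here is a minimal sketch of sentence-level BLEU-4 for a single generated docstring against a single reference. It is not the benchmark's official evaluation script: the whitespace tokenization, the add-one smoothing on 2- to 4-gram precisions, the 0 to 100 scaling, and the function names `ngrams` and `bleu4` are all assumptions made for illustration.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return a Counter of all n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def bleu4(candidate, reference, smooth=True):
    """Sentence-level BLEU-4 (scaled to 0-100) for one candidate and one reference.

    Assumptions for this sketch: both inputs are plain strings split on
    whitespace, and smoothing adds 1 to the match count and the total
    count of the 2-, 3-, and 4-gram precisions.
    """
    cand = candidate.split()
    ref = reference.split()
    if not cand:
        return 0.0

    log_precisions = []
    for n in range(1, 5):
        cand_counts = ngrams(cand, n)
        ref_counts = ngrams(ref, n)
        # Clipped matches: a candidate n-gram is credited at most as many
        # times as it occurs in the reference.
        matches = sum(min(count, ref_counts[gram])
                      for gram, count in cand_counts.items())
        total = sum(cand_counts.values())
        if smooth and n > 1:
            matches += 1
            total += 1
        if matches == 0 or total == 0:
            # Any zero precision makes the geometric mean zero.
            return 0.0
        log_precisions.append(math.log(matches / total))

    # Brevity penalty: penalize candidates shorter than the reference.
    if len(cand) >= len(ref):
        brevity_penalty = 1.0
    else:
        brevity_penalty = math.exp(1 - len(ref) / len(cand))

    # Geometric mean of the four n-gram precisions, scaled to 0-100.
    return 100.0 * brevity_penalty * math.exp(sum(log_precisions) / 4)


if __name__ == "__main__":
    reference = "returns the sum of two numbers"
    generated = "returns the sum of numbers"
    print(round(bleu4(generated, reference), 2))
    print(round(bleu4(generated, reference, smooth=False), 2))
```

Scores from this sketch will generally differ from published numbers, since tokenization, smoothing, and corpus-level versus sentence-level averaging all affect BLEU.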

Metrics: accuracy, cer, wer, f1

bleu-4

#  Model                        Type         Organization  Score  Date
1  GPT-4o                       API          OpenAI        25.3   Mar 2026
2  Qwen2.5-Coder-32B-Instruct   Open Source  Alibaba       23.4   Sep 2024
3  DeepSeek-Coder-V2-Instruct   Open Source  DeepSeek      22.8   Jun 2024
4  CodeT5+ 2B                   Open Source  Salesforce    21.36  May 2023
5  CodeT5+                      Open Source  Salesforce    20.01  May 2023
6  UniXcoder                    Open Source  Microsoft     19.06  Mar 2022
7  CodeBERT                     Open Source  Microsoft     17.65  Feb 2020
