Codesota · Benchmark · STS Benchmark

STS Benchmark.

Semantic textual similarity with human-annotated sentence pairs

§ 02 · Leaderboard

Results by metric.

Only three models are currently listed on this benchmark.

Spearman

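Spearman is the reported evaluation metric for STS Benchmark: the Spearman rank correlation between a model's predicted similarity scores (for embedding models, typically the cosine similarity of the two sentence embeddings) and the human-annotated gold scores, shown here multiplied by 100. Codesota tracks published scores on this metric so readers can compare state-of-the-art results across sources and model families.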

Higher is better
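To make the metric concrete, here is a minimal pure-Python sketch of an STS-style evaluation, assuming the common setup for embedding models in which each sentence pair is scored by the cosine similarity of its two embeddings. The vectors and gold labels are hypothetical toy data, and the helper names (`cosine`, `average_ranks`, `pearson`, `spearman`, `sts_score`) are illustrative only; this is not Codesota's or MTEB's evaluation code. Spearman's rho is computed as the Pearson correlation of tie-averaged ranks.

```python
import math
from typing import Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity of two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def average_ranks(values: Sequence[float]) -> list[float]:
    """Rank values 1..n in ascending order; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # sorted positions i..j hold ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho: Pearson correlation of the tie-averaged ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def sts_score(pairs, gold) -> float:
    """Score each pair by cosine similarity, then report Spearman x 100."""
    predicted = [cosine(u, v) for u, v in pairs]
    return 100 * spearman(predicted, gold)


# Hypothetical toy data: four sentence pairs as 2-d "embeddings",
# with made-up gold similarity labels on the 0-5 STS-B scale.
PAIRS = [
    ([1.0, 0.0], [1.0, 0.0]),
    ([1.0, 0.0], [1.0, 1.0]),
    ([1.0, 0.0], [0.0, 1.0]),
    ([1.0, 0.0], [-1.0, 1.0]),
]
GOLD = [4.8, 1.0, 3.5, 0.2]

if __name__ == "__main__":
    print(f"{sts_score(PAIRS, GOLD):.1f}")  # prints 80.0
```

Because Spearman compares only rankings, the single disagreement (the gold labels rank the third pair above the second, while cosine similarity does the opposite) lowers the score from 100 to 80. In practice, `scipy.stats.spearmanr` computes the same statistic.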

Trust tiers for Spearman: verified, paper, vendor, community, unverified.

| Rank | Model | Trust | Spearman | Year | Notes |
|------|-------|-------|----------|------|-------|
| 01 | GTE-Qwen2-7B-instruct | verified | 88.4 | 2024 | GTE-Qwen2-7B Spearman on STS Benchmark test split. MTEB STS sub-task average. |
| 02 | E5-Mistral-7B-instruct | verified | 84.7 | 2024 | E5-Mistral-7B Spearman on STS Benchmark. From MTEB STS sub-task results. |
| 03 | all-MiniLM-L6-v2 | verified | 82.8 | 2022 | all-MiniLM-L6-v2 Spearman on STS Benchmark test split. From the official model card. |