Codesota · Language & Text · Which model, what task, at what cost · Issue: March 2026
§ 00 · Language & text
Text task router
Pick the text output you need: answer, vector, label, entities, translation, or summary. LLM leaderboards are only one slice of the language stack.
Use `/llm` for frontier reasoning, `/benchmarks/mteb` for embeddings, and the task rows below for specialised NLP work.
§ 01 · Text tasks
Not every task needs an LLM.
Six text-processing tasks where specialised models still compete, or win outright, on latency, cost, or accuracy at scale.
Task · Use cases · Benchmark · Top model
Text Embeddings · Semantic search, RAG, clustering · MTEB · KaLM-Gemma3-12B (72.3%)
Translation · 33+ languages, document-level · WMT · HY-MT1.5 (WMT2025 winner)
Question Answering · Extractive, abstractive, multi-hop · SQuAD, TriviaQA · GPT-5 / Claude 4
Named Entity Recognition · People, orgs, locations, custom · CoNLL-2003 · Fine-tuned DeBERTa v3
Text Classification · Sentiment, intent, topic · GLUE, SuperGLUE · DeBERTa v3 (GLUE 91.3)
Summarization · News, documents, conversations · CNN/DailyMail · Claude 4 / GPT-5
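The six rows above can be read as a lookup from the output you need to the task that produces it. The sketch below encodes that reading in Python. The pairing of output kinds (answer, vector, label, entities, translation, summary) with tasks is our interpretation of the intro line, and `TaskRow`, `ROUTES`, and `route` are hypothetical names used only for illustration. The benchmark and model strings are copied from the rows as listed.

```python
from typing import NamedTuple


class TaskRow(NamedTuple):
    """One task row: task name, tracked benchmark(s), model listed for it."""
    task: str
    benchmarks: str
    listed_model: str


# Output kind -> task row. The keys are an assumed mapping from the
# "answer, vector, label, entities, translation, summary" list to the rows.
ROUTES = {
    "vector": TaskRow("Text Embeddings", "MTEB", "KaLM-Gemma3-12B (72.3%)"),
    "translation": TaskRow("Translation", "WMT", "HY-MT1.5 (WMT2025 winner)"),
    "answer": TaskRow("Question Answering", "SQuAD, TriviaQA", "GPT-5 / Claude 4"),
    "entities": TaskRow("Named Entity Recognition", "CoNLL-2003", "Fine-tuned DeBERTa v3"),
    "label": TaskRow("Text Classification", "GLUE, SuperGLUE", "DeBERTa v3 (GLUE 91.3)"),
    "summary": TaskRow("Summarization", "CNN/DailyMail", "Claude 4 / GPT-5"),
}


def route(output_kind: str) -> TaskRow:
    """Return the task row for an output kind, ignoring case and padding."""
    key = output_kind.strip().lower()
    if key not in ROUTES:
        known = ", ".join(sorted(ROUTES))
        raise ValueError(f"unknown output kind {output_kind!r}; expected one of: {known}")
    return ROUTES[key]
```

For example, `route("entities").task` returns `"Named Entity Recognition"`, and an unrecognised kind raises `ValueError` instead of silently falling back to an LLM.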
§ 02 · Decision
LLM, or specialised model?
Use an LLM when
- Few examples available (few-shot)
- Complex, nuanced task definitions
- You need to explain reasoning
- The task evolves frequently
- Low volume (< 10K requests/day)
Use a specialised model when
- High volume (> 100K requests/day)
- Latency-critical (< 100 ms)
- Cost-sensitive (pennies per 1K calls)
- Well-defined, stable task
- Training data available
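The two checklists can be turned into a rough first-pass heuristic. The Python sketch below is one possible encoding, not a definitive rule: the volume and latency thresholds come from the lists above, while the `Workload` fields, the `recommend` function, and the equal-weight voting scheme are assumptions made for illustration. Workloads between 10K and 100K requests/day get no volume vote, since the lists do not cover that range.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    """Hypothetical description of a text workload (field names are assumptions)."""
    requests_per_day: int
    latency_budget_ms: float
    has_training_data: bool
    task_is_stable: bool
    needs_explanations: bool = False


def recommend(w: Workload) -> str:
    """Return 'llm', 'specialised', or 'either' by counting checklist signals.

    Each matching bullet counts as one vote; equal weighting is an assumption.
    """
    llm = 0
    specialised = 0

    # Volume thresholds from the checklists: < 10K/day vs > 100K/day.
    if w.requests_per_day < 10_000:
        llm += 1
    elif w.requests_per_day > 100_000:
        specialised += 1

    # Latency-critical means a budget under 100 ms.
    if w.latency_budget_ms < 100:
        specialised += 1

    # Training data available vs only a few examples.
    if w.has_training_data:
        specialised += 1
    else:
        llm += 1

    # Well-defined, stable task vs one that evolves frequently.
    if w.task_is_stable:
        specialised += 1
    else:
        llm += 1

    # Needing to explain reasoning points to an LLM.
    if w.needs_explanations:
        llm += 1

    if llm == specialised:
        return "either"
    return "llm" if llm > specialised else "specialised"
```

A high-volume, low-latency classifier with labelled data, such as `Workload(500_000, 50, True, True)`, comes out as `"specialised"`. A low-volume, evolving task with no training data comes out as `"llm"`. Cost sensitivity is left out because it depends on pricing the checklists do not specify.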
§ 03 · Keep reading
Go deeper.
Verified benchmarks across every text task. Submit new SOTA results or suggest benchmarks we should be tracking.