TriviaQA is a large-scale question-answering benchmark that pairs trivia questions with independently gathered evidence documents.
Accuracy is the reported evaluation metric for TriviaQA. Codesota tracks published model scores on this metric so readers can compare state-of-the-art results across sources and model families.
Higher is better
| Rank | Model | Trust | Score | Year | Links |
|---|---|---|---|---|---|
| 01 | Llama 2 70B (5-shot) | unverified | 85 | 2023 | Paper ↗ Code ↗ |
| 02 | LLaMA-65B | unverified | 73 | 2023 | Paper ↗ Code ↗ |
| 03 | SmolLM2 (1.7B) | unverified | 36.7 | 2025 | Paper ↗ Code ↗ |
| 04 | BitNet b1.58 2B4T | unverified | 33.57 | 2025 | Paper ↗ Code ↗ |
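
As a rough illustration of the accuracy metric, the sketch below scores a prediction as correct when it matches any accepted answer alias after light normalization, and reports the share of correct predictions as a percentage. The function names `normalize` and `accuracy` and the normalization rules (lowercasing, dropping punctuation and English articles) are assumptions made for this example. The scores in the table may have been computed with different matching rules or prompting setups.

```python
import re
import string


def normalize(text: str) -> str:
    """Lowercase, drop punctuation and articles, collapse whitespace.

    These rules are an assumption for illustration, not a documented standard.
    """
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def accuracy(predictions: list[str], gold_aliases: list[list[str]]) -> float:
    """Percentage of predictions that match any accepted answer alias."""
    if len(predictions) != len(gold_aliases):
        raise ValueError("predictions and gold_aliases must have the same length")
    if not predictions:
        return 0.0
    correct = sum(
        normalize(pred) in {normalize(alias) for alias in aliases}
        for pred, aliases in zip(predictions, gold_aliases)
    )
    return 100.0 * correct / len(predictions)


# Hypothetical toy data: two of the three predictions match an alias.
preds = ["The Beatles", "Paris, France", "Mount Everest"]
gold = [["Beatles"], ["Paris"], ["Everest", "Mt. Everest", "Mount Everest"]]
print(round(accuracy(preds, gold), 2))
```

Matching against a set of aliases rather than a single string matters for trivia-style answers, where several surface forms of the same entity are usually accepted.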