Accuracy is the reported evaluation metric for GPQA. Codesota tracks published model scores on this metric so readers can compare state-of-the-art results across sources and model families.
Higher is better.
| Rank | Model | Trust | Score (accuracy, %) | Year | Source |
|---|---|---|---|---|---|
| 01 | Qwen2.5-Plus | paper | 49.7 | N/A | Source ↗ |
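
For readers unfamiliar with the metric, accuracy on a multiple-choice benchmark such as GPQA is the share of questions answered correctly, usually reported as a percentage. The sketch below is illustrative only: the function name `accuracy_percent` and the sample data are hypothetical, and it does not reproduce the evaluation harness used by any listed source.

```python
def accuracy_percent(predictions, answers):
    """Return the percentage of predictions that match the reference answers."""
    if len(predictions) != len(answers):
        raise ValueError("predictions and answers must have the same length")
    if not answers:
        raise ValueError("cannot compute accuracy on an empty set")
    # Count exact matches between predicted and reference choices.
    correct = sum(p == a for p, a in zip(predictions, answers))
    return 100.0 * correct / len(answers)


# Hypothetical data: four multiple-choice questions, three answered correctly.
predictions = ["A", "C", "B", "D"]
answers = ["A", "C", "B", "A"]
print(accuracy_percent(predictions, answers))  # 75.0
```

Published scores may differ in prompting, answer extraction, and dataset subset, so numbers from different sources are not always directly comparable.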