Logical Reasoning
Logical reasoning (formal deduction, constraint satisfaction, and syllogistic inference) exposes a core weakness in autoregressive language models: they pattern-match rather than prove. Benchmarks such as LogiQA, FOLIO, and ReClor test deductive rigor, and performance on them improves substantially with chain-of-thought prompting and self-consistency decoding. Even so, systematic evaluations from 2023 and 2024 show that frontier models still fail on problems requiring more than three or four reasoning steps, and neurosymbolic approaches that compile problems to SAT solvers or proof assistants remain more reliable when logical correctness must be guaranteed.
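To make the contrast between pattern-matching and proving concrete, here is a minimal sketch of a mechanical validity check for a propositional argument. It is not a SAT solver or a proof assistant: it enumerates every truth assignment by brute force, which is only practical for a handful of variables. The helper names (entails, implies) and the rain/wet/slippery example are illustrative assumptions, not part of any benchmark.

```python
from itertools import product
from typing import Callable, Dict, List

# Illustrative types: a formula is any function from a truth assignment to bool.
Assignment = Dict[str, bool]
Formula = Callable[[Assignment], bool]


def implies(a: bool, b: bool) -> bool:
    """Material implication: false only when a is true and b is false."""
    return (not a) or b


def entails(premises: List[Formula], conclusion: Formula, variables: List[str]) -> bool:
    """Return True if every assignment satisfying all premises also satisfies the conclusion."""
    for values in product([False, True], repeat=len(variables)):
        env = dict(zip(variables, values))
        if all(p(env) for p in premises) and not conclusion(env):
            return False  # counterexample: premises hold, conclusion fails
    return True


variables = ["rain", "wet", "slippery"]
rules: List[Formula] = [
    lambda e: implies(e["rain"], e["wet"]),
    lambda e: implies(e["wet"], e["slippery"]),
]

# Two chained implications plus the fact "rain": a valid two-step deduction.
valid = entails(rules + [lambda e: e["rain"]], lambda e: e["slippery"], variables)

# Affirming the consequent: "slippery" does not let us conclude "rain".
invalid = entails(rules + [lambda e: e["slippery"]], lambda e: e["rain"], variables)

print(valid, invalid)
```

The second check is the kind of argument that looks plausible on the surface but has a counterexample (no rain, yet slippery), which exhaustive checking finds and surface-level matching can miss. A real neurosymbolic pipeline would translate the natural-language problem into such formulas and hand them to a dedicated solver rather than enumerate assignments.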
LogiQA
8,678 logical reasoning questions drawn from the National Civil Servants Examination of China.
Top 10
Leading models on LogiQA.
All datasets
4 datasets tracked for this task.