Codesota · Benchmark · tabfact

tabfact.

TabFact is a table-based fact verification benchmark indexed on Codesota. This page tracks published model results, top scores per metric, and the SOTA timeline for TabFact.

§ 01 · SOTA history

Year over year.

§ 02 · Leaderboard

Results by metric.


Test

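Test is the score a model reports on the TabFact test split. Codesota tracks published scores on this split so readers can compare results across sources and model families. Entries are not always evaluated on the same subset: the PoTable row, for example, is reported on the small test set.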

Higher is better
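
The page does not say how the scores are computed. TabFact results are conventionally reported as classification accuracy in percent over entailed/refuted statements, so a minimal sketch under that assumption looks like the following. The function name `accuracy_percent`, the toy data, and the 1 = entailed / 0 = refuted encoding are illustrative assumptions, not something taken from Codesota or from any listed paper.

```python
from typing import Sequence


def accuracy_percent(predictions: Sequence[int], gold_labels: Sequence[int]) -> float:
    """Return the percentage of statements whose predicted label matches the gold label."""
    if len(predictions) != len(gold_labels):
        raise ValueError("predictions and gold_labels must have the same length")
    if not gold_labels:
        raise ValueError("cannot score an empty evaluation set")
    correct = sum(p == g for p, g in zip(predictions, gold_labels))
    return 100.0 * correct / len(gold_labels)


# Hypothetical toy data; the encoding 1 = entailed, 0 = refuted is an assumption.
gold = [1, 0, 0, 1]
pred = [1, 0, 1, 1]
print(accuracy_percent(pred, gold))  # 3 of 4 statements match
```

Under this reading, a score near 50 on a roughly balanced two-class task is close to chance, which helps when interpreting the lowest rows of the table.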

Trust tiers for Test: verified, paper, vendor, community, unverified
Rank · Model · Trust · Score · Year
01 · TabTracer · verified · 94.86 · 2026
TabTracer with Qwen3-32B backbone. Monte Carlo Tree Search for complex table reasoning. From paper: TabTracer: Monte Carlo Tree Search for Complex Table Reasoning with Large Language Models
02 · TableMaster · verified · 94.52 · 2025
TableMaster with GPT-4o backbone. Adaptive reasoning with table verbalization. From paper: TableMaster: A Recipe to Advance Table Understanding with Language Models
03 · ARTEMIS-DA · verified · 93.1 · 2024
From paper: ARTEMIS-DA: An Advanced Reasoning and Transformation Engine for Multi-Step Insight Synthesis in Data Analytics
04 · Dater · verified · 93 · 2023
From paper: Large Language Models are Versatile Decomposers: Decompose Evidence and Questions for Table-based Reasoning
05 · STaR-8B · verified · 92.05 · 2025
STaR-8B with Qwen3-8B backbone. Slow-thinking via SFT+RFT+uncertainty quantification. From paper: STaR: Towards Effective and Stable Table Reasoning via Slow-Thinking Large Language Models
06 · PASTA · verified · 89.3 · 2022
From paper: PASTA: Table-Operations Aware Fact Verification via Sentence-Table Cloze Pre-training
07 · T-REX (Phi-4) · verified · 89 · 2025
T-REX using Phi-4 (14B) with chain-of-thought and naturalized text table format. From paper: T-REX: Table – Refute or Entail eXplainer
08 · PoTable · verified · 88.93 · 2024
PoTable with GPT-4o-mini backbone on TabFact small test set. Stage-oriented plan-then-execute reasoning. From paper: PoTable: Programming Standardly on Table-based Reasoning Like a Human Analyst
09 · Chain-of-Table · verified · 86.61 · 2024
From paper: Chain-of-Table: Evolving Tables in the Reasoning Chain for Table Understanding
10 · Binder · verified · 86 · 2022
From paper: Binding Language Models in Symbolic Languages
11 · Tab-PoT · verified · 85.77 · 2024
From paper: Efficient Prompting for LLM-based Generative Internet of Things
12 · ReasTAP-Large · verified · 84.9 · 2022
From paper: ReasTAP: Injecting Table Reasoning Skills During Pre-training via Synthetic Reasoning Examples
13 · TAPEX-Large · verified · 84.2 · 2021
From paper: TAPEX: Table Pre-training via Learning a Neural SQL Executor
14 · RePanda · verified · 84.09 · 2025
RePanda using fine-tuned DeepSeek-coder-7B on PanTabFact dataset with pandas-based structured reasoning. From paper: RePanda: Pandas-powered Tabular Verification and Reasoning
15 · T5-3b(UnifiedSKG) · verified · 83.68 · 2022
From paper: UnifiedSKG: Unifying and Multi-Tasking Structured Knowledge Grounding with Text-to-Text Language Models
16 · Salience-aware TAPAS · verified · 82.1 · 2021
From paper: Table-based Fact Verification with Salience-aware Learning
17 · TAPAS-Large classifier with Counterfactual + Synthetic pre-training · verified · 81 · 2020
From paper: Understanding tables with intermediate pre-training
18 · TabSQLify (col+row) · verified · 79.5 · 2024
From paper: TabSQLify: Enhancing Reasoning Capabilities of LLMs Through Table Decomposition
19 · NormTab (Targeted) + SQL · verified · 68.9 · 2024
From paper: NormTab: Improving Symbolic Reasoning in LLMs Through Tabular Data Normalization
20 · Table-BERT-Horizontal-T+F-Template · verified · 65.12 · 2019
From paper: TabFact: A Large-scale Dataset for Table-based Fact Verification
21 · BERT classifier w/o Table · verified · 50.5 · 2019
From paper: TabFact: A Large-scale Dataset for Table-based Fact Verification

Val

Val is the score a model reports on the TabFact validation split. Codesota tracks published scores on this split so readers can compare results across sources and model families.

Higher is better

Trust tiers for Val: verified, paper, vendor, community, unverified
Rank · Model · Trust · Score · Year
01 · PASTA · verified · 89.2 · 2022
From paper: PASTA: Table-Operations Aware Fact Verification via Sentence-Table Cloze Pre-training
02 · ReasTAP-Large · verified · 84.6 · 2022
From paper: ReasTAP: Injecting Table Reasoning Skills During Pre-training via Synthetic Reasoning Examples
03 · TAPEX-Large · verified · 84.6 · 2021
From paper: TAPEX: Table Pre-training via Learning a Neural SQL Executor
04 · T5-3b(UnifiedSKG) · verified · 83.97 · 2022
From paper: UnifiedSKG: Unifying and Multi-Tasking Structured Knowledge Grounding with Text-to-Text Language Models
05 · Salience-aware TAPAS · verified · 82.7 · 2021
From paper: Table-based Fact Verification with Salience-aware Learning
06 · TAPAS-Large classifier with Counterfactual + Synthetic pre-training · verified · 81 · 2020
From paper: Understanding tables with intermediate pre-training
07 · Table-BERT-Horizontal-T+F-Template · verified · 66.1 · 2019
From paper: TabFact: A Large-scale Dataset for Table-based Fact Verification
08 · BERT classifier w/o Table · verified · 50.9 · 2019
From paper: TabFact: A Large-scale Dataset for Table-based Fact Verification
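
Eight models appear in both the Test and Val tables, so the two scores can be compared directly. The sketch below copies those scores from this page and lists the Test minus Val difference per model, largest absolute gap first. The helper name `split_gaps` is hypothetical, and the gaps are plain arithmetic on the listed numbers, not a statement about statistical significance.

```python
# Scores copied from the Test and Val tables on this page (shared models only).
test_scores = {
    "PASTA": 89.3,
    "ReasTAP-Large": 84.9,
    "TAPEX-Large": 84.2,
    "T5-3b(UnifiedSKG)": 83.68,
    "Salience-aware TAPAS": 82.1,
    "TAPAS-Large classifier with Counterfactual + Synthetic pre-training": 81.0,
    "Table-BERT-Horizontal-T+F-Template": 65.12,
    "BERT classifier w/o Table": 50.5,
}
val_scores = {
    "PASTA": 89.2,
    "ReasTAP-Large": 84.6,
    "TAPEX-Large": 84.6,
    "T5-3b(UnifiedSKG)": 83.97,
    "Salience-aware TAPAS": 82.7,
    "TAPAS-Large classifier with Counterfactual + Synthetic pre-training": 81.0,
    "Table-BERT-Horizontal-T+F-Template": 66.1,
    "BERT classifier w/o Table": 50.9,
}


def split_gaps(test: dict[str, float], val: dict[str, float]) -> list[tuple[str, float]]:
    """Return (model, test - val) for models in both tables, largest absolute gap first."""
    shared = test.keys() & val.keys()
    gaps = [(model, round(test[model] - val[model], 2)) for model in shared]
    # Sort by absolute gap (descending), then by model name for a stable order on ties.
    return sorted(gaps, key=lambda item: (-abs(item[1]), item[0]))


gaps = split_gaps(test_scores, val_scores)
for model, gap in gaps:
    print(f"{gap:+.2f}  {model}")
```

On the listed numbers every gap is under one point, so the Val ranking and the Test ranking of these eight models largely agree.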