Who leads the tabfact benchmark?

TabTracer currently leads tabfact with a score of 94.86 on Test.

What is the state-of-the-art score on tabfact?

The state-of-the-art result on tabfact is 94.86 (Test), achieved by TabTracer as of 2026.

How many models are tracked on tabfact?

Codesota tracks 21 models on tabfact across 2 metrics.

When was the tabfact leaderboard last updated?

The tabfact leaderboard on Codesota includes results through 2026, with the earliest tracked result from 2019.

Codesota · Benchmark · tabfact

tabfact.

Name: tabfact Benchmark Results
Creator: Unknown
Published: 2019-01-01
License: https://creativecommons.org/licenses/by/4.0/

tabfact is a machine learning benchmark for table-based fact verification, indexed on Codesota. This page tracks published model results, top scores per metric, and the SOTA timeline for tabfact.

§ 01 · SOTA history

Year over year.

§ 02 · Leaderboard

Results by metric.

Test

Test is one of the two evaluation metrics reported for tabfact (the other is Val). Codesota tracks published model scores on this metric so readers can compare state-of-the-art results across sources and model families.

Higher is better

Trust tiers for Test: verified, paper, vendor, community, unverified

Muted rows were not state of the art when published — an earlier or same-year result already scored better.

Rank	Model	Notes	Trust	Score	Year	Links
01	TabTracer	TabTracer with Qwen3-32B backbone. Monte Carlo Tree Search for complex table reasoning. From paper: TabTracer: Monte Carlo Tree Search for Complex Table Reasoning with Large Language Models	verified	94.86	2026	Source
02	TableMaster	TableMaster with GPT-4o backbone. Adaptive reasoning with table verbalization. From paper: TableMaster: A Recipe to Advance Table Understanding with Language Models	verified	94.52	2025	Source
03	ARTEMIS-DA	From paper: ARTEMIS-DA: An Advanced Reasoning and Transformation Engine for Multi-Step Insight Synthesis in Data Analytics	verified	93.1	2024	Paper
04	Dater	From paper: Large Language Models are Versatile Decomposers: Decompose Evidence and Questions for Table-based Reasoning	verified	93	2023	Paper, Code
05	STaR-8B	STaR-8B with Qwen3-8B backbone. Slow-thinking via SFT+RFT+uncertainty quantification. From paper: STaR: Towards Effective and Stable Table Reasoning via Slow-Thinking Large Language Models	verified	92.05	2025	Source
06	PASTA	From paper: PASTA: Table-Operations Aware Fact Verification via Sentence-Table Cloze Pre-training	verified	89.3	2022	Paper, Code
07	T-REX (Phi-4)	T-REX using Phi-4 (14B) with chain-of-thought and naturalized text table format. From paper: T-REX: Table – Refute or Entail eXplainer	verified	89	2025	Source
08	PoTable	PoTable with GPT-4o-mini backbone on TabFact small test set. Stage-oriented plan-then-execute reasoning. From paper: PoTable: Programming Standardly on Table-based Reasoning Like a Human Analyst	verified	88.93	2024	Source
09	Chain-of-Table	From paper: Chain-of-Table: Evolving Tables in the Reasoning Chain for Table Understanding	verified	86.61	2024	Paper, Code
10	Binder	From paper: Binding Language Models in Symbolic Languages	verified	86	2022	Paper, Code
11	Tab-PoT	From paper: Efficient Prompting for LLM-based Generative Internet of Things	verified	85.77	2024	Paper
12	ReasTAP-Large	From paper: ReasTAP: Injecting Table Reasoning Skills During Pre-training via Synthetic Reasoning Examples	verified	84.9	2022	Paper, Code
13	TAPEX-Large	From paper: TAPEX: Table Pre-training via Learning a Neural SQL Executor	verified	84.2	2021	Paper, Code
14	RePanda	RePanda using fine-tuned DeepSeek-coder-7B on PanTabFact dataset with pandas-based structured reasoning. From paper: RePanda: Pandas-powered Tabular Verification and Reasoning	verified	84.09	2025	Source
15	T5-3b(UnifiedSKG)	From paper: UnifiedSKG: Unifying and Multi-Tasking Structured Knowledge Grounding with Text-to-Text Language Models	verified	83.68	2022	Paper, Code
16	Salience-aware TAPAS	From paper: Table-based Fact Verification with Salience-aware Learning	verified	82.1	2021	Paper, Code
17	TAPAS-Large classifier with Counterfactual + Synthetic pre-training	From paper: Understanding tables with intermediate pre-training	verified	81	2020	Paper, Code
18	TabSQLify (col+row)	From paper: TabSQLify: Enhancing Reasoning Capabilities of LLMs Through Table Decomposition	verified	79.5	2024	Paper, Code
19	NormTab (Targeted) + SQL	From paper: NormTab: Improving Symbolic Reasoning in LLMs Through Tabular Data Normalization	verified	68.9	2024	Paper, Code
20	Table-BERT-Horizontal-T+F-Template	From paper: TabFact: A Large-scale Dataset for Table-based Fact Verification	verified	65.12	2019	Paper, Code
21	BERT classifier w/o Table	From paper: TabFact: A Large-scale Dataset for Table-based Fact Verification	verified	50.5	2019	Paper, Code
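
The muting rule stated above the Test table can be sketched in Python. This is a minimal sketch under one reading of the rule: a row is muted when any result from the same or an earlier year has a strictly higher score. It is not Codesota's implementation; `Row` and `is_muted` are hypothetical names, and the rows are copied from the Test table.

```python
from typing import NamedTuple


class Row(NamedTuple):  # hypothetical record type, for illustration only
    model: str
    score: float
    year: int


# (model, score, year) copied from the Test table
TEST_ROWS = [
    Row("TabTracer", 94.86, 2026),
    Row("TableMaster", 94.52, 2025),
    Row("ARTEMIS-DA", 93.1, 2024),
    Row("Dater", 93.0, 2023),
    Row("STaR-8B", 92.05, 2025),
    Row("PASTA", 89.3, 2022),
    Row("T-REX (Phi-4)", 89.0, 2025),
    Row("PoTable", 88.93, 2024),
    Row("Chain-of-Table", 86.61, 2024),
    Row("Binder", 86.0, 2022),
    Row("Tab-PoT", 85.77, 2024),
    Row("ReasTAP-Large", 84.9, 2022),
    Row("TAPEX-Large", 84.2, 2021),
    Row("RePanda", 84.09, 2025),
    Row("T5-3b(UnifiedSKG)", 83.68, 2022),
    Row("Salience-aware TAPAS", 82.1, 2021),
    Row("TAPAS-Large classifier with Counterfactual + Synthetic pre-training", 81.0, 2020),
    Row("TabSQLify (col+row)", 79.5, 2024),
    Row("NormTab (Targeted) + SQL", 68.9, 2024),
    Row("Table-BERT-Horizontal-T+F-Template", 65.12, 2019),
    Row("BERT classifier w/o Table", 50.5, 2019),
]


def is_muted(row: Row, rows: list[Row]) -> bool:
    """Assumed rule: muted if a same-year or earlier result scored strictly higher."""
    return any(o.year <= row.year and o.score > row.score for o in rows)


# Rows that were state of the art when published, oldest first
timeline = sorted(
    (r for r in TEST_ROWS if not is_muted(r, TEST_ROWS)),
    key=lambda r: r.year,
)

for r in timeline:
    print(f"{r.year}  {r.score:6.2f}  {r.model}")
```

Under this reading, eight of the 21 Test rows remain unmuted, one per year from Table-BERT-Horizontal-T+F-Template (2019) to TabTracer (2026).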

Val

Val is one of the two evaluation metrics reported for tabfact (the other is Test). Codesota tracks published model scores on this metric so readers can compare state-of-the-art results across sources and model families.

Higher is better

Trust tiers for Val: verified, paper, vendor, community, unverified

Muted rows were not state of the art when published — an earlier or same-year result already scored better.

Rank	Model	Notes	Trust	Score	Year	Links
01	PASTA	From paper: PASTA: Table-Operations Aware Fact Verification via Sentence-Table Cloze Pre-training	verified	89.2	2022	Paper, Code
02	ReasTAP-Large	From paper: ReasTAP: Injecting Table Reasoning Skills During Pre-training via Synthetic Reasoning Examples	verified	84.6	2022	Paper, Code
03	TAPEX-Large	From paper: TAPEX: Table Pre-training via Learning a Neural SQL Executor	verified	84.6	2021	Paper, Code
04	T5-3b(UnifiedSKG)	From paper: UnifiedSKG: Unifying and Multi-Tasking Structured Knowledge Grounding with Text-to-Text Language Models	verified	83.97	2022	Paper, Code
05	Salience-aware TAPAS	From paper: Table-based Fact Verification with Salience-aware Learning	verified	82.7	2021	Paper, Code
06	TAPAS-Large classifier with Counterfactual + Synthetic pre-training	From paper: Understanding tables with intermediate pre-training	verified	81	2020	Paper, Code
07	Table-BERT-Horizontal-T+F-Template	From paper: TabFact: A Large-scale Dataset for Table-based Fact Verification	verified	66.1	2019	Paper, Code
08	BERT classifier w/o Table	From paper: TabFact: A Large-scale Dataset for Table-based Fact Verification	verified	50.9	2019	Paper, Code
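
All eight models in the Val table also appear in the Test table, so the two metrics can be compared directly. The following is a small illustrative sketch, not part of Codesota; the `SCORES` and `gaps` names are hypothetical, and the numbers are copied from the two tables.

```python
# (Test, Val) scores copied from the two leaderboard tables
SCORES = {
    "PASTA": (89.3, 89.2),
    "ReasTAP-Large": (84.9, 84.6),
    "TAPEX-Large": (84.2, 84.6),
    "T5-3b(UnifiedSKG)": (83.68, 83.97),
    "Salience-aware TAPAS": (82.1, 82.7),
    "TAPAS-Large classifier with Counterfactual + Synthetic pre-training": (81.0, 81.0),
    "Table-BERT-Horizontal-T+F-Template": (65.12, 66.1),
    "BERT classifier w/o Table": (50.5, 50.9),
}

# Test minus Val, rounded to two decimals to hide floating-point noise
gaps = {model: round(test - val, 2) for model, (test, val) in SCORES.items()}

# Model whose Test and Val scores differ the most in absolute terms
largest = max(gaps, key=lambda m: abs(gaps[m]))

for model, gap in gaps.items():
    print(f"{gap:+.2f}  {model}")
print("largest gap:", largest, gaps[largest])
```

On these numbers, every Test/Val gap is below one point in absolute value, and only PASTA and ReasTAP-Large score higher on Test than on Val.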
