React Native Code Generation · benchmark dataset · 2025 · EN

Callstack Incubator React Native Evaluation Suite.

A benchmark suite evaluating how AI coding models handle authentic React Native development tasks. 71 evals across 5 categories: animation (14), async-state management (14), lists (19), navigation (14), and React Native APIs (10). Each eval specifies explicit, judgeable requirements. Model outputs are scored on requirement satisfaction using LLM-based judging. Covers real libraries: Reanimated, React Navigation, Zustand, Jotai, React Query, FlatList, FlashList, LegendList.
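The scoring described above can be sketched in a few lines. This is a minimal illustration, not the suite's actual scorer: the source does not say how judge verdicts are aggregated, so pooling all requirements within a category is an assumption, and `satisfaction`, the verdict format, and the category keys are hypothetical names. Only the category sizes come from the description.

```python
from collections import defaultdict

# Category sizes as stated in the suite description (71 evals in total).
CATEGORY_SIZES = {
    "animation": 14,
    "async-state": 14,
    "lists": 19,
    "navigation": 14,
    "react-native-apis": 10,
}


def satisfaction(verdicts):
    """Percentage of requirements judged satisfied, per category.

    `verdicts` is a list of (category, [bool, ...]) pairs, one pair per eval.
    Each bool stands for a hypothetical judge verdict on one requirement.
    Assumption: requirements are pooled across all evals in a category.
    """
    met = defaultdict(int)
    total = defaultdict(int)
    for category, requirement_verdicts in verdicts:
        met[category] += sum(requirement_verdicts)
        total[category] += len(requirement_verdicts)
    return {c: round(100.0 * met[c] / total[c], 2) for c in total}


# Made-up verdicts for three evals, for illustration only.
example = [
    ("animation", [True, True, False, True]),
    ("animation", [True, False]),
    ("navigation", [True, True, True]),
]
scores = satisfaction(example)
```

Averaging per-eval percentages instead of pooling requirements would weight short evals more heavily; which of the two the benchmark uses is not stated here.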

§ 01 · Leaderboard

Best published scores.

40 results indexed across 4 metrics. Shaded row marks current SOTA; ties broken by submission date.


Primary metric: requirement-satisfaction (higher is better)
All metrics: animation-satisfaction, async-state-satisfaction, navigation-satisfaction, requirement-satisfaction
animation-satisfaction · 10 rows

| # | Model | Access | Org | Submitted | Paper / code | animation-satisfaction |
|---|---|---|---|---|---|---|
| 01 | Composer 2 | | Anysphere | Mar 2026 | Callstack Incubator | 94.30 |
| 02 | Claude Opus 4.6 | API | Anthropic | Mar 2026 | Callstack Incubator | 77.40 |
| 03 | GPT-5.4 | API | OpenAI | Mar 2026 | Callstack Incubator | 68.90 |
| 04 | GLM-5 | OSS | Zhipu AI | Mar 2026 | Callstack Incubator | 66.00 |
| 05 | Claude Sonnet 4.6 | API | Anthropic | Mar 2026 | Callstack Incubator | 65.10 |
| 06 | Gemini 3.1 Pro | API | Google | Mar 2026 | Callstack Incubator | 64.20 |
| 07 | GPT 5.3 Codex | API | OpenAI | Mar 2026 | Callstack Incubator | 63.20 |
| 08 | Grok 4 | API | xAI | Mar 2026 | Callstack Incubator | 59.40 |
| 09 | Kimi K2.5 | API | Moonshot AI | Mar 2026 | Callstack Incubator | 59.40 |
| 10 | DeepSeek-V3.2 | API | DeepSeek | Mar 2026 | Callstack Incubator | 56.40 |
async-state-satisfaction · 10 rows

| # | Model | Access | Org | Submitted | Paper / code | async-state-satisfaction |
|---|---|---|---|---|---|---|
| 01 | Composer 2 | | Anysphere | Mar 2026 | Callstack Incubator | 98.50 |
| 02 | GPT-5.4 | API | OpenAI | Mar 2026 | Callstack Incubator | 85.40 |
| 03 | GPT 5.3 Codex | API | OpenAI | Mar 2026 | Callstack Incubator | 85.30 |
| 04 | Claude Opus 4.6 | API | Anthropic | Mar 2026 | Callstack Incubator | 84.60 |
| 05 | Claude Sonnet 4.6 | API | Anthropic | Mar 2026 | Callstack Incubator | 80.80 |
| 06 | Gemini 3.1 Pro | API | Google | Mar 2026 | Callstack Incubator | 80.80 |
| 07 | Kimi K2.5 | API | Moonshot AI | Mar 2026 | Callstack Incubator | 77.70 |
| 08 | DeepSeek-V3.2 | API | DeepSeek | Mar 2026 | Callstack Incubator | 77.70 |
| 09 | Grok 4 | API | xAI | Mar 2026 | Callstack Incubator | 73.80 |
| 10 | GLM-5 | OSS | Zhipu AI | Mar 2026 | Callstack Incubator | 73.80 |
navigation-satisfaction · 10 rows

| # | Model | Access | Org | Submitted | Paper / code | navigation-satisfaction |
|---|---|---|---|---|---|---|
| 01 | Composer 2 | | Anysphere | Mar 2026 | Callstack Incubator | 98.90 |
| 02 | GPT 5.3 Codex | API | OpenAI | Mar 2026 | Callstack Incubator | 95.60 |
| 03 | GPT-5.4 | API | OpenAI | Mar 2026 | Callstack Incubator | 95.60 |
| 04 | Gemini 3.1 Pro | API | Google | Mar 2026 | Callstack Incubator | 94.40 |
| 05 | Claude Sonnet 4.6 | API | Anthropic | Mar 2026 | Callstack Incubator | 93.30 |
| 06 | Claude Opus 4.6 | API | Anthropic | Mar 2026 | Callstack Incubator | 93.30 |
| 07 | Kimi K2.5 | API | Moonshot AI | Mar 2026 | Callstack Incubator | 93.30 |
| 08 | GLM-5 | OSS | Zhipu AI | Mar 2026 | Callstack Incubator | 86.70 |
| 09 | Grok 4 | API | xAI | Mar 2026 | Callstack Incubator | 84.40 |
| 10 | DeepSeek-V3.2 | API | DeepSeek | Mar 2026 | Callstack Incubator | 75.70 |
requirement-satisfaction · primary · 10 rows

| # | Model | Access | Org | Submitted | Paper / code | requirement-satisfaction |
|---|---|---|---|---|---|---|
| 01 | Composer 2 | | Anysphere | Mar 2026 | Callstack Incubator | 96.20 |
| 02 | Claude Opus 4.6 | API | Anthropic | Mar 2026 | Callstack Incubator | 84.36 |
| 03 | GPT-5.4 | API | OpenAI | Mar 2026 | Callstack Incubator | 82.64 |
| 04 | GPT 5.3 Codex | API | OpenAI | Mar 2026 | Callstack Incubator | 80.88 |
| 05 | Gemini 3.1 Pro | API | Google | Mar 2026 | Callstack Incubator | 78.90 |
| 06 | Claude Sonnet 4.6 | API | Anthropic | Mar 2026 | Callstack Incubator | 77.91 |
| 07 | Kimi K2.5 | API | Moonshot AI | Mar 2026 | Callstack Incubator | 74.91 |
| 08 | GLM-5 | OSS | Zhipu AI | Mar 2026 | Callstack Incubator | 74.23 |
| 09 | Grok 4 | API | xAI | Mar 2026 | Callstack Incubator | 70.06 |
| 10 | DeepSeek-V3.2 | API | DeepSeek | Mar 2026 | Callstack Incubator | 68.98 |
Fig 2 · Rows sorted by score within each metric. Shaded row marks SOTA. Dates reflect model or paper release where available, otherwise the date Codesota accessed the source.
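The ordering rule stated for the leaderboard (sorted by score, ties broken by submission date) could be expressed as a two-part sort key. This is a sketch under assumptions: the page does not say whether the earlier or the later submission wins a tie, so "earlier first" is a guess, and the field names, model labels, and dates below are hypothetical.

```python
from datetime import date


def rank(results):
    """Sort by score, highest first; on a tie, the earlier submission ranks higher.

    Assumption: the tie-break direction (earlier wins) is not confirmed by the source.
    """
    return sorted(results, key=lambda r: (-r["score"], r["submitted"]))


# Hypothetical rows; the scores and dates are illustrative only.
results = [
    {"model": "model-b", "score": 59.40, "submitted": date(2026, 3, 20)},
    {"model": "model-a", "score": 59.40, "submitted": date(2026, 3, 10)},
    {"model": "model-c", "score": 94.30, "submitted": date(2026, 3, 24)},
]
order = [r["model"] for r in rank(results)]
```

Negating the score keeps the whole key in ascending order, so a single `sorted` call handles both criteria.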
§ 03 · Progress

1 step of state of the art.

Each row below marks a model that broke the previous record on requirement-satisfaction. Intermediate submissions are kept in the leaderboard above; only SOTA-setting entries are re-listed here.

Higher scores win. Each subsequent entry improved upon the previous best.

SOTA line · requirement-satisfaction
  1. Mar 24, 2026Composer 2Anysphere96.20
Fig 3 · SOTA-setting models only. 1 entry, dated Mar 2026.
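The progress list can be derived from the full submission history by keeping only entries that beat every earlier score. The sketch below assumes a strict improvement is required to count as a new step and that submissions carry ISO-format dates; the function name and the sample entries are hypothetical.

```python
def sota_steps(submissions):
    """Return the record-setting entries in chronological order.

    `submissions` is a list of (iso_date, model, score) tuples.
    Assumption: an entry sets a new record only if it strictly beats the best so far.
    """
    steps = []
    best = float("-inf")
    # ISO dates (YYYY-MM-DD) sort correctly as plain strings.
    for day, model, score in sorted(submissions, key=lambda s: s[0]):
        if score > best:
            steps.append((day, model, score))
            best = score
    return steps


# Hypothetical history: the second entry does not beat the first, so it is skipped.
history = [
    ("2026-02-01", "model-b", 65.0),
    ("2026-01-05", "model-a", 70.0),
    ("2026-03-01", "model-c", 82.5),
]
steps = sota_steps(history)
```

With a single record-setting submission, as in the figure above, the function returns a one-element list.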
§ 06 · Contribute

Have a score that beats this table?

Submit a checkpoint and a reproduction script. We will run it, publish the score, and, if it takes the top spot, annotate the step on the progress chart with your name.

What a submission needs
  • 01 · A public checkpoint or API endpoint
  • 02 · A reproduction script with frozen commit + seed
  • 03 · Declared evaluation environment (Python, deps)
  • 04 · One row per metric declared by this dataset
  • 05 · A contact so we can follow up on discrepancies
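Before submitting, the "one row per metric" requirement can be checked locally. The following is a rough sketch, not the site's validator: the row format (`metric`, `score`), the 0 to 100 score range, and the function name are assumptions. The metric names are the four listed in the leaderboard section.

```python
# The four metrics this dataset declares on the leaderboard.
DECLARED_METRICS = {
    "animation-satisfaction",
    "async-state-satisfaction",
    "navigation-satisfaction",
    "requirement-satisfaction",
}


def check_submission(rows):
    """Return a list of human-readable problems; an empty list means the rows look complete.

    `rows` is a list of {"metric": str, "score": float} dicts (hypothetical format).
    """
    problems = []
    seen = [r["metric"] for r in rows]
    for metric in sorted(DECLARED_METRICS):
        count = seen.count(metric)
        if count == 0:
            problems.append(f"missing row for {metric}")
        elif count > 1:
            problems.append(f"duplicate rows for {metric}")
    for metric in sorted(set(seen) - DECLARED_METRICS):
        problems.append(f"undeclared metric {metric}")
    for r in rows:
        # Assumption: scores are percentages between 0 and 100.
        if not 0.0 <= r["score"] <= 100.0:
            problems.append(f"score out of range for {r['metric']}")
    return problems


complete = [{"metric": m, "score": 80.0} for m in DECLARED_METRICS]
incomplete = [
    {"metric": "requirement-satisfaction", "score": 120.0},
    {"metric": "lists-satisfaction", "score": 50.0},  # hypothetical, not declared
]
ok_problems = check_submission(complete)
bad_problems = check_submission(incomplete)
```

Returning a list of problems rather than raising on the first one lets a submitter fix everything in a single pass.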