Codesota · Benchmark · React Native Evals

React Native Evals.

A benchmark suite that evaluates how AI coding models handle realistic React Native development tasks. It comprises 71 evals across 5 categories: animation (14), async-state management (14), lists (19), navigation (14), and React Native APIs (10). Each eval specifies explicit, judgeable requirements, and model outputs are scored on requirement satisfaction by an LLM judge. The evals cover real libraries: Reanimated, React Navigation, Zustand, Jotai, React Query, FlatList, FlashList, LegendList.
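The source does not spell out how judged requirements roll up into a score, so the following is only a minimal sketch of one plausible aggregation: the percentage of requirements the judge marks as satisfied in a run, averaged over repeated runs. The type and function names (`JudgedRequirement`, `runSatisfaction`, `meanSatisfaction`) are hypothetical and are not taken from the benchmark's code.

```typescript
// Hypothetical shape of one judged requirement; field names are assumptions.
interface JudgedRequirement {
  id: string;
  satisfied: boolean; // verdict returned by the LLM judge
}

// One run of one eval: the judge's verdict for each explicit requirement.
type EvalRun = JudgedRequirement[];

// Percentage of requirements satisfied in a single run.
function runSatisfaction(run: EvalRun): number {
  if (run.length === 0) return 0;
  const met = run.filter((r) => r.satisfied).length;
  return (met / run.length) * 100;
}

// Mean satisfaction over repeated runs of the same eval.
// Assumption: runs are weighted equally.
function meanSatisfaction(runs: EvalRun[]): number {
  if (runs.length === 0) return 0;
  const total = runs.reduce((sum, run) => sum + runSatisfaction(run), 0);
  return total / runs.length;
}

// Example: one run meets 3 of 4 requirements, another meets all 4.
const exampleRuns: EvalRun[] = [
  [
    { id: "uses-stack-navigator", satisfied: true },
    { id: "passes-route-params", satisfied: true },
    { id: "handles-back-press", satisfied: true },
    { id: "typed-routes", satisfied: false },
  ],
  [
    { id: "uses-stack-navigator", satisfied: true },
    { id: "passes-route-params", satisfied: true },
    { id: "handles-back-press", satisfied: true },
    { id: "typed-routes", satisfied: true },
  ],
];

const exampleScore = meanSatisfaction(exampleRuns); // 87.5
```

Averaging per-run percentages, rather than pooling all requirements, keeps each run equally weighted even if the number of requirements differed between evals. Whether the benchmark does this is not stated in the source.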

§ 01 · Leaderboard

Results by metric.

Navigation Satisfaction

Navigation Satisfaction is one of the evaluation metrics reported for React Native Evals. Codesota tracks published model scores on this metric so readers can compare state-of-the-art results across sources and model families.

Higher is better

Trust tiers for Navigation Satisfaction: verified, paper, vendor, community, unverified

All results below: v0.2.0 run, 10x per model, LLM-judged.

| Rank | Model | Trust | Score | Year | Cost per run | Tokens |
|------|-------|-------|-------|------|--------------|--------|
| 01 | Composer 2 | verified | 98.9 | 2026 | Integrated tool (no API cost) | not listed |
| 02 | GPT 5.3 Codex | verified | 95.6 | 2026 | $19.37 | 488K |
| 03 | GPT-5.4 | verified | 95.6 | 2026 | $20.44 | 547K |
| 04 | Gemini-3.1-Pro | verified | 94.4 | 2026 | $32.5 | 668K |
| 05 | Claude Sonnet 4.6 | verified | 93.3 | 2026 | $22.41 | 531K |
| 06 | Claude Opus 4.6 | verified | 93.3 | 2026 | $38.8 | 532K |
| 07 | Kimi K2.5 | verified | 93.3 | 2026 | $12.2 | 1.68M |
| 08 | GLM-5 | verified | 86.7 | 2026 | $10.1 | 812K |
| 09 | Grok 4 | verified | 84.4 | 2026 | $63.05 | 838K |
| 10 | DeepSeek-V3.2 | verified | 75.7 | 2026 | $13.5 | 5.13M |

Async State Satisfaction

Async State Satisfaction is one of the evaluation metrics reported for React Native Evals. Codesota tracks published model scores on this metric so readers can compare state-of-the-art results across sources and model families.

Higher is better

Trust tiers for Async State Satisfaction: verified, paper, vendor, community, unverified

All results below: v0.2.0 run, 10x per model, LLM-judged.

| Rank | Model | Trust | Score | Year | Cost per run | Tokens |
|------|-------|-------|-------|------|--------------|--------|
| 01 | Composer 2 | verified | 98.5 | 2026 | Integrated tool (no API cost) | not listed |
| 02 | GPT-5.4 | verified | 85.4 | 2026 | $20.44 | 547K |
| 03 | GPT 5.3 Codex | verified | 85.3 | 2026 | $19.37 | 488K |
| 04 | Claude Opus 4.6 | verified | 84.6 | 2026 | $38.8 | 532K |
| 05 | Claude Sonnet 4.6 | verified | 80.8 | 2026 | $22.41 | 531K |
| 06 | Gemini-3.1-Pro | verified | 80.8 | 2026 | $32.5 | 668K |
| 07 | DeepSeek-V3.2 | verified | 77.7 | 2026 | $13.5 | 5.13M |
| 08 | Kimi K2.5 | verified | 77.7 | 2026 | $12.2 | 1.68M |
| 09 | GLM-5 | verified | 73.8 | 2026 | $10.1 | 812K |
| 10 | Grok 4 | verified | 73.8 | 2026 | $63.05 | 838K |

Requirement Satisfaction

Requirement Satisfaction is one of the evaluation metrics reported for React Native Evals. Codesota tracks published model scores on this metric so readers can compare state-of-the-art results across sources and model families.

Higher is better

Trust tiers for Requirement Satisfaction: verified, paper, vendor, community, unverified

All results below: v0.2.0 run, 10x per model, LLM-judged.

| Rank | Model | Trust | Score | Year | Cost per run | Tokens |
|------|-------|-------|-------|------|--------------|--------|
| 01 | Composer 2 | verified | 96.2 | 2026 | Integrated tool (no API cost) | not listed |
| 02 | Claude Opus 4.6 | verified | 84.36 | 2026 | $38.8 | 532K |
| 03 | GPT-5.4 | verified | 82.64 | 2026 | $20.44 | 547K |
| 04 | GPT 5.3 Codex | verified | 80.88 | 2026 | $19.37 | 488K |
| 05 | Gemini-3.1-Pro | verified | 78.9 | 2026 | $32.5 | 668K |
| 06 | Claude Sonnet 4.6 | verified | 77.91 | 2026 | $22.41 | 531K |
| 07 | Kimi K2.5 | verified | 74.91 | 2026 | $12.2 | 1.68M |
| 08 | GLM-5 | verified | 74.23 | 2026 | $10.1 | 812K |
| 09 | Grok 4 | verified | 70.06 | 2026 | $63.05 | 838K |
| 10 | DeepSeek-V3.2 | verified | 68.98 | 2026 | $13.5 | 5.13M |
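
Because the Requirement Satisfaction rows list both a score and a cost per run, one way to read them together is score points per dollar. This ratio is an illustrative measure introduced here, not a metric Codesota reports, and the names `Row` and `rankByPointsPerDollar` are hypothetical. Composer 2 is left out because the table lists no API cost for it.

```typescript
// Score and cost per run copied from the Requirement Satisfaction table above.
interface Row {
  model: string;
  score: number;         // Requirement Satisfaction, higher is better
  costPerRunUsd: number; // "Cost: $X/run" as listed on the leaderboard
}

const rows: Row[] = [
  { model: "Claude Opus 4.6", score: 84.36, costPerRunUsd: 38.8 },
  { model: "GPT-5.4", score: 82.64, costPerRunUsd: 20.44 },
  { model: "GPT 5.3 Codex", score: 80.88, costPerRunUsd: 19.37 },
  { model: "Gemini-3.1-Pro", score: 78.9, costPerRunUsd: 32.5 },
  { model: "Claude Sonnet 4.6", score: 77.91, costPerRunUsd: 22.41 },
  { model: "Kimi K2.5", score: 74.91, costPerRunUsd: 12.2 },
  { model: "GLM-5", score: 74.23, costPerRunUsd: 10.1 },
  { model: "Grok 4", score: 70.06, costPerRunUsd: 63.05 },
  { model: "DeepSeek-V3.2", score: 68.98, costPerRunUsd: 13.5 },
];

// Illustrative ratio (an assumption, not a leaderboard metric):
// score points obtained per dollar of run cost, sorted best first.
function rankByPointsPerDollar(input: Row[]) {
  return input
    .map((r) => ({
      model: r.model,
      pointsPerDollar: r.score / r.costPerRunUsd,
    }))
    .sort((a, b) => b.pointsPerDollar - a.pointsPerDollar);
}

const ranked = rankByPointsPerDollar(rows);
```

On these figures GLM-5 comes first (about 7.3 points per dollar) and Grok 4 last (about 1.1). The ratio says nothing about absolute quality: a cheap model with a low score can still rank highly.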

Animation Satisfaction

Animation Satisfaction is one of the evaluation metrics reported for React Native Evals. Codesota tracks published model scores on this metric so readers can compare state-of-the-art results across sources and model families.

Higher is better

Trust tiers for Animation Satisfaction: verified, paper, vendor, community, unverified

All results below: v0.2.0 run, 10x per model, LLM-judged.

| Rank | Model | Trust | Score | Year | Cost per run | Tokens |
|------|-------|-------|-------|------|--------------|--------|
| 01 | Composer 2 | verified | 94.3 | 2026 | Integrated tool (no API cost) | not listed |
| 02 | Claude Opus 4.6 | verified | 77.4 | 2026 | $38.8 | 532K |
| 03 | GPT-5.4 | verified | 68.9 | 2026 | $20.44 | 547K |
| 04 | GLM-5 | verified | 66 | 2026 | $10.1 | 812K |
| 05 | Claude Sonnet 4.6 | verified | 65.1 | 2026 | $22.41 | 531K |
| 06 | Gemini-3.1-Pro | verified | 64.2 | 2026 | $32.5 | 668K |
| 07 | GPT 5.3 Codex | verified | 63.2 | 2026 | $19.37 | 488K |
| 08 | Grok 4 | verified | 59.4 | 2026 | $63.05 | 838K |
| 09 | Kimi K2.5 | verified | 59.4 | 2026 | $12.2 | 1.68M |
| 10 | DeepSeek-V3.2 | verified | 56.4 | 2026 | $13.5 | 5.13M |