React Native Evals evaluates AI models on generating correct, production-quality React Native implementations. It covers animation, navigation, state management, lists, and platform APIs using real-world libraries (Reanimated, React Navigation, Zustand, FlashList).
The benchmark suite measures how AI coding models handle authentic React Native development tasks. It contains 71 evals across 5 categories: animation (14), async-state management (14), lists (19), navigation (14), and React Native APIs (10). Each eval specifies explicit, judgeable requirements, and an LLM-based judge scores model outputs on how many of those requirements they satisfy. The libraries covered are Reanimated, React Navigation, Zustand, Jotai, React Query, FlatList, FlashList, and LegendList.
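As a rough illustration of requirement-satisfaction scoring, the sketch below turns per-requirement judge verdicts into a per-eval percentage and then averages those percentages within a category. The type names (`RequirementVerdict`, `EvalResult`), the function names, and the mean-of-percentages aggregation are all assumptions made for this example. The source does not describe the benchmark's actual schema or aggregation rule.

```typescript
// Hypothetical data shapes; the benchmark's real schema is not given in the source.
type Category =
  | "animation"
  | "async-state"
  | "lists"
  | "navigation"
  | "react-native-apis";

interface RequirementVerdict {
  requirementId: string;
  satisfied: boolean; // verdict assumed to come from the LLM judge
}

interface EvalResult {
  evalId: string;
  category: Category;
  verdicts: RequirementVerdict[];
}

// Percentage of an eval's requirements that the judge marked as satisfied.
function evalSatisfaction(result: EvalResult): number {
  if (result.verdicts.length === 0) return 0;
  const met = result.verdicts.filter((v) => v.satisfied).length;
  return (100 * met) / result.verdicts.length;
}

// Assumed aggregation: unweighted mean of per-eval percentages in one category.
function categorySatisfaction(results: EvalResult[], category: Category): number {
  const scores = results
    .filter((r) => r.category === category)
    .map(evalSatisfaction);
  if (scores.length === 0) return 0;
  return scores.reduce((sum, s) => sum + s, 0) / scores.length;
}
```

Under these assumptions, a model that satisfies every requirement on one navigation eval and half the requirements on another would get a navigation score of 75. A different aggregation, such as pooling all requirements before dividing, would weight evals with more requirements more heavily.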
Leading models on React Native Evals.
| Rank | Model | Navigation satisfaction | Year | Source |
|---|---|---|---|---|
| 1 | Composer 2✓ | 98.9 | 2026 | paper |
| 2 | Composer 2✓ | 98.5 | 2026 | paper |
| 3 | Composer 2✓ | 96.2 | 2026 | paper |
| 4 | GPT 5.3 Codex✓ | 95.6 | 2026 | paper |
| 5 | GPT-5.4✓ | 95.6 | 2026 | paper |
| 6 | Gemini 3.1 Pro✓ | 94.4 | 2026 | paper |
| 7 | Composer 2✓ | 94.3 | 2026 | paper |
| 8 | Claude Opus 4.6✓ | 93.3 | 2026 | paper |
| 9 | Claude Sonnet 4.6✓ | 93.3 | 2026 | paper |
| 10 | Kimi K2.5✓ | 93.3 | 2026 | paper |
1 dataset tracked for this task.