React Native Code Generation (2025)
Callstack Incubator React Native Evaluation Suite
A benchmark suite evaluating how AI coding models handle authentic React Native development tasks. It contains 71 evals across 5 categories: animation (14), async-state management (14), lists (19), navigation (14), and React Native APIs (10). Each eval specifies explicit, judgeable requirements, and an LLM-based judge scores each model output on how well it satisfies them (requirement satisfaction). The tasks use real libraries and components: Reanimated, React Navigation, Zustand, Jotai, React Query, FlatList, FlashList, and LegendList.
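The page does not say how per-requirement verdicts are aggregated into a 0-100 score. A minimal sketch, assuming each eval's score is the fraction of its requirements the judge marked satisfied, and a model's score is the mean over evals scaled to 100. The `EvalResult` record, the eval IDs, and the category keys are hypothetical; only the category sizes come from the description above.

```python
from dataclasses import dataclass

# Category sizes as listed in the suite description (they sum to 71).
CATEGORY_COUNTS = {
    "animation": 14,
    "async-state": 14,
    "lists": 19,
    "navigation": 14,
    "react-native-apis": 10,
}


@dataclass
class EvalResult:
    """Hypothetical record: one eval and the judge's verdict on each requirement."""
    eval_id: str
    category: str
    verdicts: list[bool]  # True if the LLM judge marked that requirement satisfied

    def satisfaction(self) -> float:
        # Fraction of this eval's requirements judged satisfied.
        return sum(self.verdicts) / len(self.verdicts)


def requirement_satisfaction(results: list[EvalResult]) -> float:
    """Assumed aggregation: mean per-eval satisfaction, scaled to 0-100."""
    return 100 * sum(r.satisfaction() for r in results) / len(results)


def category_satisfaction(results: list[EvalResult], category: str) -> float:
    """Same aggregation restricted to one category (e.g. 'animation')."""
    subset = [r for r in results if r.category == category]
    return requirement_satisfaction(subset)


# Usage with made-up verdicts for two hypothetical evals.
results = [
    EvalResult("anim-01", "animation", [True, True, False, True]),  # 3 of 4 satisfied
    EvalResult("nav-01", "navigation", [True, True]),               # 2 of 2 satisfied
]
print(requirement_satisfaction(results))  # 87.5
```

Averaging per eval (rather than pooling all requirements) keeps an eval with many requirements from dominating the score; the real suite may weight things differently.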
Current state of the art: Composer 2 (Anysphere), with a requirement-satisfaction score of 96.2.
Top Models Performance Comparison
The top 10 models ranked by requirement-satisfaction span a score range of 27.2 points.
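The summary figures can be re-derived from the primary requirement-satisfaction table: the best score is Composer 2's 96.2, and the 27.2-point range matches the gap to the tenth-place score (68.98), rounded to one decimal. A small check, assuming the range is computed as best minus lowest listed score; `summarize` is a hypothetical helper.

```python
# Requirement-satisfaction scores for the top 10 models, copied from the primary table.
PRIMARY_SCORES = {
    "Composer 2": 96.2,
    "Claude Opus 4.6": 84.36,
    "GPT 5.4": 82.64,
    "GPT 5.3 Codex": 80.88,
    "Gemini 3.1 Pro Preview": 78.9,
    "Claude Sonnet 4.6": 77.91,
    "Kimi K2.5": 74.91,
    "GLM 5": 74.23,
    "Grok 4": 70.06,
    "DeepSeek V3.2": 68.98,
}


def summarize(scores: dict[str, float]) -> tuple[str, float, float]:
    """Return (top model, best score, score range rounded to one decimal)."""
    top_model = max(scores, key=lambda name: scores[name])
    best = scores[top_model]
    # Assumed definition: best score minus the lowest score among the listed models.
    spread = round(best - min(scores.values()), 1)
    return top_model, best, spread


print(summarize(PRIMARY_SCORES))  # ('Composer 2', 96.2, 27.2)
```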
animation-satisfaction
| # | Model | Organization | Score | Date |
|---|---|---|---|---|
| 1 | Composer 2 | Anysphere | 94.3 | Mar 2026 |
| 2 | Claude Opus 4.6 (API) | Anthropic | 77.4 | Mar 2026 |
| 3 | GPT 5.4 (API) | OpenAI | 68.9 | Mar 2026 |
| 4 | GLM 5 (API) | Zhipu AI | 66 | Mar 2026 |
| 5 | Claude Sonnet 4.6 (API) | Anthropic | 65.1 | Mar 2026 |
| 6 | Gemini 3.1 Pro Preview (API) | Google | 64.2 | Mar 2026 |
| 7 | GPT 5.3 Codex (API) | OpenAI | 63.2 | Mar 2026 |
| 8 | Grok 4 (API) | xAI | 59.4 | Mar 2026 |
| 9 | Kimi K2.5 (API) | Moonshot | 59.4 | Mar 2026 |
| 10 | DeepSeek V3.2 (API) | DeepSeek | 56.4 | Mar 2026 |
async-state-satisfaction
| # | Model | Organization | Score | Date |
|---|---|---|---|---|
| 1 | Composer 2 | Anysphere | 98.5 | Mar 2026 |
| 2 | GPT 5.4 (API) | OpenAI | 85.4 | Mar 2026 |
| 3 | GPT 5.3 Codex (API) | OpenAI | 85.3 | Mar 2026 |
| 4 | Claude Opus 4.6 (API) | Anthropic | 84.6 | Mar 2026 |
| 5 | Gemini 3.1 Pro Preview (API) | Google | 80.8 | Mar 2026 |
| 6 | Claude Sonnet 4.6 (API) | Anthropic | 80.8 | Mar 2026 |
| 7 | DeepSeek V3.2 (API) | DeepSeek | 77.7 | Mar 2026 |
| 8 | Kimi K2.5 (API) | Moonshot | 77.7 | Mar 2026 |
| 9 | Grok 4 (API) | xAI | 73.8 | Mar 2026 |
| 10 | GLM 5 (API) | Zhipu AI | 73.8 | Mar 2026 |
navigation-satisfaction
| # | Model | Organization | Score | Date |
|---|---|---|---|---|
| 1 | Composer 2 | Anysphere | 98.9 | Mar 2026 |
| 2 | GPT 5.4 (API) | OpenAI | 95.6 | Mar 2026 |
| 3 | GPT 5.3 Codex (API) | OpenAI | 95.6 | Mar 2026 |
| 4 | Gemini 3.1 Pro Preview (API) | Google | 94.4 | Mar 2026 |
| 5 | Claude Opus 4.6 (API) | Anthropic | 93.3 | Mar 2026 |
| 6 | Claude Sonnet 4.6 (API) | Anthropic | 93.3 | Mar 2026 |
| 7 | Kimi K2.5 (API) | Moonshot | 93.3 | Mar 2026 |
| 8 | GLM 5 (API) | Zhipu AI | 86.7 | Mar 2026 |
| 9 | Grok 4 (API) | xAI | 84.4 | Mar 2026 |
| 10 | DeepSeek V3.2 (API) | DeepSeek | 75.7 | Mar 2026 |
requirement-satisfaction (primary metric)
| # | Model | Organization | Score | Date |
|---|---|---|---|---|
| 1 | Composer 2 | Anysphere | 96.2 | Mar 2026 |
| 2 | Claude Opus 4.6 (API) | Anthropic | 84.36 | Mar 2026 |
| 3 | GPT 5.4 (API) | OpenAI | 82.64 | Mar 2026 |
| 4 | GPT 5.3 Codex (API) | OpenAI | 80.88 | Mar 2026 |
| 5 | Gemini 3.1 Pro Preview (API) | Google | 78.9 | Mar 2026 |
| 6 | Claude Sonnet 4.6 (API) | Anthropic | 77.91 | Mar 2026 |
| 7 | Kimi K2.5 (API) | Moonshot | 74.91 | Mar 2026 |
| 8 | GLM 5 (API) | Zhipu AI | 74.23 | Mar 2026 |
| 9 | Grok 4 (API) | xAI | 70.06 | Mar 2026 |
| 10 | DeepSeek V3.2 (API) | DeepSeek | 68.98 | Mar 2026 |