React Native Evals is a benchmark suite that evaluates how AI coding models handle realistic React Native development tasks. It contains 71 evals across 5 categories: animation (14), async-state management (14), lists (19), navigation (14), and React Native APIs (10). Each eval specifies explicit, judgeable requirements, and model outputs are scored on requirement satisfaction by an LLM judge. The tasks use real libraries: Reanimated, React Navigation, Zustand, Jotai, React Query, FlatList, FlashList, and LegendList.
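As a rough illustration of what a requirement-satisfaction score could look like, the sketch below pools per-requirement judge verdicts into a percentage, both overall and per category. The `Verdict` and `EvalResult` shapes, the function names, and the pooling rule (every requirement weighted equally) are assumptions made for this example; the page does not describe the benchmark's actual schema or aggregation.

```typescript
// Hypothetical data shapes: the benchmark's real schema is not given here.
type Verdict = { requirement: string; satisfied: boolean };
type EvalResult = { category: string; verdicts: Verdict[] };

// Percentage of requirements judged satisfied, pooled over the given evals.
// Assumption: every requirement carries equal weight.
function satisfaction(results: EvalResult[]): number {
  let total = 0;
  let passed = 0;
  for (const result of results) {
    for (const verdict of result.verdicts) {
      total += 1;
      if (verdict.satisfied) passed += 1;
    }
  }
  return total === 0 ? NaN : (100 * passed) / total;
}

// Same score, computed separately for each category.
function satisfactionByCategory(results: EvalResult[]): Record<string, number> {
  const groups: Record<string, EvalResult[]> = {};
  for (const result of results) {
    if (!groups[result.category]) groups[result.category] = [];
    groups[result.category].push(result);
  }
  const scores: Record<string, number> = {};
  for (const category of Object.keys(groups)) {
    scores[category] = satisfaction(groups[category]);
  }
  return scores;
}

// Made-up verdicts for two evals, for illustration only.
const sampleResults: EvalResult[] = [
  {
    category: "navigation",
    verdicts: [
      { requirement: "uses a stack navigator", satisfied: true },
      { requirement: "passes route params", satisfied: true },
      { requirement: "handles back press", satisfied: false },
      { requirement: "sets screen title", satisfied: true },
    ],
  },
  {
    category: "animation",
    verdicts: [
      { requirement: "uses a shared value", satisfied: true },
      { requirement: "animates on the UI thread", satisfied: false },
    ],
  },
];

const sampleByCategory = satisfactionByCategory(sampleResults);
const sampleOverall = satisfaction(sampleResults);
```

With this sample data, the navigation eval satisfies 3 of 4 requirements and the animation eval 1 of 2, so the per-category scores differ from the pooled score over all 6 requirements. A real harness might instead average per-eval scores, which would give a different overall number.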
Navigation Satisfaction is one of the reported evaluation metrics for React Native Evals. Codesota tracks published model scores on each metric so readers can compare state-of-the-art results across sources and model families.
Higher is better
| Rank | Model | Trust | Score | Year |
|---|---|---|---|---|
| 01 | Composer 2 | verified | 98.9 | 2026 |
| 02 | GPT 5.3 Codex | verified | 95.6 | 2026 |
| 03 | GPT-5.4 | verified | 95.6 | 2026 |
| 04 | Gemini-3.1-Pro | verified | 94.4 | 2026 |
| 05 | Claude Sonnet 4.6 | verified | 93.3 | 2026 |
| 06 | Claude Opus 4.6 | verified | 93.3 | 2026 |
| 07 | Kimi K2.5 | verified | 93.3 | 2026 |
| 08 | GLM-5 | verified | 86.7 | 2026 |
| 09 | Grok 4 | verified | 84.4 | 2026 |
| 10 | DeepSeek-V3.2 | verified | 75.7 | 2026 |
Async State Satisfaction is one of the reported evaluation metrics for React Native Evals.
Higher is better
| Rank | Model | Trust | Score | Year |
|---|---|---|---|---|
| 01 | Composer 2 | verified | 98.5 | 2026 |
| 02 | GPT-5.4 | verified | 85.4 | 2026 |
| 03 | GPT 5.3 Codex | verified | 85.3 | 2026 |
| 04 | Claude Opus 4.6 | verified | 84.6 | 2026 |
| 05 | Claude Sonnet 4.6 | verified | 80.8 | 2026 |
| 06 | Gemini-3.1-Pro | verified | 80.8 | 2026 |
| 07 | DeepSeek-V3.2 | verified | 77.7 | 2026 |
| 08 | Kimi K2.5 | verified | 77.7 | 2026 |
| 09 | GLM-5 | verified | 73.8 | 2026 |
| 10 | Grok 4 | verified | 73.8 | 2026 |
Requirement Satisfaction is one of the reported evaluation metrics for React Native Evals.
Higher is better
| Rank | Model | Trust | Score | Year |
|---|---|---|---|---|
| 01 | Composer 2 | verified | 96.2 | 2026 |
| 02 | Claude Opus 4.6 | verified | 84.36 | 2026 |
| 03 | GPT-5.4 | verified | 82.64 | 2026 |
| 04 | GPT 5.3 Codex | verified | 80.88 | 2026 |
| 05 | Gemini-3.1-Pro | verified | 78.9 | 2026 |
| 06 | Claude Sonnet 4.6 | verified | 77.91 | 2026 |
| 07 | Kimi K2.5 | verified | 74.91 | 2026 |
| 08 | GLM-5 | verified | 74.23 | 2026 |
| 09 | Grok 4 | verified | 70.06 | 2026 |
| 10 | DeepSeek-V3.2 | verified | 68.98 | 2026 |
Animation Satisfaction is one of the reported evaluation metrics for React Native Evals.
Higher is better
| Rank | Model | Trust | Score | Year |
|---|---|---|---|---|
| 01 | Composer 2 | verified | 94.3 | 2026 |
| 02 | Claude Opus 4.6 | verified | 77.4 | 2026 |
| 03 | GPT-5.4 | verified | 68.9 | 2026 |
| 04 | GLM-5 | verified | 66 | 2026 |
| 05 | Claude Sonnet 4.6 | verified | 65.1 | 2026 |
| 06 | Gemini-3.1-Pro | verified | 64.2 | 2026 |
| 07 | GPT 5.3 Codex | verified | 63.2 | 2026 |
| 08 | Grok 4 | verified | 59.4 | 2026 |
| 09 | Kimi K2.5 | verified | 59.4 | 2026 |
| 10 | DeepSeek-V3.2 | verified | 56.4 | 2026 |