News

Technical deep-dives into trending AI models. Benchmark analysis, architecture breakdowns, practical recommendations.

Recent

GPT-5 Leads Aider Polyglot at 88% — Real-World Coding Benchmark

OpenAI's GPT-5 at high reasoning effort tops the Aider Polyglot coding benchmark, followed by o3-pro (84.9%) and Gemini 2.5 Pro (83.1%). Claude Sonnet 4 disappoints at 61%.

Code

DeepSeek V3.2 Speciale: Open-Source Model at 89.6% LiveCodeBench

DeepSeek's latest open model closes the gap with proprietary leaders. V3.1-Think hits 66% on SWE-bench. The open-source code generation frontier is advancing rapidly.

Open Source

Google Chirp 3 HD: Instant Voice Cloning in 31 Languages

Eight distinct voice personalities, real-time streaming, and voice cloning from short samples. Generally available on Vertex AI. The TTS landscape is fragmenting between LLM-native and dedicated models.

Speech

Is SWE-bench Verified Contaminated? OpenAI Shifts to SWE-bench Pro

OpenAI stops reporting Verified scores, citing contamination concerns. Agent scaffolding inflates scores (81% with agents vs 69% standalone). The benchmark wars heat up.

Benchmarks

Kimi K2: Dark Horse Hits 94.5% HumanEval

Moonshot AI's Kimi K2 0905 quietly reaches the second-best HumanEval score, behind only Claude Opus 4.6. The Chinese AI lab arms race continues on coding benchmarks.

Code

Gemini 2.5 Pro TTS: LLM-Native Speech at 4.7 MOS

30 speakers, 80+ locales, prompt-controlled emotion and style. Google's bet: TTS should be a capability of the LLM, not a separate model. Flash variant optimized for real-time.

Speech

Tencent HY-MT1.5: Translation Model Beats Google by 15-65%

1.8B-parameter model from the WMT2025 winner approaches Gemini-3.0-Pro performance while running on smartphones. Supports 33 languages.

NLP

LiquidAI LFM2-2.6B: Edge Model Beats 680B DeepSeek R1

2.6B dense model using pure RL surpasses models roughly 260x larger on instruction following. Hybrid convolution-attention architecture enables phone deployment.

Edge AI

Wan2.2 Animate: First Open-Source MoE Video Model

Alibaba's 14B MoE model combines motion transfer and character animation. 720p at 24fps for ~$0.40 per 5s clip vs $2 for Veo 3.

Video

Z-Image-Turbo: FLUX-Quality Images on 16GB GPUs

Alibaba's 6B distilled model achieves near-FLUX quality in 8 steps on consumer hardware. Apache 2.0 license enables commercial use.

Image Gen

Stay ahead of model releases

We cover what matters: benchmarks, not press releases.

About our coverage

CodeSOTA tracks trending AI models from Hugging Face, arXiv, and major conferences. Our analysis focuses on verified benchmark results, not marketing claims.

Each article includes technical specifications, benchmark comparisons, deployment requirements, and practical recommendations for different use cases.