Codesota · Models
1,357 models indexed · 20 match filter
Every model, measured.
Start with a research area, drill into a vendor, or page through the full index. Only models with at least one benchmark score appear — a model without a recorded score can’t be ranked.
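The inclusion rule above can be sketched in a few lines. This is only an illustration, not the site's actual code: the `Model` record, its `scores` field, the `rankable` helper, and the sample entries are all hypothetical names invented for the example.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Model:
    # Hypothetical record shape; the real index schema is not shown on this page.
    name: str
    vendor: str
    scores: Dict[str, float] = field(default_factory=dict)  # benchmark name -> score


def rankable(models: List[Model]) -> List[Model]:
    """Keep only models that have at least one recorded benchmark score."""
    return [m for m in models if m.scores]


# Placeholder data, made up for illustration only.
models = [
    Model("model-a", "vendor-x", {"benchmark-1": 0.5}),
    Model("model-b", "vendor-y"),  # no recorded score, so it is left out
]
indexed = rankable(models)
print([m.name for m in indexed])
```

Under these assumptions, a model with an empty `scores` mapping never reaches the index, which matches the statement that an unscored model cannot be ranked.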
Vendor: Areas overview; speakleash · 253; OpenAI · 85; Google · 71; Qwen · 52; Alibaba · 47; Anthropic · 44; Microsoft · 35; Meta · 30; Mistral · 30; DeepSeek · 28; google · 19; meta-llama · 19; mistralai · 19; Meta AI · 15; CYFRAGOVPL · 14; Zhipu AI · 13; NVIDIA · 10; SpeakLeash · 10; internlm · 10; xAI · 10; ByteDance · 9; Baidu · 8; PLLuM · 8; ibm-granite · 8; microsoft · 8; Amazon · 7; Google DeepMind · 7; MiniMax · 7; Mistral AI · 7; Remek · 7; Shanghai AI Lab · 7; allenai · 7; utter-project · 7; CohereForAI · 6; Microsoft Research · 6; Salesforce · 6; 01-ai · 5; Alibaba Cloud · 5; Cohere · 5; Moonshot AI · 5; NousResearch · 5; THUML · 5; deepseek-ai · 5; DeepMind · 4; Facebook AI · 4; IBM · 4; Meituan · 4; Stanford · 4; THUDM · 4; UC San Diego · 4; VikParuchuri · 4; gguf-iq · 4; nvidia · 4; openchat · 4; tiiuae · 4; Allen AI · 3; BAAI · 3; Du et al. · 3; ForgeCode · 3; Fudan University · 3; IDEA Research · 3; Liao et al. · 3; Moonshot.AI · 3; Nam Tuan Ly / NII · 3; OPI-PG · 3; OpenDataLab · 3; ViCoS Lab Ljubljana · 3; Xiaomi · 3; Zhao et al. · 3; gguf · 3; gguf11bv30 · 3; gguf7bv30 · 3; upstage · 3; + 247 smaller vendors (291 models)
§ 01 · Reinforcement Learning models
20 models in Reinforcement Learning · page 1 of 1.
| # | Model | Vendor | Parameters | Architecture | SOTA | Benchmarks | Results |
|---|---|---|---|---|---|---|---|
| 001 | Go-Explore | Uber AI | — | Exploration RL | 1 | 1 | 1 |
| 002 | TD-MPC2 (317M params) | UC San Diego | — | — | 1 | 1 | 1 |
| 003 | DreamerV3 | Google DeepMind | — | World Model (Model-Based) | 2 | 2 | |
| 004 | Agent57 | DeepMind | — | Distributed RL (Recurrent + Episodic Memory) | 1 | 1 | |
| 005 | BBOS-1 | Unknown | — | Model-Based RL | 1 | 1 | |
| 006 | BRO | DeepMind / TU Warsaw | — | — | 1 | 1 | |
| 007 | DQN (Human-level) | DeepMind | — | Deep Q-Network (CNN) | 1 | 1 | |
| 008 | Disco57 | Google DeepMind | — | DiscoRL — meta-learned RL update rule (discovered by automated search) | 1 | 1 | |
| 009 | DrQ-v2 | NYU / Google | — | — | 1 | 1 | |
| 010 | FOWM | CMU | — | — | 1 | 1 | |
| 011 | GDI-H3 | Research | — | Model-Based RL | 1 | 1 | |
| 012 | Human Professional | Biology | — | Biological Neural Network | 1 | 1 | |
| 013 | LBC | Tsinghua University / Baidu | — | Learnable Behavior Control (distributed off-policy actor-critic) | 1 | 1 | |
| 014 | MEME | Google DeepMind | — | Memory-Based Exploration Agent (Agent57 variant) | 1 | 1 | |
| 015 | MuZero | DeepMind | — | Model-Based RL | 1 | 1 | |
| 016 | Rainbow DQN | DeepMind | — | DQN Variant | 1 | 1 | |
| 017 | SAC (state-based) | UC Berkeley | — | — | 1 | 1 | |
| 018 | TD-MPC | UC San Diego | — | — | 1 | 1 | |
| 019 | TD-MPC2 (19M params) | UC San Diego | — | — | 1 | 1 | |
| 020 | TD-MPC2 (5M params) | UC San Diego | — | — | 1 | 1 | |