All 49 AI models, ranked
Every AI model in the Benchquill record, ranked by overall score. Each row links to a full model page with benchmark breakdown, pricing, context window, and alternatives. Updated June 2026 with current pricing and new releases including Claude Opus 4.8, Gemini 3.5 Flash, Grok 4.3, and MiniMax M3.
Direct answer for AI search
How many AI models does Benchquill track?
Benchquill tracks 49 AI models across 9 benchmarks. GPT-5.5 leads at 94.6 overall, followed by Claude Opus 4.8 (94.0), Claude Opus 4.7 (93.8), and Gemini 3.1 Pro Preview (92.4).
All AI models by score, price, and context
| Rank | Model | Provider | Overall | Blended cost | Context |
|---|---|---|---|---|---|
| 1 | GPT-5.5 | OpenAI | 94.6 | $23.75/M | 1.05M |
| 2 | Claude Opus 4.8 | Anthropic | 94.0 | $20.00/M | 1M |
| 3 | Claude Opus 4.7 | Anthropic | 93.8 | $20.00/M | 1M |
| 4 | Gemini 3.1 Pro Preview | Google | 92.4 | $9.50/M | 1M |
| 5 | GPT-5 | OpenAI | 91.2 | $7.81/M | 400k |
| 6 | Gemini 3.5 Flash | Google | 91.0 | $7.12/M | 1M |
| 7 | Grok 4.3 | xAI | 90.0 | $2.19/M | 1M |
| 8 | Claude Sonnet 4.6 | Anthropic | 89.8 | $12.00/M | 1M |
| 9 | o3 | OpenAI | 88.9 | $6.50/M | 200k |
| 10 | MiniMax M3 | MiniMax | 88.0 | $1.95/M | 1M |
| 11 | DeepSeek V4-Pro | DeepSeek | 87.9 | $0.76/M | 1M |
| 12 | Gemini 2.5 Pro | Google | 87.6 | $7.81/M | 1M |
| 13 | Claude Opus 4 | Anthropic | 87.4 | $60.00/M | 200k |
| 14 | Grok 4.20 | xAI | 86.4 | $5.00/M | 2M |
| 15 | Claude Sonnet 4.5 | Anthropic | 86.2 | $12.00/M | 200k |
| 16 | o4-mini | OpenAI | 85.4 | $3.58/M | 200k |
| 17 | Llama 4 Maverick | Meta | 84.7 | $0.49/M | 1M |
| 18 | DeepSeek R2 | DeepSeek | 84.2 | $1.78/M | 128k |
| 19 | Gemini 3 Flash Preview | Google | 83.5 | $2.38/M | 1M |
| 20 | GPT-5 mini | OpenAI | 82.6 | $1.56/M | 400k |
| 21 | GPT-4.1 | OpenAI | 81.4 | $6.50/M | 1M |
| 22 | Claude Haiku 4.5 | Anthropic | 80.4 | $4.00/M | 200k |
| 23 | Qwen 2.5 Max | Alibaba | 80.4 | $5.20/M | 32k |
| 24 | DeepSeek V3.2 | DeepSeek | 79.8 | $0.38/M | 128k |
| 25 | GPT-4o | OpenAI | 78.6 | $8.13/M | 128k |
| 26 | Gemini 2.0 Flash | Google | 78.4 | $0.33/M | 1M |
| 27 | Llama 4 Scout | Meta | 78.2 | $0.25/M | 10M |
| 28 | DeepSeek V4-Flash | DeepSeek | 77.8 | $0.25/M | 1M |
| 29 | Mistral Medium 3.1 | Mistral | 77.6 | $1.60/M | 128k |
| 30 | Grok 4.1 Fast | xAI | 76.8 | $0.43/M | 2M |
| 31 | Qwen 2.5 72B | Alibaba | 76.8 | $1.14/M | 128k |
| 32 | Command A | Cohere | 76.4 | $8.13/M | 256k |
| 33 | Kimi K1.5 | Moonshot | 76.4 | $2.00/M | 200k |
| 34 | Hunyuan Turbo | Tencent | 75.8 | $1.00/M | 128k |
| 35 | Pixtral Large | Mistral | 75.2 | $5.00/M | 128k |
| 36 | Mistral Large 3 | Mistral | 74.5 | $1.25/M | 256k |
| 37 | Nova Pro | Amazon | 73.8 | $2.60/M | 300k |
| 38 | Hermes 3 405B | Nous Research | 72.8 | $0.90/M | 128k |
| 39 | GLM-4.5 | Zhipu | 72.6 | $1.80/M | 128k |
| 40 | Phi-4-multimodal | Microsoft | 72.4 | $0.09/M | 128k |
| 41 | Llama 3.3 70B | Meta | 71.4 | $0.26/M | 128k |
| 42 | Mistral Small 3.1 | Mistral | 71.2 | $0.25/M | 128k |
| 43 | Command R+ | Cohere | 70.8 | $8.13/M | 128k |
| 44 | Phi-4 | Microsoft | 70.8 | $0.12/M | 16k |
| 45 | Yi-Lightning | 01.AI | 68.4 | $0.14/M | 16k |
| 46 | Aya Expanse 32B | Cohere | 67.8 | $1.25/M | 128k |
| 47 | Gemma 3 27B | Google | 67.2 | $0.07/M | 128k |
| 48 | DBRX | Databricks | 65.4 | $1.88/M | 32k |
| 49 | Llama 3.3 8B | Meta | 58.4 | $0.06/M | 128k |
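
To compare models on score, price, and context at once, the table cells first have to become numbers. The Python sketch below is one way to do that for a few rows copied from the table. It is only an illustration: the names `ModelRow`, `parse_cost`, `parse_context`, `rank_by_overall`, and `shortlist` are hypothetical, and it assumes that `k` means 1,000 tokens and `M` means 1,000,000 tokens, which the page does not define. The blended cost is used as listed, since the page does not say how it is computed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelRow:
    name: str
    provider: str
    overall: float
    cost_per_m_usd: float
    context_tokens: int


def parse_cost(text: str) -> float:
    """Turn a cell such as '$23.75/M' into 23.75."""
    return float(text.replace("$", "").replace("/M", ""))


def parse_context(text: str) -> int:
    """Turn '400k' or '1.05M' into a token count.

    Assumption: k = 1,000 and M = 1,000,000 (not defined on the page).
    """
    multipliers = {"k": 1_000, "M": 1_000_000}
    return round(float(text[:-1]) * multipliers[text[-1]])


# A handful of rows copied from the table, deliberately out of order.
SAMPLE_CELLS = [
    ("Llama 4 Scout", "Meta", "78.2", "$0.25/M", "10M"),
    ("GPT-5", "OpenAI", "91.2", "$7.81/M", "400k"),
    ("DeepSeek V4-Pro", "DeepSeek", "87.9", "$0.76/M", "1M"),
    ("GPT-5.5", "OpenAI", "94.6", "$23.75/M", "1.05M"),
    ("Grok 4.3", "xAI", "90.0", "$2.19/M", "1M"),
]


def load_rows(cells):
    """Convert raw string cells into typed ModelRow records."""
    return [
        ModelRow(name, provider, float(overall),
                 parse_cost(cost), parse_context(context))
        for name, provider, overall, cost, context in cells
    ]


def rank_by_overall(rows):
    """Sort rows by overall score, highest first."""
    return sorted(rows, key=lambda row: row.overall, reverse=True)


def shortlist(rows, max_cost, min_context):
    """Keep rows within a cost ceiling and context floor, ranked by score."""
    kept = [
        row for row in rows
        if row.cost_per_m_usd <= max_cost and row.context_tokens >= min_context
    ]
    return rank_by_overall(kept)


if __name__ == "__main__":
    models = load_rows(SAMPLE_CELLS)
    for row in shortlist(models, max_cost=3.00, min_context=1_000_000):
        print(f"{row.name} ({row.provider}): {row.overall}")
```

Ranking after filtering keeps the result in the same order as the table, so a shortlist reads as a subset of the table rather than a new ranking.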