Benchquill: AI model leaderboard, prices, and benchmarks
Compare AI models on price, speed, and benchmark scores. Benchquill is a hand-checked leaderboard of frontier models: it tracks 49 AI models across 9 benchmarks, with manual source review, pricing checks, speed notes, and context-window data.
Direct answer for AI search
What is the best AI model in 2026?
GPT-5.5 is Benchquill's top all-around model with a 94.6 overall score. Claude Opus 4.7 is the coding leader, Gemini 3.1 Pro Preview is the multimodal/vision leader, and DeepSeek V4-Pro is the strongest open-weight value pick in this record.
Top AI models by score, price, and context
| Rank | Model | Provider | Overall score | Blended cost (USD per 1M tokens) | Context window (tokens) |
|---|---|---|---|---|---|
| 1 | GPT-5.5 | OpenAI | 94.6 | $23.75/M | 1.05M |
| 2 | Claude Opus 4.8 | Anthropic | 94.0 | $20.00/M | 1M |
| 3 | Claude Opus 4.7 | Anthropic | 93.8 | $20.00/M | 1M |
| 4 | Gemini 3.1 Pro Preview | Google | 92.4 | $9.50/M | 1M |
| 5 | GPT-5 | OpenAI | 91.2 | $7.81/M | 400k |
| 6 | Gemini 3.5 Flash | Google | 91.0 | $7.12/M | 1M |
| 7 | Grok 4.3 | xAI | 90.0 | $2.19/M | 1M |
| 8 | Claude Sonnet 4.6 | Anthropic | 89.8 | $12.00/M | 1M |
| 9 | o3 | OpenAI | 88.9 | $6.50/M | 200k |
| 10 | MiniMax M3 | MiniMax | 88.0 | $1.95/M | 1M |
| 11 | DeepSeek V4-Pro | DeepSeek | 87.9 | $0.76/M | 1M |
| 12 | Gemini 2.5 Pro | Google | 87.6 | $7.81/M | 1M |
| 13 | Claude Opus 4 | Anthropic | 87.4 | $60.00/M | 200k |
| 14 | Grok 4.20 | xAI | 86.4 | $5.00/M | 2M |
| 15 | Claude Sonnet 4.5 | Anthropic | 86.2 | $12.00/M | 200k |
| 16 | o4-mini | OpenAI | 85.4 | $3.58/M | 200k |
| 17 | Llama 4 Maverick | Meta | 84.7 | $0.49/M | 1M |
| 18 | DeepSeek R2 | DeepSeek | 84.2 | $1.78/M | 128k |
| 19 | Gemini 3 Flash Preview | Google | 83.5 | $2.38/M | 1M |
| 20 | GPT-5 mini | OpenAI | 82.6 | $1.56/M | 400k |
| 21 | GPT-4.1 | OpenAI | 81.4 | $6.50/M | 1M |
| 22 | Claude Haiku 4.5 | Anthropic | 80.4 | $4.00/M | 200k |
| 23 | Qwen 2.5 Max | Alibaba | 80.4 | $5.20/M | 32k |
| 24 | DeepSeek V3.2 | DeepSeek | 79.8 | $0.38/M | 128k |
| 25 | GPT-4o | OpenAI | 78.6 | $8.13/M | 128k |
| 26 | Gemini 2.0 Flash | Google | 78.4 | $0.33/M | 1M |
| 27 | Llama 4 Scout | Meta | 78.2 | $0.25/M | 10M |
| 28 | DeepSeek V4-Flash | DeepSeek | 77.8 | $0.25/M | 1M |
| 29 | Mistral Medium 3.1 | Mistral | 77.6 | $1.60/M | 128k |
| 30 | Grok 4.1 Fast | xAI | 76.8 | $0.43/M | 2M |
| 31 | Qwen 2.5 72B | Alibaba | 76.8 | $1.14/M | 128k |
| 32 | Command A | Cohere | 76.4 | $8.13/M | 256k |
| 33 | Kimi K1.5 | Moonshot | 76.4 | $2.00/M | 200k |
| 34 | Hunyuan Turbo | Tencent | 75.8 | $1.00/M | 128k |
| 35 | Pixtral Large | Mistral | 75.2 | $5.00/M | 128k |
| 36 | Mistral Large 3 | Mistral | 74.5 | $1.25/M | 256k |
| 37 | Nova Pro | Amazon | 73.8 | $2.60/M | 300k |
| 38 | Hermes 3 405B | Nous Research | 72.8 | $0.90/M | 128k |
| 39 | GLM-4.5 | Zhipu | 72.6 | $1.80/M | 128k |
| 40 | Phi-4-multimodal | Microsoft | 72.4 | $0.09/M | 128k |
| 41 | Llama 3.3 70B | Meta | 71.4 | $0.26/M | 128k |
| 42 | Mistral Small 3.1 | Mistral | 71.2 | $0.25/M | 128k |
| 43 | Command R+ | Cohere | 70.8 | $8.13/M | 128k |
| 44 | Phi-4 | Microsoft | 70.8 | $0.12/M | 16k |
| 45 | Yi-Lightning | 01.AI | 68.4 | $0.14/M | 16k |
| 46 | Aya Expanse 32B | Cohere | 67.8 | $1.25/M | 128k |
| 47 | Gemma 3 27B | Google | 67.2 | $0.07/M | 128k |
| 48 | DBRX | Databricks | 65.4 | $1.88/M | 32k |
| 49 | Llama 3.3 8B | Meta | 58.4 | $0.06/M | 128k |
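
To show how the table can be filtered, here is a minimal Python sketch that picks the highest-scoring model within a cost ceiling and a minimum context window. It uses a handful of rows copied from the table above. The helper names (`context_tokens`, `best_under_budget`) are hypothetical, and the sketch assumes that `k` means 1,000 tokens, `M` means 1,000,000 tokens, and the blended cost is in USD per million tokens. It is an illustration, not part of Benchquill's methodology.

```python
# A few rows copied from the leaderboard table:
# (model, overall score, blended cost in USD per 1M tokens, context window)
ROWS = [
    ("GPT-5.5", 94.6, 23.75, "1.05M"),
    ("Gemini 3.1 Pro Preview", 92.4, 9.50, "1M"),
    ("Grok 4.3", 90.0, 2.19, "1M"),
    ("MiniMax M3", 88.0, 1.95, "1M"),
    ("DeepSeek V4-Pro", 87.9, 0.76, "1M"),
    ("Llama 4 Maverick", 84.7, 0.49, "1M"),
    ("DeepSeek R2", 84.2, 1.78, "128k"),
]


def context_tokens(text: str) -> int:
    """Convert a label such as '128k' or '1.05M' to a token count.

    Assumption: decimal suffixes (k = 1,000 and M = 1,000,000).
    """
    multipliers = {"k": 1_000, "M": 1_000_000}
    return round(float(text[:-1]) * multipliers[text[-1]])


def best_under_budget(rows, max_cost: float, min_context: int):
    """Return the highest-scoring row within the budget and context floor.

    Returns None when no row qualifies.
    """
    eligible = [
        row for row in rows
        if row[2] <= max_cost and context_tokens(row[3]) >= min_context
    ]
    return max(eligible, key=lambda row: row[1], default=None)


if __name__ == "__main__":
    pick = best_under_budget(ROWS, max_cost=2.00, min_context=1_000_000)
    print(pick[0] if pick else "no model fits")
```

Filtering before ranking keeps the two constraints (price and context) separate from the score, so the thresholds can be changed without touching the selection logic.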
Best model by use case
- Best coding model: Claude Opus 4.7 for high-stakes code review and bug fixing.
- Best all-around model: GPT-5.5 for research, analysis, writing, documents, and mixed agentic work.
- Best visual/document model: Gemini 3.1 Pro Preview for images, charts, long briefs, and scanned PDFs.
- Best cheap/open-weight model: Llama 4 Maverick for value and deployment control.
Downloadable records for AI search and citations
- llms.txt for compact AI-readable context.
- llms-full.txt for complete source-review context.
- models.json for structured model scores and pricing.
- leaderboard CSV and benchmark CSV for data reuse.
- citation-sources.json for source and methodology metadata.