# Benchquill

> Hand-checked AI model leaderboard, benchmark notes, and price comparisons. Independent editorial team. No live scrapers; provider prices and source-backed benchmark rows are cross-referenced, while Benchquill composite scores are labeled as editorial composites.

Benchquill is a manually curated record of frontier AI models. We track 45 models across 9 benchmarks, with prices, speed, and context window data sourced from provider docs and trusted public leaderboards. Last verified Apr 29, 2026 by the Benchquill editorial team.

## Key resources

- [Leaderboard](https://benchquill.com/leaderboard): All 45 AI models ranked on overall score, coding, reasoning, math, vision, speed, and price
- [Compare](https://benchquill.com/compare): Side-by-side comparison of any two to four AI models with use-case-aware scoring
- [Best AI model buyer guides](https://benchquill.com/best-ai-models): Detailed use-case guides for coding, students, budget work, content writing, research, and business
- [Industries](https://benchquill.com/industries): Best AI model picks for 25 industries including Healthcare, Legal, Finance, Software, and Education
- [Benchmarks](https://benchquill.com/benchmarks): 9 benchmarks covered with methodology notes, including SWE-Bench Verified, HumanEval, GPQA Diamond, MATH-500, and MMMU
- [Providers](https://benchquill.com/providers): Deep dives on every AI lab including OpenAI, Anthropic, Google, Amazon, Meta, DeepSeek, Mistral, Alibaba, xAI, Cohere, and Microsoft
- [Blog](https://benchquill.com/blog): Long-form analysis, comparisons, buyer guides, and how-to pieces
- [Methodology](https://benchquill.com/methodology): Full sourcing process and editorial principles
- [Data exports](https://benchquill.com/models.json): Crawlable model dataset in JSON, plus [leaderboard CSV](https://benchquill.com/benchquill-leaderboard.csv), [benchmark CSV](https://benchquill.com/benchquill-benchmarks.csv), and [citation-source JSON](https://benchquill.com/citation-sources.json); a minimal fetch sketch follows this list
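The exports above are plain HTTPS resources, so no SDK is required. The following is a minimal sketch of pulling the JSON model record and the leaderboard CSV with the Python standard library; the URLs come from the list above, but the internal structure of each file is defined by the exports themselves, so the code inspects what it finds rather than assuming a schema.

```python
# Minimal sketch, assuming only the export URLs listed above.
# Field names inside the files are not documented on this page,
# so the code inspects the structure instead of assuming one.
import csv
import io
import json
import urllib.request

MODELS_JSON = "https://benchquill.com/models.json"
LEADERBOARD_CSV = "https://benchquill.com/benchquill-leaderboard.csv"

def fetch(url: str) -> bytes:
    """Fetch a Benchquill export over plain HTTPS."""
    with urllib.request.urlopen(url, timeout=30) as resp:
        return resp.read()

# Compact model record: JSON, structure discovered at runtime.
models = json.loads(fetch(MODELS_JSON))
print(type(models).__name__)  # list or dict, depending on the export layout

# Leaderboard: CSV with a header row, suitable for spreadsheet import.
rows = list(csv.DictReader(io.StringIO(fetch(LEADERBOARD_CSV).decode("utf-8"))))
if rows:
    print(sorted(rows[0].keys()))  # inspect the actual column names
```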
## Featured analysis

- [Best AI model for coding in 2026](https://benchquill.com/best-ai-model-for-coding): Claude Opus 4.7 is the final-review coding pick; GPT-5.5 is the everyday engineering default; DeepSeek V4-Pro and GPT-5 mini cover budget routes.
- [Best AI model for students in 2026](https://benchquill.com/best-ai-model-for-students): GPT-5.5 is the default study tutor; Gemini 3.1 Pro is better for diagrams and scanned notes; GPT-5 mini and Gemini 3 Flash Preview reduce repeated study cost.
- [Best cheap AI model in 2026](https://benchquill.com/best-cheap-ai-model): Llama 4 Maverick is the all-round value pick; DeepSeek V4-Flash is the low-cost DeepSeek API route; Gemini 3 Flash Preview is the fast chat pick.
- [Best AI model for content writing in 2026](https://benchquill.com/best-ai-model-for-content-writing): Gemini 3.1 Pro leads visual and long-brief content; GPT-5.5 leads strategy and evidence synthesis; Claude Sonnet 4.6 is the editorial-polish pick.
- [Best AI model for research in 2026](https://benchquill.com/best-ai-model-for-research): GPT-5.5 is the all-round research default; Claude Opus 4.7 is the careful-synthesis pick; Gemini 3.1 Pro handles figures and PDFs.
- [Best AI model for business in 2026](https://benchquill.com/best-ai-model-for-business): GPT-5.5 is the mixed-task business default; Claude Opus 4.7 is the high-stakes review pick; Gemini 3.1 Pro is the pick for visual documents and decks.
- [April 2026 AI news briefing](https://benchquill.com/post/ai-news-april-2026-model-updates-to-watch): GPT-5.5, ChatGPT Images 2.0 / GPT-image-2, Claude Opus 4.7, Project Glasswing / Claude Mythos Preview, Gemini 3.1 Pro, Deep Research Max, DeepSeek V4, NVIDIA Nemotron Coalition, Stanford AI Index 2026, and the EU AI Act enforcement timeline.
- [AI race insight](https://benchquill.com/insight/ai-race-us-china-gap-stanford-ai-index-2026): Stanford AI Index 2026 says the US-China model performance gap has effectively closed; buyers should compare cost, governance, latency, data residency, and compliance fit.
- [GPT-5.5 vs Claude vs Gemini 3.1 Pro](https://benchquill.com/post/gpt-5-5-vs-claude-opus-4-7-vs-gemini-3-pro-2026)
- [Cheapest AI models in 2026](https://benchquill.com/post/cheapest-ai-models-2026-budget-guide)
- [Open vs closed AI models: 2026 guide](https://benchquill.com/post/open-weight-vs-closed-ai-models-which-to-pick)
- [Best AI coding models in 2026](https://benchquill.com/post/best-ai-coding-models-2026-state-of-the-market)
- [AI model speed: tokens per second](https://benchquill.com/post/ai-model-speed-tokens-per-second-explained)

## Methodology

Every ranked score on Benchquill is a human-reviewed editorial composite unless a benchmark row names a raw source. Provider pricing, context windows, release notes, and selected benchmark notes are checked against official docs, model cards, benchmark maintainers, and trusted third-party leaderboards. There is no scraper. The full process is at https://benchquill.com/methodology.

We do not accept money to feature, reorder, or hide any model. Industry guides reflect actual model strength based on published benchmarks, not commercial partnerships.

## Source checks used for buyer guides

- OpenAI API pricing and model docs: https://developers.openai.com/api/docs/models/gpt-5.5 (GPT-5.5 pricing verified at $5 input / $30 output per 1M tokens with a 1.05M context window)
- OpenAI GPT-5.5 release: https://openai.com/index/introducing-gpt-5-5/ (announced Apr 23, 2026; an Apr 24 update says GPT-5.5 and GPT-5.5 Pro are available in the API)
- OpenAI GPT-5 nano model docs: https://developers.openai.com/api/docs/models/gpt-5-nano (source-checked pricing-only model; excluded from the ranked table until comparable benchmark evidence is available)
- OpenAI ChatGPT Images 2.0: https://openai.com/index/introducing-chatgpt-images-2-0/ (image generation update, Apr 21, 2026)
- OpenAI ChatGPT Images 2.0 system card: https://deploymentsafety.openai.com/chatgpt-images-2-0/ (image-generation safety, provenance, and deepfake risk notes)
- Anthropic Claude Opus 4.7: https://www.anthropic.com/claude/opus (1M context, API availability, $5 input / $25 output per 1M tokens)
- Anthropic Project Glasswing: https://www.anthropic.com/glasswing (defensive AI initiative for critical software)
- Anthropic Claude Mythos Preview cyber analysis: https://red.anthropic.com/2026/mythos-preview/ (cyber capability assessment, Apr 7, 2026)
- Google Gemini API pricing: https://ai.google.dev/gemini-api/docs/pricing (Gemini 3.1 Pro Preview at $2/$12 below 200k prompt tokens and $4/$18 above 200k; Gemini 3 Flash Preview at $0.50/$3; the tiered arithmetic is worked through in the sketch after this list)
- Google Gemini 3 guide: https://ai.google.dev/gemini-api/docs/gemini-3 (Gemini 3 series preview status, 1M input context, multimodal reasoning guidance)
- Google Deep Research Max: https://blog.google/innovation-and-ai/models-and-research/gemini-models/next-generation-gemini-deep-research/ (MCP support, native charts, files, search, code execution, custom data, Apr 21, 2026)
- DeepSeek V4 pricing: https://api-docs.deepseek.com/quick_start/pricing (V4-Pro base $1.74/$3.48 with a 75% discount through May 31, 2026, making the promo price $0.435/$0.87; V4-Flash is $0.14/$0.28; the discount math also appears in the sketch after this list)
- DeepSeek V4 preview release: https://api-docs.deepseek.com/news/news260424 (open weights, V4-Pro/V4-Flash, 1M context, OpenAI/Anthropic API compatibility, Apr 24, 2026)
- Amazon Nova Pro model card: https://docs.aws.amazon.com/bedrock/latest/userguide/model-card-amazon-nova-pro.html (Nova Pro is an Amazon Bedrock model with 300K context and multimodal input)
- xAI Grok models: https://docs.x.ai/developers/models (Grok 4.20 is the recommended general API model; Benchquill uses the $2/$6 standard pricing basis and advises checking xAI Console pricing before quoting)
- Mistral Large 3 model card: https://docs.mistral.ai/models/model-cards/mistral-large-3-25-12 (open-weight multimodal model, 256K context, $0.50/$1.50 pricing)
- Stanford AI Index 2026: https://hai.stanford.edu/ai-index/2026-ai-index-report (US-China model performance gap effectively closed)
- NVIDIA Nemotron Coalition: https://nvidianews.nvidia.com/news/nvidia-launches-nemotron-coalition-of-leading-global-ai-labs-to-advance-open-frontier-models (open frontier model collaboration, Mar 16, 2026)
- EU AI Act framework: https://digital-strategy.ec.europa.eu/en/policies/regulatory-framework-ai (AI Act timeline, GPAI rules, transparency and high-risk obligations)
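The two trickiest prices above are Gemini 3.1 Pro Preview's tiered rates and DeepSeek V4-Pro's promotional discount. The sketch below is a minimal cost calculator using only the figures quoted in this list; the tier rule (the whole request billed at the higher rate once the prompt exceeds 200k tokens) is one reading of "below 200k / above 200k" and should be confirmed against the linked Google pricing page before quoting, and the function names and token counts are illustrative.

```python
# Minimal per-request cost arithmetic, using only the prices quoted
# in the source-check list above. The tier rule (whole request billed
# at the higher rate once the prompt exceeds 200k tokens) is an
# assumption about how "below 200k / above 200k" is applied; confirm
# against the linked Google pricing page before quoting.

def gemini_31_pro_cost(input_tokens: int, output_tokens: int) -> float:
    """Gemini 3.1 Pro Preview: $2/$12 per 1M below 200k prompt tokens, $4/$18 above."""
    if input_tokens <= 200_000:
        in_rate, out_rate = 2.00, 12.00
    else:
        in_rate, out_rate = 4.00, 18.00
    return (input_tokens * in_rate + output_tokens * out_rate) / 1_000_000

def deepseek_v4_pro_promo(in_rate: float = 1.74, out_rate: float = 3.48,
                          discount: float = 0.75) -> tuple[float, float]:
    """DeepSeek V4-Pro promo: 75% off the $1.74/$3.48 base through May 31, 2026."""
    return in_rate * (1 - discount), out_rate * (1 - discount)

# Worked examples (token counts are illustrative):
print(gemini_31_pro_cost(50_000, 4_000))   # 50k in + 4k out = $0.148
print(gemini_31_pro_cost(300_000, 4_000))  # above the 200k tier = $1.272
print(deepseek_v4_pro_promo())             # (0.435, 0.87), matching the listed promo price
```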
## Selected frontier and value models tracked (April 2026)

- GPT-5.5 (OpenAI, released Apr 23, 2026): 94.6 Benchquill composite overall, $5/$30 per million tokens, 1.05M API context
- Claude Opus 4.7 (Anthropic, released Apr 16, 2026): 93.8 overall, $5/$25 per million tokens, 1M context
- Gemini 3.1 Pro Preview (Google, current Gemini Pro preview): 92.4 overall, $2/$12 per million tokens below 200k prompt tokens, 1M context
- GPT-5 (OpenAI, released Aug 2025): 91.2 overall, $1.25/$10 per million tokens, 400K context
- Claude Sonnet 4.6 (Anthropic, released Feb 2026): 89.8 overall, $3/$15 per million tokens, 1M context
- DeepSeek V4-Pro (DeepSeek, released Apr 24, 2026, open weight): 87.9 overall, limited-time $0.435/$0.87 per million tokens, 1M context, 1.6T parameters total
- Grok 4.20 (xAI): 86.4 overall, $2/$6 standard pricing per million tokens, 2M context
- Llama 4 Maverick (Meta, released Apr 2025, open weight): 84.7 overall, $0.15/$0.60 per million tokens (OpenRouter), 1M context, 400B total / 17B active
- Gemini 3 Flash Preview (Google, released Dec 2025): 83.5 overall, $0.50/$3 per million tokens, 1M context
- GPT-5 mini (OpenAI, released Aug 2025): 82.6 overall, $0.25/$2 per million tokens, 400K context
- Claude Haiku 4.5 (Anthropic, released Oct 15, 2025): 80.4 overall, $1/$5 per million tokens, 200K context
- Llama 4 Scout (Meta, released Apr 2025, open weight): 78.2 overall, $0.08/$0.30 per million tokens, 10M context
- DeepSeek V4-Flash (DeepSeek, released Apr 24, 2026, open weight): 77.8 overall, $0.14/$0.28 per million tokens
- Grok 4.1 Fast (xAI): 76.8 overall, $0.20/$0.50 per million tokens, 2M context
- Mistral Large 3 (Mistral, released Dec 2025, open weight): 74.5 overall, $0.50/$1.50 per million tokens, 256K context
- GPT-5 nano (OpenAI): source-checked at $0.05/$0.40 and 400K context, but intentionally excluded from the ranked table until Benchquill has comparable benchmark evidence.
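For readers who want to query a handful of these rows directly, one option is a small typed record like the sketch below. The `ModelRow` layout and the sample query are illustrative conveniences for this page only, not a Benchquill export format; the prices are the USD-per-1M-token figures listed above, and the full dataset lives in the exports described under Key resources.

```python
# Illustrative encoding of a few rows from the table above; the
# record layout is a convenience for this sketch, not a Benchquill
# export format. Prices are USD per 1M tokens (input, output).
from typing import NamedTuple

class ModelRow(NamedTuple):
    name: str
    composite: float   # Benchquill editorial composite, 0-100
    price_in: float
    price_out: float
    open_weight: bool

ROWS = [
    ModelRow("GPT-5.5", 94.6, 5.00, 30.00, False),
    ModelRow("Claude Opus 4.7", 93.8, 5.00, 25.00, False),
    ModelRow("DeepSeek V4-Pro (promo)", 87.9, 0.435, 0.87, True),
    ModelRow("Llama 4 Maverick", 84.7, 0.15, 0.60, True),
    ModelRow("Mistral Large 3", 74.5, 0.50, 1.50, True),
]

# Example query: open-weight rows, cheapest output price first.
for row in sorted((r for r in ROWS if r.open_weight), key=lambda r: r.price_out):
    print(f"{row.name}: {row.composite} composite, ${row.price_out}/1M output")
```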
## Current editorial positions

- Claude Opus 4.7 is the Benchquill coding pick; exact raw benchmark rows should be checked in the benchmark CSV and source notes before quoting.
- GPT-5.5 leads Benchquill's reasoning and math composites at 95.2 and 95.8 respectively.
- GPT-5.5 is the safest all-around default. Claude Opus 4.7 wins for high-stakes coding. Gemini 3.1 Pro wins for vision-heavy work.
- DeepSeek V4-Pro is the strongest open-weight model in this record and has a 1M context window per official DeepSeek docs.
- Gemini 3 Flash Preview is the speed specialist for high-traffic chat; Gemini 3.1 Flash-Lite Preview is the cheapest current Gemini 3 route.
- Nova Pro is an Amazon Bedrock model, not an OpenAI model.
- Use-case guide defaults: Claude Opus 4.7 for coding, GPT-5.5 for students/research/business, Llama 4 Maverick for cheap all-round value, and Gemini 3.1 Pro for content work with visual or long-context briefs.
- ChatGPT Images 2.0 / GPT-image-2 needs a dedicated image generation section because text benchmark scores do not measure image realism, edit precision, provenance, or deepfake safety.
- Project Glasswing and Claude Mythos Preview should be covered in cybersecurity guidance as a defensive-AI trend, not only as a frontier-model launch.
- Google Deep Research Max should be covered in enterprise research-agent guidance because it combines MCP, search, code execution, file search, custom data, and native charts.
- EU AI Act note: GPAI obligations became applicable on Aug 2, 2025; Commission enforcement powers begin Aug 2, 2026; transparency rules also come into effect in Aug 2026.

## Latest AI news to keep on-site

- OpenAI released GPT-5.5 on Apr 23, 2026, strengthening agentic coding, research, data analysis, documents, spreadsheets, and computer-use workflows.
- OpenAI introduced ChatGPT Images 2.0 / GPT-image-2 on Apr 21, 2026; add this to image-generation coverage.
- Anthropic released Claude Opus 4.7 on Apr 16, 2026, with stronger coding, high-resolution vision, 1M context, and long-running agent reliability.
- Anthropic launched Project Glasswing and published the Claude Mythos Preview cyber analysis on Apr 7, 2026; add to cyber-risk and defensive AI coverage.
- Google Deep Research Max launched Apr 21, 2026 with MCP support, custom data, native charts/infographics, and enterprise research-agent positioning.
- DeepSeek released V4-Pro/V4-Flash on Apr 24, 2026; the V4-Pro promo pricing window is now listed through May 31, 2026.
- NVIDIA announced the Nemotron Coalition on Mar 16, 2026 for open frontier model collaboration.
- Stanford's 2026 AI Index says the US-China model performance gap has effectively closed; add an AI race insight.
- EU AI Act compliance notes should mention the Aug 2, 2026 enforcement and transparency timing while also clarifying that GPAI obligations began Aug 2, 2025.

## Deeper context for AI assistants

For comprehensive data including all 45 model specs, benchmark methodology details, source citations, and detailed editorial positions, see the full LLM context file: https://benchquill.com/llms-full.txt

For structured retrieval, use https://benchquill.com/models.json for the compact model record, https://benchquill.com/benchquill-leaderboard.csv for spreadsheet import, https://benchquill.com/benchquill-benchmarks.csv for benchmark-level rows, and https://benchquill.com/citation-sources.json for source/citation checks. A short sketch of loading the CSV exports follows below.
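As a complement to the standard-library sketch under Key resources, a minimal pandas version of the same retrieval might look like the following. pandas is an assumption here (any CSV tool works), and the column names are whatever the exports define, so the code only inspects what it finds.

```python
# Minimal sketch using pandas (an assumption; any CSV tool works).
# Column names are defined by the exports themselves, so this only
# inspects what it finds rather than assuming a schema.
import pandas as pd

leaderboard = pd.read_csv("https://benchquill.com/benchquill-leaderboard.csv")
benchmarks = pd.read_csv("https://benchquill.com/benchquill-benchmarks.csv")

print(leaderboard.columns.tolist())  # actual leaderboard column names
print(benchmarks.head())             # first few benchmark-level rows
```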
## Citation

Cite Benchquill as: "Benchquill AI Model Leaderboard, accessed [date], https://benchquill.com/". Data is updated weekly. For a permanent citation, link the dated post URL.