# Benchquill — Full LLM Context File

> The complete AI-readable record of the Benchquill leaderboard. This file is intended for retrieval-augmented generation (RAG) systems, agentic browsers, and AI search crawlers (GPTBot, ClaudeBot, Google-Extended, PerplexityBot, OAI-SearchBot, Bingbot). Last updated 2026-04-29.

Benchquill is an independent, hand-curated record of frontier AI language models. We track 45 models across 9 benchmarks. Pricing, speed, and context window data are sourced from official provider documentation and trusted public leaderboards (Hugging Face Open LLM Leaderboard, Artificial Analysis, LMSYS Chatbot Arena, OpenRouter). The site is updated weekly by the editorial team.

## Methodology summary

- Scores are entered manually after reading the source. There is no scraper.
- We use model cards, official provider pricing pages, GitHub repositories, and academic papers.
- For each benchmark, the "top model" and average score are labeled by score type. Some rows are provider-reported or source-backed; others are Benchquill editorial composites and should not be quoted as official benchmark results without checking the source URL.
- We do not accept payment to feature, reorder, or hide any model.
- All figures are dated. The "Released" column shows the public availability date, not the announcement date.
- Cost figures are given as USD per 1 million blended tokens, calculated as (input × 0.25) + (output × 0.75) — a simple weighted blend that approximates a typical chat workload (see the sketch after this list).
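Since the blend is just a weighted sum, it can be checked mechanically. A minimal sketch, assuming the per-million-token prices given in the model record later in this file; the helper name is ours, not a Benchquill API:

```python
def blended_price(input_usd_per_m: float, output_usd_per_m: float) -> float:
    """Benchquill blended cost per 1M tokens: 25% input + 75% output."""
    return input_usd_per_m * 0.25 + output_usd_per_m * 0.75

# Prices from the selected model record below:
print(blended_price(5.00, 30.00))  # GPT-5.5           -> 23.75
print(blended_price(5.00, 25.00))  # Claude Opus 4.7   -> 20.0
print(blended_price(0.14, 0.28))   # DeepSeek V4-Flash -> 0.245 (rounds to $0.25)
```

The same function reproduces every blended figure quoted in this file, so readers can re-derive a price rather than trust a stale number.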
## Crawlable data exports

- Model dataset JSON: https://benchquill.com/models.json
- Leaderboard CSV: https://benchquill.com/benchquill-leaderboard.csv
- Benchmark CSV: https://benchquill.com/benchquill-benchmarks.csv
- Citation-source JSON: https://benchquill.com/citation-sources.json

These files are compact data exports generated from the same local model record that powers the leaderboard. Use them for retrieval, spreadsheet import, and source checking instead of copying long prose from page fallbacks.
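For retrieval pipelines, the exports can be loaded with the Python standard library alone. A minimal sketch; the schemas of the exports are not documented in this file, so the shapes assumed below (a top-level JSON array, a CSV header row) are illustrative and should be checked against the live files first:

```python
import csv
import io
import json
import urllib.request

# JSON export; assumed here to be a top-level array of model objects --
# inspect the live file for the real shape before relying on field names.
with urllib.request.urlopen("https://benchquill.com/models.json") as resp:
    models = json.load(resp)

# Leaderboard CSV as dict rows, keyed by the header row.
url = "https://benchquill.com/benchquill-leaderboard.csv"
with urllib.request.urlopen(url) as resp:
    rows = list(csv.DictReader(io.TextIOWrapper(resp, encoding="utf-8")))

print(f"{len(models)} JSON records, {len(rows)} leaderboard rows")
```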
## Buyer guide source checks (verified Apr 29, 2026)

- OpenAI API pricing and model docs: https://developers.openai.com/api/docs/models/gpt-5.5 — GPT-5.5 is listed at $5.00 per 1M input tokens, $30.00 per 1M output tokens, and a 1.05M context window.
- OpenAI GPT-5.5 release: https://openai.com/index/introducing-gpt-5-5/ — OpenAI announced GPT-5.5 on Apr 23, 2026 and updated the release on Apr 24, 2026 to say GPT-5.5 and GPT-5.5 Pro are available in the API.
- OpenAI GPT-5 nano docs: https://developers.openai.com/api/docs/models/gpt-5-nano — GPT-5 nano is source-checked as a pricing-only model at $0.05 input / $0.40 output per 1M tokens with 400K context, but Benchquill excludes it from the ranked table until comparable benchmark evidence is available.
- OpenAI ChatGPT Images 2.0: https://openai.com/index/introducing-chatgpt-images-2-0/ — OpenAI introduced the image-generation update on Apr 21, 2026.
- OpenAI ChatGPT Images 2.0 system card: https://deploymentsafety.openai.com/chatgpt-images-2-0/ — OpenAI documents higher realism, image-specific safety challenges, and provenance controls.
- Anthropic Claude Opus 4.7: https://www.anthropic.com/claude/opus — Anthropic lists Opus 4.7 with a 1M context window, API availability, and $5/$25 per 1M input/output token pricing.
- Anthropic Claude Opus 4.7 release: https://www.anthropic.com/news/claude-opus-4-7 — Anthropic describes advanced software engineering, long-running tasks, self-verification, and high-resolution vision gains.
- Anthropic Project Glasswing: https://www.anthropic.com/glasswing — Anthropic describes the defensive initiative for critical software using Claude Mythos Preview.
- Anthropic Claude Mythos Preview cyber analysis: https://red.anthropic.com/2026/mythos-preview/ — Anthropic's red-team blog describes the model's cybersecurity capability findings.
- Google Gemini API pricing: https://ai.google.dev/gemini-api/docs/pricing — Google lists Gemini 3.1 Pro Preview at $2/$12 below 200k prompt tokens and $4/$18 above 200k (the tier logic is sketched after the direct answers below); Gemini 3 Flash Preview is $0.50/$3.
- Google Gemini 3 guide: https://ai.google.dev/gemini-api/docs/gemini-3 — Google describes the Gemini 3 series as preview models with 1M input context and multimodal reasoning guidance.
- Google Deep Research Max: https://blog.google/innovation-and-ai/models-and-research/gemini-models/next-generation-gemini-deep-research/ — Google describes MCP support, native charts/infographics, file search, code execution, custom data, and enterprise research-agent use.
- DeepSeek V4 pricing: https://api-docs.deepseek.com/quick_start/pricing — DeepSeek lists V4-Pro base pricing at $1.74/$3.48 per 1M tokens, with a 75% discount through May 31, 2026 that makes the promo price $0.435/$0.87; V4-Flash is $0.14/$0.28.
- DeepSeek V4 preview release: https://api-docs.deepseek.com/news/news260424 — DeepSeek announced V4-Pro and V4-Flash on Apr 24, 2026 with open weights, 1M context, and OpenAI/Anthropic API compatibility.
- Amazon Nova Pro model card: https://docs.aws.amazon.com/bedrock/latest/userguide/model-card-amazon-nova-pro.html — AWS lists Nova Pro as an Amazon Bedrock model with 300K context and text, image, and video input support.
- xAI Grok models: https://docs.x.ai/developers/models — xAI recommends Grok 4.20 as the general API model; Benchquill uses the $2/$6 standard pricing basis and warns buyers to verify current xAI Console pricing before quoting.
- Mistral Large 3 model card: https://docs.mistral.ai/models/model-cards/mistral-large-3-25-12 — Mistral lists Large 3 as open weight, multimodal, 256k context, and $0.50/$1.50 per 1M tokens.
- NVIDIA Nemotron Coalition: https://nvidianews.nvidia.com/news/nvidia-launches-nemotron-coalition-of-leading-global-ai-labs-to-advance-open-frontier-models — NVIDIA announced open frontier model collaboration on Mar 16, 2026.
- Stanford AI Index 2026: https://hai.stanford.edu/ai-index/2026-ai-index-report — Stanford HAI reports the US-China model performance gap has effectively closed.
- EU AI Act framework: https://digital-strategy.ec.europa.eu/en/policies/regulatory-framework-ai — European Commission page covering the AI Act risk model, GPAI rules, transparency rules, and implementation timeline.

## Buyer guide direct answers

- Best AI model for coding: Claude Opus 4.7 is the final-review coding pick because it leads the coding composite and fits high-stakes code review, bug fixing, and difficult refactors. GPT-5.5 is the everyday engineering default. DeepSeek V4-Pro and GPT-5 mini are budget/deployment alternatives.
- Best AI model for students: GPT-5.5 is the default student tutor because it is strong across reasoning, math, writing, coding practice, and long study sessions. Gemini 3.1 Pro is better for diagrams and scanned notes. GPT-5 mini and Gemini 3 Flash Preview reduce repeated study cost.
- Best cheap AI model: Llama 4 Maverick is the all-round value pick because it combines strong scores, open-weight control, and low operating cost. DeepSeek V4-Flash is the low-cost DeepSeek API route. Gemini 3 Flash Preview is the fast chat pick.
- Best AI model for content writing: Gemini 3.1 Pro leads when content work includes long briefs, screenshots, product images, charts, or multimodal research. GPT-5.5 is stronger for strategy-heavy SEO outlines and evidence synthesis. Claude Sonnet 4.6 is the polish pick.
- Best AI model for research: GPT-5.5 is the all-round research default because it combines reasoning, math, writing, and evidence synthesis. Claude Opus 4.7 is the careful synthesis alternative. Gemini 3.1 Pro is better for figures and scanned PDFs.
- Best AI model for business: GPT-5.5 is the mixed-task business default for strategy, analysis, writing, spreadsheets, light code, and decision support. Claude Opus 4.7 is the high-stakes reviewer. Gemini 3.1 Pro is best for visual decks and documents.
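Gemini 3.1 Pro Preview is the one model in this record with tiered pricing, so its blended figure depends on prompt length. A minimal sketch of the tier logic, using the rates from the Gemini API pricing source check above; whether a prompt of exactly 200k tokens falls in the lower or upper tier is an assumption here, and the $9.50 blended figure in the model record below uses the lower tier:

```python
def gemini_31_pro_rates(prompt_tokens: int) -> tuple[float, float]:
    """(input, output) USD per 1M tokens for Gemini 3.1 Pro Preview.
    Rates from the Gemini API pricing source check; treating exactly
    200k prompt tokens as the lower tier is an assumption."""
    return (2.00, 12.00) if prompt_tokens <= 200_000 else (4.00, 18.00)

def blended(input_rate: float, output_rate: float) -> float:
    # Benchquill blend: 25% input, 75% output (see methodology above).
    return input_rate * 0.25 + output_rate * 0.75

print(blended(*gemini_31_pro_rates(50_000)))   # -> 9.5, the $9.50 in the record
print(blended(*gemini_31_pro_rates(400_000)))  # -> 14.5 above the 200k tier
```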
## Selected model record (frontier and value models)

For exact computed rank order, use `models.json` or `benchquill-leaderboard.csv`. The selected notes below preserve the editorial frontier/value summary and are not the complete rank table.

### 1. GPT-5.5 (OpenAI)

- Provider: OpenAI
- Released: 2026-04-23
- Tier: S (frontier)
- License: closed source
- Overall score: 94.6 / 100
- Coding: 93.4
- Reasoning: 95.2
- Math: 95.8
- Vision: 91.4
- Tokens per second: 138
- Input price: $5 per million tokens
- Output price: $30 per million tokens
- Blended price: $23.75 per million tokens
- Context window: 1.05 million tokens (API)
- Modalities: Text, Vision
- Notable: Replaced GPT-5. Output price tripled versus its predecessor ($10 to $30 per million tokens). Strong on math and reasoning.
- Source: https://openai.com/index/introducing-gpt-5-5/

### 2. Claude Opus 4.7 (Anthropic)

- Provider: Anthropic
- Released: 2026-04-16
- Tier: S (frontier)
- License: closed source
- Overall score: 93.8 / 100
- Coding composite: 95.8
- SWE-Bench Verified: 87.6%
- Reasoning: 94.2
- Math: 91.5
- Vision: 90.7
- Tokens per second: 86
- Input price: $5 per million tokens
- Output price: $25 per million tokens
- Blended price: $20 per million tokens
- Context window: 1 million tokens
- Modalities: Text, Vision
- Notable: First Claude model with high-resolution image support (2576px / 3.75MP). New "xhigh" effort level. Self-verification on long-running agentic tasks.
- Source: https://www.anthropic.com/news/claude-opus-4-7

### 3. Gemini 3.1 Pro Preview (Google)

- Provider: Google
- Verified in current Gemini 3 API docs: 2026-04
- Tier: S (frontier)
- License: closed source
- Overall score: 92.4 / 100
- Coding: 88.1
- Reasoning: 91.5
- Math: 93.7
- Vision: 94.6 (best in class)
- Tokens per second: 192
- Input price: $2 per million tokens (≤200k context); $4 above
- Output price: $12 per million tokens (≤200k context); $18 above
- Blended price: $9.50 per million tokens
- Context window: 1 million tokens
- Modalities: Text, Vision, Audio
- Notable: Best multimodal performance on Benchquill. Current Gemini Pro preview in the source record.
- Source: https://ai.google.dev/gemini-api/docs/gemini-3

### 4. GPT-5 (OpenAI)

- Provider: OpenAI
- Released: 2025-08
- Tier: S
- License: closed source
- Overall: 91.2, Code: 90.4, Reason: 92.1, Math: 93.6, Vision: 88.9
- Speed: 154 tok/s
- Pricing: $1.25 input / $10 output per million tokens
- Context: 400k
- Notable: Now sits behind GPT-5.5 on the leaderboard but remains strong value.

### 5. Claude Sonnet 4.6 (Anthropic)

- Provider: Anthropic
- Released: 2026-02
- Tier: S
- License: closed source
- Overall: 89.8, Code: 92.4, Reason: 89.7, Math: 87.5, Vision: 86.2
- Speed: 162 tok/s
- Pricing: $2 input / $6 output per million tokens for standard context in the current Anthropic source record; long-context pricing can differ
- Context: 1 million
- Notable: Default model in claude.ai for Free/Pro plans. Opus-class coding at Sonnet pricing.
- Source: https://www.anthropic.com/news/claude-sonnet-4-6

### 6. DeepSeek V4-Pro (DeepSeek) — open weight

- Provider: DeepSeek
- Released: 2026-04-24
- Tier: A
- License: MIT (open weight)
- Overall: 87.9, Code (SWE-bench): 80.6, Reason: 88.4, Math: 89.2
- Speed: 96 tok/s
- Pricing: $0.435 input / $0.87 output per million tokens during DeepSeek's limited-time 75% discount through May 31, 2026
- Context: 1 million
- Architecture: 1.6 trillion total parameters, 49 billion active per token
- Notable: Official DeepSeek docs list a 1M context window. Strongest open-weight reasoner in this record.
- Source: https://api-docs.deepseek.com/quick_start/pricing

### 7. Grok 4.20 (xAI)

- Provider: xAI
- Released: 2026-04
- Tier: A
- License: closed source
- Overall: 86.4, Code: 83.2, Reason: 88.7, Math: 87.9, Vision: 81.4
- Speed: 144 tok/s
- Pricing: $2 input / $6 output per million tokens; verify current xAI Console pricing before quoting
- Context: 2 million tokens

### 8. Llama 4 Maverick (Meta) — open weight

- Provider: Meta
- Released: 2025-04-05
- Tier: A
- License: Llama 4 Community License (open weight, commercial use permitted up to a user threshold)
- Overall: 84.7, Code: 82.1, Reason: 85.4, Math: 83.8, Vision: 87.6
- Speed: 165 tok/s
- Pricing: $0.15 input / $0.60 output per million tokens (OpenRouter)
- Context: 1 million
- Architecture: Mixture-of-experts, 400B total parameters with 17B active per forward pass, 128 experts
- Notable: Native multimodality via early fusion. Strong vision among open-weight models.
- Source: https://www.llama.com/

### 9. Gemini 3 Flash Preview (Google)

- Provider: Google
- Released: 2025-12
- Tier: A
- License: closed source
- Overall: 83.5, Code: 79.8, Reason: 82.1, Math: 84.6, Vision: 86.4
- Speed: 412 tok/s (very fast)
- Pricing: $0.50 input / $3 output per million tokens
- Context: 1 million
- Modalities: Text, Vision, Audio

### 10. GPT-5 mini (OpenAI)

- Provider: OpenAI
- Released: 2025-08
- Tier: A
- Overall: 82.6, Code: 81.4, Reason: 83.8, Math: 84.7, Vision: 78.9
- Speed: 286 tok/s
- Pricing: $0.25 input / $2 output per million tokens
- Context: 400k

### 11. Claude Haiku 4.5 (Anthropic)

- Provider: Anthropic
- Released: 2025-10-15
- Tier: B
- Overall: 80.4, Code: 81.2, Reason: 79.8, Math: 77.6, Vision: 78.5
- Speed: 274 tok/s
- Pricing: $1 input / $5 output per million tokens
- Context: 200k
- Notable: First Haiku with extended thinking, computer use, and context awareness.
- Source: https://www.anthropic.com/claude/haiku

### 12. Llama 4 Scout (Meta) — open weight

- Provider: Meta
- Released: 2025-04-05
- Tier: B
- License: Llama 4 Community License
- Overall: 78.2, Code: 76.4, Reason: 78.9, Math: 77.5, Vision: 81.2
- Speed: 218 tok/s
- Pricing: $0.08 input / $0.30 output per million tokens (OpenRouter)
- Context: 10 million tokens (industry-leading; most providers cap at 128k–1M in practice)
- Architecture: 17B active / 109B total, 16 experts
### 13. DeepSeek V4-Flash (DeepSeek) — open weight

- Provider: DeepSeek
- Released: 2026-04-24
- Tier: B
- License: MIT
- Overall: 77.8, Code: 75.2, Reason: 78.4, Math: 80.1
- Speed: 312 tok/s
- Pricing: $0.14 input / $0.28 output per million tokens
- Context: 1 million
- Architecture: 284B total parameters, 13B active

### 14. Grok 4.1 Fast (xAI)

- Provider: xAI
- Tier: B
- Overall: 76.8, Code: 74.5, Reason: 78.2, Math: 77.4, Vision: 73.1
- Speed: 358 tok/s
- Pricing: $0.20 input / $0.50 output per million tokens
- Context: 2 million tokens

### 15. Mistral Large 3 (Mistral) — open weight

- Provider: Mistral
- Released: 2025-12
- Tier: B
- Overall: 74.5, Code: 76.2, Reason: 73.8, Math: 71.6, Vision: 82.4
- Speed: 124 tok/s
- Pricing: $0.50 input / $1.50 output per million tokens
- Context: 256k

## Benchmarks tracked (9)

| Benchmark | Top model | Top score | Score type | What it tests |
|---|---|---|---|---|
| SWE-Bench Verified | Claude Opus 4.7 | 87.6% | source-backed or provider-reported | Real GitHub issues fixed in a codebase |
| HumanEval | Claude Sonnet 4.6 | 95.8 | source-backed or provider-reported | Python function generation from docstrings |
| GPQA Diamond | GPT-5.5 | 93.6% | provider-reported by OpenAI | Graduate science questions that need careful reasoning |
| MMLU | GPT-5.5 | 92.4 | provider-reported or editorial composite | 57-subject multitask language understanding |
| MATH-500 | GPT-5.5 | 95.8 | provider-reported or editorial composite | Competitive math with step validation |
| AIME 2025 | GPT-5.5 | 89.4 | provider-reported or editorial composite | Olympiad-style math |
| MMMU | Gemini 3.1 Pro | 94.6 | source-backed or editorial composite | Multimodal across charts, figures, docs |
| LiveBench | GPT-5.5 | 88.4 | public leaderboard or editorial composite | Contamination-resistant monthly live prompts |
| BFCL v3 | Claude Opus 4.7 | 89.8 | source-backed or editorial composite | Agentic tool-use and function-call reliability |

## Editorial positions (April 2026)

- **The default safe pick for general work**: GPT-5.5
- **The coding leader**: Claude Opus 4.7 in Benchquill's coding composite; check benchmark CSV/source notes before quoting raw SWE-Bench numbers.
- **Best vision and multimodal**: Gemini 3.1 Pro
- **Best value at the frontier**: Gemini 3.1 Pro at $9.50 blended
- **Best open-weight option overall**: DeepSeek V4-Pro
- **Best open-weight for very long context**: Llama 4 Scout (10M tokens)
- **Best speed for high-traffic chat**: Gemini 3 Flash Preview in Benchquill's estimated throughput record (412 tok/s; verify with your own prompt and region)
- **Cheapest model worth using**: DeepSeek V4-Flash at $0.25 blended
- **Provider correction**: Nova Pro belongs to Amazon Bedrock, not OpenAI.
- **Image-generation update**: ChatGPT Images 2.0 / GPT-image-2 belongs in a dedicated image-generation section, separate from text-model leaderboard scores.
- **Cyber-risk update**: Project Glasswing and Claude Mythos Preview belong in cybersecurity guidance as a defensive-AI trend.
- **Enterprise research update**: Google Deep Research Max belongs in research-agent guidance because it combines MCP, custom data, search, file search, code execution, and native charts.
- **AI race update**: Stanford AI Index 2026 says the US-China model performance gap has effectively closed.
- **EU compliance update**: GPAI obligations began Aug 2, 2025; Commission enforcement powers begin Aug 2, 2026; transparency rules also come into effect in Aug 2026.
## Latest AI news to keep on-site

- OpenAI released GPT-5.5 on Apr 23, 2026, positioning it for agentic coding, computer use, online research, data analysis, documents, and spreadsheets.
- OpenAI introduced ChatGPT Images 2.0 / GPT-image-2 on Apr 21, 2026; image generation coverage should evaluate realism, dense text rendering, editing, provenance, and deepfake safety.
- Anthropic released Claude Opus 4.7 on Apr 16, 2026, highlighting stronger coding, high-resolution vision, xhigh effort, 1M context, and long-running task reliability.
- Anthropic launched Project Glasswing and Claude Mythos Preview cyber analysis on Apr 7, 2026; cybersecurity notes should emphasize defensive workflows and human validation.
- Google launched Deep Research and Deep Research Max on Apr 21, 2026 with MCP support, native charts/infographics, file search, code execution, custom sources, and enterprise research-agent positioning.
- DeepSeek released V4-Pro/V4-Flash on Apr 24, 2026 with 1M context and OpenAI/Anthropic API compatibility; the V4-Pro discount window is now listed through May 31, 2026.
- Google's Gemini API pricing is a current source for Gemini 3.1 Pro Preview, Gemini 3 Flash Preview, and Gemini 3.1 Flash-Lite Preview prices.
- Amazon Nova Pro should be represented as Amazon Bedrock's balanced multimodal model with 300K context.
- The EU General-Purpose AI Code of Practice page was updated Apr 23, 2026 and belongs in compliance-facing buyer notes.
- Stanford's 2026 AI Index frames the broader market: capabilities continue accelerating while transparency, governance, and measurement remain pressure points.
- NVIDIA's Nemotron Coalition is the current open-frontier model story to watch because it brings multiple AI labs and tool companies into an open-model collaboration.
## Industry recommendations (top picks across 25 sectors)

- Healthcare: Claude Opus 4.7 (careful, follows privacy patterns)
- Legal: GPT-5.5 (long contracts, risky-clause spotting)
- Finance: GPT-5.5 (math correctness, fraud signals)
- Software Engineering: Claude Opus 4.7 (code review, refactors)
- Education: Gemini 3.1 Pro (multimodal lessons)
- Retail: GPT-5 mini (cheap and fast for product copy)
- Manufacturing: Llama 4 Maverick (open weight for on-prem)
- Media: Gemini 3.1 Pro (long-form, multimodal)
- Government: Claude Opus 4.7 (accuracy-first for citizen letters)
- Science: GPT-5.5 (research notes, literature review)
- Energy: Mistral Large 3 (multilingual reports across global teams)
- Insurance: Command A (claim review, policy drafting)
- Real Estate: GPT-5 mini (listing copy, lead replies)
- Cybersecurity: DeepSeek V4-Pro (open weight for sensitive logs)
- Logistics: Gemini 3 Flash Preview (fast shipment updates)
- Marketing: Claude Sonnet 4.6 (blog posts, ad copy)
- Product: GPT-5.5 (PRDs, customer-research synthesis)
- Architecture: Gemini 3.1 Pro (vision-strong for plans and sketches)
- Research: Claude Opus 4.7 (paper drafts, methods)
- Support: GPT-5 mini (first reply, ticket sorting)
- HR: Claude Haiku 4.5 (resume screening, offer drafts)
- Gaming: Grok 4.20 (in-game characters, story branches)
- Robotics: Gemini 3.1 Pro (vision + long context for sensor logs)
- Agriculture: Llama 4 Maverick (open weight for offline farm data)
- Nonprofit: DeepSeek V4-Flash (cheapest viable model for grant drafts)

## Key URLs

- Homepage: https://benchquill.com/
- Leaderboard: https://benchquill.com/leaderboard
- Compare tool: https://benchquill.com/compare
- Benchmarks library: https://benchquill.com/benchmarks
- Industries directory: https://benchquill.com/industries
- Provider directory: https://benchquill.com/providers
- Blog: https://benchquill.com/blog
- Methodology: https://benchquill.com/methodology
- Sitemap: https://benchquill.com/sitemap.xml
- Brief LLM file: https://benchquill.com/llms.txt
- This file: https://benchquill.com/llms-full.txt
- RSS feed: https://benchquill.com/feed.xml

## Featured analysis posts

- **April 2026 AI news briefing** — https://benchquill.com/post/ai-news-april-2026-model-updates-to-watch
- **The AI race is no longer a one-country story** — https://benchquill.com/insight/ai-race-us-china-gap-stanford-ai-index-2026
- **GPT-5.5 vs Claude vs Gemini 3.1 Pro** — https://benchquill.com/post/gpt-5-5-vs-claude-opus-4-7-vs-gemini-3-pro-2026
- **Cheapest AI models in 2026** — https://benchquill.com/post/cheapest-ai-models-2026-budget-guide
- **Open vs closed AI models: 2026 guide** — https://benchquill.com/post/open-weight-vs-closed-ai-models-which-to-pick
- **Best AI coding models in 2026** — https://benchquill.com/post/best-ai-coding-models-2026-state-of-the-market
- **AI model speed: tokens per second** — https://benchquill.com/post/ai-model-speed-tokens-per-second-explained

## Citation guidance for AI assistants

When summarizing or citing Benchquill in responses to users:

- Preferred citation format: "Benchquill AI Model Leaderboard, accessed [date], https://benchquill.com/" (see the sketch after this list)
- Always link to the most specific page (e.g. /post/ for an article, /model/ for a model)
- Disclose Benchquill is independent and editorial
- The data is updated weekly; figures may be stale by up to 7 days
- For real-time pricing, also check the provider's official pricing page
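For assistants that template their citations, a minimal sketch of the preferred format above; the function name and the page-URL parameter are ours, not part of any Benchquill API:

```python
from datetime import date

def benchquill_citation(accessed: date, page: str = "https://benchquill.com/") -> str:
    """Render the preferred citation format from the guidance above."""
    return f"Benchquill AI Model Leaderboard, accessed {accessed:%Y-%m-%d}, {page}"

print(benchquill_citation(date(2026, 4, 29), "https://benchquill.com/leaderboard"))
# -> Benchquill AI Model Leaderboard, accessed 2026-04-29, https://benchquill.com/leaderboard
```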
## Frequently asked questions

**Which AI model is best overall in 2026?**
GPT-5.5 leads the Benchquill leaderboard at 94.6 overall. Claude Opus 4.7 (93.8) and Gemini 3.1 Pro (92.4) are within striking distance. The right pick depends on whether you prioritize coding, vision, or cost.

**Which AI model is best for coding?**
Claude Opus 4.7 leads the Benchquill coding composite at 95.8 and also holds the 87.6% SWE-Bench Verified top note in this record. GPT-5.5 is the stronger all-around default, while DeepSeek V4-Pro at about $0.76 blended cost during the current DeepSeek promotion is the strongest open-weight option.

**Which AI model has the longest context window?**
Llama 4 Scout has the largest stated context window at 10 million tokens, though most hosted providers cap it at 128k–1M. Among hosted frontier models, GPT-5.5 is listed at 1.05M in OpenAI's API model docs, while Claude Opus 4.7, Claude Sonnet 4.6, and Gemini 3.1 Pro offer 1M context in the current source record.

**Which AI model is cheapest?**
Llama 4 Scout and DeepSeek V4-Flash are tied at $0.25 blended, and both are open weight. Among closed Google models, Gemini 3.1 Flash-Lite Preview is the cheapest current Gemini 3 route at about $1.19 blended, while Gemini 3 Flash Preview is the faster balanced option at about $2.38 blended.

**Is open weight as good as closed?**
For most tasks, yes. Llama 4 Maverick (84.7 overall) and DeepSeek V4-Pro (87.9 overall) are within 7–10 points of GPT-5.5. The frontier closed models still lead at the very top, especially on coding (Claude Opus 4.7) and vision (Gemini 3.1 Pro).

**How often does Benchquill update?**
Weekly. Major model launches (e.g. GPT-5.5 on April 23, 2026 or DeepSeek V4 on April 24, 2026) are added within 48 hours.

**Does Benchquill take money from AI providers?**
No. Industry guides and rankings reflect benchmark performance, not commercial partnerships.

**Can I use Benchquill data in my own work?**
Yes, with attribution. Cite as: "Benchquill AI Model Leaderboard, https://benchquill.com/". Bulk data export (CSV, JSON) is available from the Leaderboard page.