Benchquill methodology: how we score AI models
Eight steps that explain how Benchquill picks sources, checks prices, calculates blended cost, builds tiers, and reviews the leaderboard.
Eight-step scoring workflow
- Identify current public model version and provider.
- Check official pricing, release notes, model cards, and source dates.
- Record benchmark scores and note benchmark limitations.
- Normalize blended cost using a 25% input / 75% output token mix (see the worked sketch after this list).
- Record context window, modality, open-weight status, and speed.
- Compare against adjacent models and use-case guides.
- Review compliance, privacy, and human-oversight notes for high-risk contexts.
- Publish dated updates in the sitemap, llms.txt files, data exports, and model pages.
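As a concrete illustration of the blended-cost step, the sketch below blends separate per-million-token input and output prices using the 25% / 75% weighting described above. The function name and example prices are illustrative assumptions, not Benchquill's actual data.

```python
def blended_price_per_million(input_price: float, output_price: float) -> float:
    """Blend per-million-token prices using a 25% input / 75% output workload."""
    return 0.25 * input_price + 0.75 * output_price

# Hypothetical prices in USD per million tokens, for illustration only.
print(blended_price_per_million(input_price=3.00, output_price=15.00))  # 12.0
```

With this weighting, a model priced at $3.00 in / $15.00 out lands at a blended $12.00 per million tokens, which is the figure used for cross-provider comparison.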
Benchquill does not treat benchmark rank as the only deciding factor. A model can score well and still be the wrong choice if its price, latency, context window, provider policy, preview status, or data-handling terms do not match the workload. That is why every core template includes direct-answer copy, alternatives, source notes, and links to machine-readable exports.
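One way to express this "fit, not just rank" idea is a filter that drops high-scoring models whose other properties violate workload constraints before any ranking happens. The field names and thresholds below are illustrative assumptions, not the templates' actual schema.

```python
from dataclasses import dataclass

@dataclass
class ModelRecord:
    name: str
    benchmark_score: float   # higher is better
    blended_price: float     # USD per million tokens
    context_window: int      # tokens
    is_preview: bool

def fits_workload(m: ModelRecord, max_price: float, min_context: int,
                  allow_preview: bool = False) -> bool:
    """Return True only if the model satisfies every workload constraint."""
    if m.blended_price > max_price:
        return False
    if m.context_window < min_context:
        return False
    if m.is_preview and not allow_preview:
        return False
    return True

# Hypothetical candidates: model-b scores higher but fails the price and preview checks.
candidates = [
    ModelRecord("model-a", 88.1, 12.0, 200_000, False),
    ModelRecord("model-b", 91.4, 30.0, 128_000, True),
]
shortlist = sorted(
    (m for m in candidates if fits_workload(m, max_price=15.0, min_context=100_000)),
    key=lambda m: m.benchmark_score,
    reverse=True,
)
print([m.name for m in shortlist])  # ['model-a']
```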
Raw provider claims, public leaderboard entries, proxy estimates, and Benchquill editorial composites are kept separate in the data layer whenever possible. Speed values are marked as estimates unless a repeatable harness is available, and blended price uses a 25% input / 75% output workload so comparisons stay consistent across API providers.
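A minimal sketch of how that provenance separation could be represented in a data layer, assuming a simple enum of source types; the names and values are illustrative, not Benchquill's actual export schema.

```python
from dataclasses import dataclass
from enum import Enum

class Provenance(Enum):
    PROVIDER_CLAIM = "provider_claim"            # official pricing, release notes, model cards
    PUBLIC_LEADERBOARD = "public_leaderboard"    # third-party benchmark entries
    PROXY_ESTIMATE = "proxy_estimate"            # e.g. speed with no repeatable harness
    EDITORIAL_COMPOSITE = "editorial_composite"  # Benchquill's own blended figures

@dataclass
class Metric:
    value: float
    unit: str
    provenance: Provenance
    source_date: str  # ISO date of the source checked

# Speed stays flagged as an estimate until a repeatable harness is available.
speed = Metric(value=72.0, unit="tokens/s",
               provenance=Provenance.PROXY_ESTIMATE, source_date="2025-06-01")
```

Keeping the provenance tag on each value means downstream exports and model pages can show which numbers are provider claims, which are estimates, and which are editorial composites.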