AI model pricing explained: input vs output tokens, context & caching (2026)
Every model on our leaderboard lists prices as dollars per million tokens. Once you understand what that means, comparing models becomes simple arithmetic instead of guesswork.
What is a token?
A token is a chunk of text — roughly 0.75 words in English, so ~1,000 tokens is about 750 words. Models bill by the token, counting both what you send (input) and what they generate (output). See the token definition for more.
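As a rough illustration of the rule of thumb above, the sketch below estimates a token count from a word count. The function name `estimate_tokens` and the whitespace-based word split are assumptions made for this example; real tokenizers differ by model and language, so treat the result as a ballpark figure only.

```python
import math

# Rule of thumb from the text: one token is roughly 0.75 English words.
WORDS_PER_TOKEN = 0.75


def estimate_tokens(text: str) -> int:
    """Ballpark token count for English text (hypothetical helper).

    Splits on whitespace to count words, then divides by 0.75.
    This is NOT a real tokenizer; actual counts vary by model.
    """
    words = len(text.split())
    return math.ceil(words / WORDS_PER_TOKEN)


if __name__ == "__main__":
    sample = " ".join(["word"] * 750)  # a 750-word sample
    print(estimate_tokens(sample))     # about 1,000 tokens by this estimate
```

For billing-grade numbers, use the token counts your provider reports per request rather than an estimate like this one.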
The three levers that decide your bill
- Input price per 1M tokens: dominates when you send long documents, system prompts or retrieved context.
- Output price per 1M tokens: usually 3–5× the input price, so it matters most when the model generates a lot.
- Context window: the max tokens the model can hold at once. A bigger window is not a bigger bill by itself; you pay only for the tokens you actually send.
A worked example
Say a model costs $2 input / $10 output per 1M tokens, and a typical request sends 10,000 input tokens and gets back 2,000 output tokens. That's (10,000 ÷ 1,000,000 × $2) + (2,000 ÷ 1,000,000 × $10) = $0.02 + $0.02 = $0.04 per call. At 100,000 calls/month, that's ~$4,000. If a mid-tier model at a fifth of the price can handle your routine calls, every call you move costs 80% less, often with little quality loss; moving all of them would bring the bill to ~$800.
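The same arithmetic can be sketched as a small function, which makes it easy to plug in other prices and token counts. The function and parameter names are illustrative assumptions; the numbers are the ones from the example.

```python
def cost_per_call(
    input_tokens: int,
    output_tokens: int,
    input_price_per_m: float,
    output_price_per_m: float,
) -> float:
    """Dollar cost of one request, given prices per 1M tokens."""
    input_cost = input_tokens / 1_000_000 * input_price_per_m
    output_cost = output_tokens / 1_000_000 * output_price_per_m
    return input_cost + output_cost


if __name__ == "__main__":
    # Figures from the worked example: $2 in / $10 out per 1M tokens,
    # 10,000 input tokens and 2,000 output tokens per request.
    per_call = cost_per_call(10_000, 2_000, 2.00, 10.00)
    monthly = per_call * 100_000  # 100,000 calls per month

    print(f"${per_call:.2f} per call")     # $0.04 per call
    print(f"${monthly:,.0f} per month")    # $4,000 per month
```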
Don't forget caching
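Most providers now offer prompt caching: repeated context (a long system prompt, a document) is billed at a steep discount on cache hits, around 10% of the input price at some providers. The exact rate, and any extra charge for writing to the cache, varies by provider, so check the pricing page. For agents and chat with stable context, this is a large real-world discount the sticker price hides.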
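To see how much caching can move the input side of the bill, here is a minimal sketch. It assumes cache hits are billed at 10% of the input price and ignores any cache-write surcharge or cache expiry, both of which vary by provider. The 8,000 / 2,000 split between cached and fresh tokens is a made-up illustration, not a measured figure.

```python
def input_cost_with_cache(
    cached_tokens: int,
    fresh_tokens: int,
    input_price_per_m: float,
    cache_hit_multiplier: float = 0.10,  # assumption: hits cost 10% of input price
) -> float:
    """Dollar cost of the input side of one request with prompt caching.

    Simplified model: ignores cache-write charges and cache expiry.
    """
    cached_cost = cached_tokens / 1_000_000 * input_price_per_m * cache_hit_multiplier
    fresh_cost = fresh_tokens / 1_000_000 * input_price_per_m
    return cached_cost + fresh_cost


if __name__ == "__main__":
    # Hypothetical request: 10,000 input tokens at $2 per 1M,
    # of which 8,000 are a stable system prompt served from cache.
    with_cache = input_cost_with_cache(8_000, 2_000, 2.00)
    without_cache = input_cost_with_cache(0, 10_000, 2.00)
    print(f"with cache:    ${with_cache:.4f}")     # $0.0056
    print(f"without cache: ${without_cache:.4f}")  # $0.0200
```

Under these assumptions the input cost drops by roughly 70%, while the output cost is unchanged, since caching applies only to what you send.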
The shortcut
Pick a cheap workhorse model and a flagship for hard cases, estimate both with the formula above, and route easy calls to the cheap one. Compare any two side-by-side on our compare pages.
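A quick way to estimate the effect of routing is to blend the two per-call costs by the share of traffic each model handles. In the sketch below, the 80% share for the cheap model is an assumed figure for illustration; the per-call costs reuse the $0.04 flagship example and a model at a fifth of that price.

```python
def blended_monthly_cost(
    calls_per_month: int,
    cheap_share: float,
    cheap_cost_per_call: float,
    flagship_cost_per_call: float,
) -> float:
    """Monthly cost when a fraction of calls goes to the cheaper model."""
    if not 0.0 <= cheap_share <= 1.0:
        raise ValueError("cheap_share must be between 0 and 1")
    cheap_calls = calls_per_month * cheap_share
    flagship_calls = calls_per_month - cheap_calls
    return cheap_calls * cheap_cost_per_call + flagship_calls * flagship_cost_per_call


if __name__ == "__main__":
    # Assumed split: 80% of 100,000 calls go to the cheap model at
    # $0.008 per call; the rest stay on the flagship at $0.04 per call.
    total = blended_monthly_cost(100_000, 0.80, 0.008, 0.04)
    print(f"${total:,.0f} per month")  # $1,440 per month
```

The share you can safely route to the cheap model depends on your workload, so it is worth measuring quality on a sample of real requests before committing to a split.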