AI model speed: tokens per second
Tokens per second largely determines how an AI model feels in real use. Learn when speed should outweigh a raw benchmark score.
Tokens per second is useful for product feel, but Benchquill speed numbers should be treated as estimated throughput unless a test harness, region, prompt size, and run count are published.
Tokens per second measures how quickly a model streams output after generation begins. It is different from time to first token, total request latency, or provider queue time.
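The distinction matters when you compute the number. A minimal sketch (function and field names are illustrative, not any provider's API) that separates time to first token from streaming throughput, so queue time does not distort the tokens-per-second figure:

```python
def speed_metrics(request_start, first_token_at, last_token_at, tokens_streamed):
    """Separate time-to-first-token (TTFT) from streaming throughput.

    Tokens per second is measured only over the streaming window
    (first token -> last token), so provider queue time and TTFT
    do not inflate or deflate it. Timestamps are in seconds;
    tokens_streamed counts tokens after the first one arrives.
    """
    ttft = first_token_at - request_start
    stream_window = last_token_at - first_token_at
    tps = tokens_streamed / stream_window if stream_window > 0 else float("inf")
    total_latency = last_token_at - request_start
    return {"ttft_s": ttft, "tokens_per_s": tps, "total_latency_s": total_latency}

# Example: 0.8 s to first token, then 99 more tokens over the next 2.0 s.
print(speed_metrics(0.0, 0.8, 2.8, 99))  # tokens_per_s -> 49.5
```

Note that two providers can report the same tokens per second while one takes three times longer to show the first character; users notice the TTFT gap first.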
Live chat and support flows need fast output. Batch analysis, offline research, code review, and document processing can accept slower models if quality is better.
Run your own prompts under realistic load. Record provider, region, model version, prompt size, output size, temperature, retries, and run count before using speed as a buying criterion.
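One way to keep that record honest is to attach the context to every run and only summarize across repeats. A sketch under the assumption that you log one record per request (field names are illustrative, not a Benchquill schema):

```python
from dataclasses import dataclass
from statistics import median

@dataclass
class SpeedRun:
    # Context that should accompany any published tokens/sec figure.
    provider: str
    region: str
    model_version: str
    prompt_tokens: int
    output_tokens: int
    temperature: float
    retries: int
    stream_seconds: float  # time from first token to last token

    @property
    def tokens_per_s(self) -> float:
        return self.output_tokens / self.stream_seconds

def summarize(runs):
    """Median tokens/sec across repeated runs, with the run count attached."""
    if not runs:
        raise ValueError("need at least one run")
    return {
        "run_count": len(runs),
        "median_tokens_per_s": median(r.tokens_per_s for r in runs),
    }

runs = [
    SpeedRun("example-provider", "us-east", "model-1.0", 512, 400, 0.2, 0, s)
    for s in (8.0, 10.0, 12.5)  # per-run throughput: 50.0, 40.0, 32.0 tok/s
]
print(summarize(runs))  # median -> 40.0 tok/s over 3 runs
```

Using the median rather than the mean keeps one throttled or retried run from dragging the headline number, and publishing the run count lets readers judge how stable it is.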