DeepSeek V4 Flash
Cheaper, faster V4 tier with a ~98% cache-hit discount; extremely economical for repeated prompts.
DeepSeek V4 Flash strengths
- Among the cheapest capable models
- Cache hits cut the input price by ~98% ($0.0028 vs. $0.14 per 1M tokens)
- 1M-token context window
- Good general performance
Pricing & context
| Spec | Value |
| --- | --- |
| Context window | 1M tokens (384K max output) |
| Input price /1M | $0.14 ($0.0028 cache hit) |
| Output price /1M | $0.28 |
| Modalities | text |
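Cost guide: a typical call of ~10K input + 2K output tokens costs roughly $0.14 × 0.01 + $0.28 × 0.002 ≈ $0.0020 with no cache hits, or $0.0028 × 0.01 + $0.28 × 0.002 ≈ $0.0006 when the entire input is a cache hit. It is worth modelling this against cheaper tiers before committing high-volume traffic.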
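The cost guide arithmetic can be sketched as a small helper. This is a minimal estimate, assuming the list prices in the table above; the function name `call_cost` and the `cached_fraction` parameter are illustrative and not part of any DeepSeek API. Actual billing may differ.

```python
# Prices in USD per 1M tokens, taken from the pricing table above.
INPUT_PER_M = 0.14
CACHED_INPUT_PER_M = 0.0028
OUTPUT_PER_M = 0.28


def call_cost(input_tokens: int, output_tokens: int, cached_fraction: float = 0.0) -> float:
    """Estimate the USD cost of one call.

    cached_fraction is a hypothetical parameter: the share of input tokens
    billed at the cache-hit rate (0.0 = no cache hits, 1.0 = all input cached).
    """
    if not 0.0 <= cached_fraction <= 1.0:
        raise ValueError("cached_fraction must be between 0 and 1")
    cached = input_tokens * cached_fraction
    uncached = input_tokens - cached
    total_per_million = (
        uncached * INPUT_PER_M
        + cached * CACHED_INPUT_PER_M
        + output_tokens * OUTPUT_PER_M
    )
    return total_per_million / 1_000_000


if __name__ == "__main__":
    # The ~10K input + 2K output call from the cost guide.
    print(f"no cache:   ${call_cost(10_000, 2_000):.6f}")
    print(f"full cache: ${call_cost(10_000, 2_000, cached_fraction=1.0):.6f}")
```

With a fully cached input, output tokens dominate the bill, so the saving depends heavily on how much of each prompt is repeated.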
When to choose DeepSeek V4 Flash
DeepSeek V4 Flash is best for high-volume, repetitive workloads and budget pipelines that need long context. If your workload is even more cost-sensitive, weigh it against Llama 4 Scout (~$0.08 per 1M input tokens, varies by host) first.
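One way to frame that comparison is to ask what cache-hit rate brings the blended input price down to the alternative's input price. The sketch below is a rough illustration covering input tokens only: it assumes the list prices quoted on this page and a flat ~$0.08 figure for Llama 4 Scout, which varies by host. Output prices are not compared, and the function names are hypothetical.

```python
from typing import Optional


def blended_input_price(hit_rate: float, full: float = 0.14, cached: float = 0.0028) -> float:
    """Average USD price per 1M input tokens at a given cache-hit rate (0..1)."""
    return full * (1.0 - hit_rate) + cached * hit_rate


def break_even_hit_rate(other_price: float, full: float = 0.14, cached: float = 0.0028) -> Optional[float]:
    """Cache-hit rate at which the blended input price equals other_price.

    Returns 0.0 if the full price is already no higher than other_price,
    and None if even a 100% hit rate cannot reach it.
    """
    if other_price >= full:
        return 0.0
    if other_price < cached:
        return None
    return (full - other_price) / (full - cached)


if __name__ == "__main__":
    rate = break_even_hit_rate(0.08)  # assumed Llama 4 Scout input price
    print(f"break-even cache-hit rate: {rate:.1%}")
    print(f"blended price at that rate: ${blended_input_price(rate):.4f} per 1M input tokens")
```

Under these assumptions, input costs match at a cache-hit rate of roughly 44%; above that, DeepSeek V4 Flash is cheaper on input.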
DeepSeek V4 Flash FAQ
How much does DeepSeek V4 Flash cost?
DeepSeek V4 Flash is priced at $0.14 per 1M input tokens ($0.0028 on a cache hit) and $0.28 per 1M output tokens (public API list price), with a 1M-token context window and up to 384K output tokens.
What is DeepSeek V4 Flash best for?
DeepSeek V4 Flash by DeepSeek is best for high-volume, repetitive workloads and budget pipelines that need long context.
Is DeepSeek V4 Flash multimodal?
No. DeepSeek V4 Flash supports text only.
Other models
| 01 | GPT-5.5 | OpenAI | $5.00 |
| 02 | GPT-5.4 | OpenAI | $2.50 |
| 03 | GPT-5.4 mini | OpenAI | $0.75 |
| 04 | Claude Opus 4.8 | Anthropic | $5.00 |