DeepSeek V4 Flash

Cheaper, faster V4 tier with a ~98% cache-hit discount; extremely economical for repeated prompts.

DeepSeek V4 Flash strengths

Among the cheapest capable models
Massive cache savings
1M context
Good general performance

Pricing & context

Context window	1M tokens (384K max output)
Input price /1M	$0.14 ($0.0028 cache hit)
Output price /1M	$0.28
Modalities	text

Cost guide: a typical call of ~10K input + 2K output tokens runs roughly $0.14 ($0.0028 cache hit) × 0.01 + $0.28 × 0.002 — worth modelling against cheaper tiers before committing high-volume traffic.

When to choose DeepSeek V4 Flash

DeepSeek V4 Flash is best for High-volume, repetitive workloads and budget pipelines needing long context. If your workload is more cost-sensitive, weigh it against Llama 4 Scout (~$0.08 (varies by host) input /1M) first.