xAI's image-to-video model with native one-pass audio.
Grok Imagine Video 1.5 is xAI's image-to-video generator (general availability June 2026). It produces 720p clips with native, one-pass audio: synchronized dialogue, sound effects, ambience and music are generated together with the video, with no separate audio step. It improves physics and temporal coherence over v1.0 and renders quickly; the Fast variant produces a 6-second 720p clip in about 25 seconds. At release it ranked #1 on the Image-to-Video Arena. It's available in the Grok apps, at grok.com/imagine, and via the Imagine API.
Grok Imagine Video 1.5 uses a freemium pricing model. Paid usage through the Imagine API starts at $0.080 per second of generated video (about $4.80 per minute), and the model is reportedly also included in Grok and SuperGrok plans (unverified). A free tier lets you test it before committing.
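To budget API usage, the per-second rate converts directly into a per-clip cost. The sketch below is a hypothetical helper (`clip_cost` is our own name, not part of the Imagine API) and assumes billing is strictly proportional to clip length at $0.080/sec; minimums, rounding rules and plan discounts are not modeled.

```python
from decimal import Decimal

# Imagine API rate quoted above, in USD per second of generated video.
# Assumption: cost scales linearly with clip length (no minimums or rounding).
RATE_PER_SECOND = Decimal("0.080")


def clip_cost(seconds: int) -> Decimal:
    """Estimate the USD cost of one clip of the given length (hypothetical helper)."""
    if seconds < 0:
        raise ValueError("seconds must be non-negative")
    # Decimal avoids floating-point drift when summing many clip costs.
    return RATE_PER_SECOND * seconds


print(clip_cost(6))   # one 6-second clip: 0.480
print(clip_cost(60))  # one minute of video: 4.800
```

At this rate a 6-second clip costs about $0.48, which matches the roughly $4.80-per-minute figure.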
Grok Imagine Video 1.5 is best suited for fast, low-cost image-to-video with built-in synchronized audio for marketing and social content. Its standout strength is audio generated together with the video, so no post-production audio pass is needed; the main trade-off to weigh is 720p output (4K is not yet supported).
Comparing options? See our best AI video generation tools guide, or browse every AI video generation tool tracked on Benchquill.
Related models on our leaderboard: Grok 4.3, Grok 4.1 Fast.
Other top AI video generation tools worth comparing:
Google Veo 3.1: Google's flagship text-to-video model with native audio and 4K output.
Photorealistic AI avatars and video translation for creators and marketers.
Kling 3.0: multi-shot AI video with native audio at low per-second cost.
Runway Gen-4.5: pro AI video with camera control and character consistency.