Which AI video models does ClipStudios include?

ClipStudios includes 18+ video models: Veo 3.1 Lite, Fast, and Quality (Google); Kling 2.1, 2.5, 2.6, and 3.0 (Kuaishou); Seedance 1.5 Pro, Seedance Lite, and Seedance Pro (ByteDance); Wan 2.2, 2.5, 2.6, and 2.7 (Alibaba); and Runway Gen-3 Alpha (RunwayML). Every paid plan unlocks the full catalog. See the full list with credit costs on this page or in the FAQ at /faq.

Which model is best for marketing videos?

Wan 2.6 is optimised for cinematic, structured storytelling at 1080p with deliberate pacing — well-suited to product demos, brand videos, and educational explainers up to 15 seconds. For short-form social marketing with native audio in a single pass, use Kling 2.6 (5–10s clips). See /faq for full model-by-model prompting guidance.

Which model is best for cinematic storytelling?

Veo 3.1 Quality is best for cinematic motion fidelity, structured camera control, and high prompt adherence. Seedance 1.5 Pro is best for multi-shot narratives (it supports "Shot Switch" cuts in a single generation). Kling 2.6 and Kling 3.0 are best for action sequences with strong temporal consistency.

Which model includes native audio generation?

Kling 2.6, Kling 3.0, and Seedance 1.5 Pro generate native audio (speech, ambient sound) in a single pass. Veo 3.1 also generates native audio, including for image-to-video. For models without native audio (Wan, Runway, older Kling), add narration with the built-in Voice Studio or add sound design in post-production.

What's the difference between Kling 2.6 and Veo 3.1?

Kling 2.6 specialises in short-form (5–10s) videos with native audio, fast generation, and strong action physics — ideal for social-media clips and dialogue-driven scenes. Veo 3.1 specialises in cinematic motion control, precise camera direction, and structured scene-based storytelling at higher fidelity — ideal for marketing and brand video production. See /faq for the full prompting comparison.

AI Models — ClipStudios.AI

ByteDance

ByteDance Seedance Lite

All paid plans

Budget-Friendly

Simple Videos

Visual Storytelling Powered by Intelligent Video Generation

Text-to-Video and Image-to-Video

Seedance 1.0 transforms your creative concepts into dynamic visual narratives with an intuitive understanding of story structure and motion choreography. Designed for creators who think in stories rather than technical specifications, this model generates cohesive video content that flows naturally from scene to scene, bridging the gap between imagination and polished visual output.

Available Resolutions:

480p

720p

1080p

Available Durations:

5 seconds

10 seconds

Credit Costs:

480p: 5s = 1 credit, 10s = 2 credits

720p: 5s = 1 credit, 10s = 2 credits

ByteDance

ByteDance Seedance Pro

All paid plans

Quality Balance

Marketing

Visual Storytelling Powered by Intelligent Video Generation

Text-to-Video and Image-to-Video

Seedance 1.0 transforms your creative concepts into dynamic visual narratives with an intuitive understanding of story structure and motion choreography. Designed for creators who think in stories rather than technical specifications, this model generates cohesive video content that flows naturally from scene to scene, bridging the gap between imagination and polished visual output.

Available Resolutions:

480p

720p

1080p

Available Durations:

5 seconds

10 seconds

Credit Costs:

480p: 5s = 2 credits, 10s = 4 credits

720p: 5s = 2 credits, 10s = 4 credits

ByteDance

ByteDance Seedance Pro Fast

All paid plans

Fast Premium

Agencies

Visual Storytelling Powered by Intelligent Video Generation

Image-to-Video

Seedance 1.0 transforms your creative concepts into dynamic visual narratives with an intuitive understanding of story structure and motion choreography. Designed for creators who think in stories rather than technical specifications, this model generates cohesive video content that flows naturally from scene to scene, bridging the gap between imagination and polished visual output.

Available Resolutions:

720p

1080p

Available Durations:

5 seconds

10 seconds

Credit Costs:

720p: 5s = 2 credits, 10s = 4 credits

1080p: 5s = 2 credits, 10s = 4 credits

Kling (Kuaishou)

Kling 2.6

All paid plans

Native audio

AI Audio

Expressive

Text-to-Video and Image-to-Video

Available Resolutions:

1080p

Available Durations:

5 seconds

10 seconds

Credit Costs:

1080p: 5s = 6 credits, 10s = 12 credits

Kling (Kuaishou)

Kling 3.0

All paid plans

Native audio

AI Audio

Action

High Fidelity

Kling 3.0 is Kuaishou's latest video model with native audio generation, stronger action physics, and improved temporal consistency. It excels at dynamic action sequences and dialogue-driven scenes with synchronized sound in a single pass. Use natural-language prompts describing subject, motion, scene, and camera.

Text-to-Video and Image-to-Video

Available Resolutions:

720p

1080p

2160p

Available Durations:

3 seconds

4 seconds

5 seconds

6 seconds

7 seconds

8 seconds

9 seconds

10 seconds

11 seconds

12 seconds

13 seconds

14 seconds

15 seconds

Credit Costs:

720p: 3s = 4 credits, 4s = 6 credits, 5s = 7 credits, 6s = 8 credits, 7s = 10 credits, 8s = 11 credits, 9s = 13 credits, 10s = 14 credits, 11s = 15 credits, 12s = 17 credits, 13s = 18 credits, 14s = 20 credits, 15s = 21 credits

1080p: 3s = 5 credits, 4s = 7 credits, 5s = 9 credits, 6s = 11 credits, 7s = 13 credits, 8s = 14 credits, 9s = 16 credits, 10s = 18 credits, 11s = 20 credits, 12s = 22 credits, 13s = 23 credits, 14s = 25 credits, 15s = 27 credits
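
The Kling 3.0 prices above step up unevenly from one second to the next, so a lookup table is a safer way to estimate cost than a per-second rate. The following is a minimal sketch in Python; the names `KLING_3_0_CREDITS`, `kling_3_0_cost`, and `credits_per_second` are hypothetical and are not part of any ClipStudios API. The values are copied from the listing above, and 2160p is left out because no price is listed for it.

```python
# Credit costs for Kling 3.0, transcribed from the listing above.
# Outer keys are resolutions; inner keys are clip durations in seconds.
KLING_3_0_CREDITS = {
    "720p": {3: 4, 4: 6, 5: 7, 6: 8, 7: 10, 8: 11, 9: 13, 10: 14,
             11: 15, 12: 17, 13: 18, 14: 20, 15: 21},
    "1080p": {3: 5, 4: 7, 5: 9, 6: 11, 7: 13, 8: 14, 9: 16, 10: 18,
              11: 20, 12: 22, 13: 23, 14: 25, 15: 27},
}


def kling_3_0_cost(resolution: str, seconds: int) -> int:
    """Return the listed credit cost, or raise if the combination has no listed price."""
    try:
        return KLING_3_0_CREDITS[resolution][seconds]
    except KeyError:
        raise ValueError(f"no listed price for {resolution} at {seconds}s") from None


def credits_per_second(resolution: str, seconds: int) -> float:
    """Effective credits per second of generated video (hypothetical helper)."""
    return kling_3_0_cost(resolution, seconds) / seconds
```

For example, `kling_3_0_cost("1080p", 10)` returns 18, while `kling_3_0_cost("2160p", 5)` raises `ValueError` because the page lists 2160p as a resolution without a credit cost.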

Kling (Kuaishou)

Kling v2.1 Master

All paid plans

Studio Quality

Enterprise

Kling 2.1 Master is the enhanced Kling 2.1 variant with stronger prompt adherence and higher visual consistency. It outputs 1080p only and is best for final-quality outputs. It excels at realistic motion, scene consistency, and cinematic camera control. Use natural language and cinematic camera terms. Negative prompts are supported. You can generate audio and merge it with the video in post-production.

Text-to-Video and Image-to-Video

Available Resolutions:

1080p

Available Durations:

5 seconds

10 seconds

Credit Costs:

1080p: 5s = 16 credits, 10s = 32 credits

Kling (Kuaishou)

Kling v2.1 Pro

All paid plans

Professional

Commercial

Kling 2.1 Pro is a high-quality text-to-video and image-to-video model known for realistic motion, scene consistency, and strong camera control. It excels at action sequences, nature and landscape videos, and scenes with natural physics and movement. Structure prompts as: Subject → Movement → Scene → Camera / Lighting / Atmosphere. Use natural language and cinematic camera terms (close-up, tracking shot, dolly). Supports prompts up to 2,500 characters. Negative prompts are supported. Audio must be added in post-production.

Image-to-Video

Available Resolutions:

1080p

Available Durations:

5 seconds

10 seconds

Credit Costs:

1080p: 5s = 5 credits, 10s = 10 credits
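
The prompt structure recommended for Kling 2.1 Pro above (Subject, Movement, Scene, then Camera / Lighting / Atmosphere, within 2,500 characters) can be sketched as a small helper. This is an illustration only: `build_kling_prompt` is a hypothetical function, and joining the parts with commas is an assumption, since the page does not specify a separator.

```python
MAX_PROMPT_CHARS = 2500  # prompt limit stated in the Kling 2.1 Pro description


def build_kling_prompt(subject: str, movement: str, scene: str, style: str = "") -> str:
    """Join prompt parts in the order Subject, Movement, Scene, Camera / Lighting / Atmosphere.

    `style` holds the optional camera, lighting, and atmosphere terms.
    The comma separator is an assumption made for this sketch.
    """
    required = {"subject": subject, "movement": movement, "scene": scene}
    for name, value in required.items():
        if not value.strip():
            raise ValueError(f"{name} must not be empty")

    parts = [subject, movement, scene, style]
    prompt = ", ".join(p.strip() for p in parts if p.strip())

    if len(prompt) > MAX_PROMPT_CHARS:
        raise ValueError(
            f"prompt is {len(prompt)} characters; limit is {MAX_PROMPT_CHARS}"
        )
    return prompt


print(build_kling_prompt(
    "A red fox",
    "leaping over a fallen log",
    "a misty pine forest at dawn",
    "low tracking shot, soft backlight",
))
```

Keeping the parts as separate arguments makes it easy to vary only the camera or lighting terms between generations while the subject and scene stay fixed.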

Kling (Kuaishou)

Kling v2.1 Standard

All paid plans

Social Media

Quick Content

Kling 2.1 is a high-quality text-to-video and image-to-video model focused on realistic motion, scene consistency, and strong camera control. It produces visually coherent videos with natural physics and movement, and works well for action sequences, nature and landscape content, and scenes that need smooth motion. It supports cinematic camera language (pan, dolly, tracking, handheld) and negative prompts, and works best with clear, natural-language prompts that describe subject, movement, scene, and lighting. Kling 2.1 does not generate audio; add music or voiceover in post-production.

Image-to-Video

Available Resolutions:

720p

Available Durations:

5 seconds

10 seconds

Credit Costs:

720p: 5s = 3 credits, 10s = 6 credits

Kling (Kuaishou)

Kling v2.5 Turbo Pro

All paid plans

Premium Quality

Advertising

Next-Generation Video Production with Fluid Motion Control

Text-to-Video and Image-to-Video

Kling 2.5 Turbo represents the latest innovation from Kuaishou, bringing refined text-to-video and image-to-video capabilities to your creative workflow. This iteration emphasises a better understanding of creative prompts, smoother motion transitions, and rock-solid consistency.

Available Resolutions:

1080p

Available Durations:

5 seconds

10 seconds

Credit Costs:

1080p: 5s = 5 credits, 10s = 10 credits

Runway

All paid plans

Creative

Artistic

Runway Gen-3 Alpha is a high-fidelity AI video model for cinematic motion, smooth transitions, and precise camera control. It supports text-to-video and image-to-video and excels at realistic movement, stylized visuals, and seamless scene continuity. Best for product reveals, atmospheric scenes, FPV fly-throughs, and tracking shots. Use direct, descriptive prompts that define motion and camera behavior. Shorter clips (4–6 seconds) work best. Does not generate audio.

Text-to-Video and Image-to-Video

Available Resolutions:

720p

1080p

Available Durations:

5 seconds

10 seconds

Credit Costs:

720p: 5s = 2 credits, 10s = 4 credits

1080p: 5s = 3 credits

ByteDance

Seedance 1.5 Pro

All paid plans

Native audio

AI Audio

Complete Output

Best Value

Seedance 1.5 Pro is a motion-focused text-to-video and image-to-video model built for dynamic scenes, camera movement, and action. It supports multi-shot sequences with "Shot Switch" and optional AI-generated audio (sound effects and ambience) in one pass. Use motion-first prompts: subject, action, and camera. Adverbs like "slowly" or "quickly" control motion intensity. Negative prompts are not supported.

Text-to-Video and Image-to-Video

Available Resolutions:

480p

720p

1080p

Available Durations:

4 seconds

8 seconds

12 seconds

Credit Costs:

480p: 4s = 1 credit, 8s = 2 credits, 12s = 3 credits

720p: 4s = 2 credits, 8s = 4 credits, 12s = 6 credits

Google

Veo 3.1 Fast

All paid plans

Native audio

High Quality

Fast Turnaround

Veo 3.1 Fast is a cinematic AI video model built for high-fidelity motion, precise camera control, and structured storytelling. It suits cinematic storytelling, product demos, educational content, and scene-based narratives. Describe the scene and environment, specify actions, and add camera movement (pan, dolly, static). Think in scenes, not single frames. More detail generally improves results.

Text-to-Video and Image-to-Video

Available Resolutions:

720p

1080p

Available Durations:

4 seconds

6 seconds

8 seconds

Credit Costs:

720p: 4s = 6 credits, 6s = 6 credits, 8s = 6 credits

1080p: 4s = 7 credits, 6s = 7 credits, 8s = 7 credits

Google

Veo 3.1 Lite

Free & all paid plans

Native audio

Free Plan

Quick Drafts

Native Audio

Veo 3.1 Lite is Google's efficient Veo variant — the model included on the Free plan. It generates cinematic motion with native audio at 720p and 1080p in 4, 6, or 8-second clips, making it ideal for quick drafts, social clips, and trying ideas before committing credits to Fast or Quality.

Text-to-Video and Image-to-Video

Available Resolutions:

720p

1080p

4k

Available Durations:

4 seconds

6 seconds

8 seconds

Credit Costs:

720p: 4s = 3 credits, 6s = 3 credits, 8s = 3 credits

1080p: 4s = 4 credits, 6s = 4 credits, 8s = 4 credits

Google

Veo 3.1 Quality

All paid plans

Native audio

Cinematic

Premium Content

Veo 3.1 Quality is a cinematic AI video model for high-fidelity motion, precise camera control, and structured visual storytelling. It suits cinematic storytelling, product demos, educational content, and scene-based narratives. Describe the scene and environment, specify actions, and add camera movement (pan, dolly, static). Use temporal language (slow, deliberate movement) for more natural motion. Clarity and intent matter more than length.

Text-to-Video and Image-to-Video

Available Resolutions:

720p

1080p

Available Durations:

4 seconds

6 seconds

8 seconds

Credit Costs:

720p: 4s = 25 credits, 6s = 25 credits, 8s = 25 credits

1080p: 4s = 26 credits, 6s = 26 credits, 8s = 26 credits
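
All three Veo 3.1 tiers listed above charge the same credits for 4, 6, and 8 second clips, so the effective cost per second falls as the clip gets longer. The sketch below works through that arithmetic; `VEO_3_1_CREDITS`, `veo_cost`, and `veo_credits_per_second` are hypothetical names used for illustration, and only the 720p and 1080p prices shown on this page are included.

```python
# Flat per-clip prices (credits) for the Veo 3.1 tiers, from the listings above.
VEO_3_1_CREDITS = {
    "Lite": {"720p": 3, "1080p": 4},
    "Fast": {"720p": 6, "1080p": 7},
    "Quality": {"720p": 25, "1080p": 26},
}
VEO_DURATIONS = (4, 6, 8)  # seconds; the price is the same for each


def veo_cost(tier: str, resolution: str, seconds: int) -> int:
    """Return the listed credit cost for one clip."""
    if seconds not in VEO_DURATIONS:
        raise ValueError(f"{seconds}s is not a listed Veo 3.1 duration")
    try:
        return VEO_3_1_CREDITS[tier][resolution]
    except KeyError:
        raise ValueError(f"no listed price for {tier} at {resolution}") from None


def veo_credits_per_second(tier: str, resolution: str, seconds: int) -> float:
    """Effective credits per second of generated video."""
    return veo_cost(tier, resolution, seconds) / seconds


for seconds in VEO_DURATIONS:
    rate = veo_credits_per_second("Fast", "720p", seconds)
    print(f"Veo 3.1 Fast 720p, {seconds}s: {rate:.2f} credits per second")
```

Under these prices, an 8 second Veo 3.1 Fast clip at 720p works out to 0.75 credits per second, half the 1.5 credits per second of a 4 second clip.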

Alibaba (Wan)

Wan 2.2 A14B Turbo

All paid plans

Fast Generation

Prototyping

Wan 2.2 is a text-to-video and image-to-video model with improved motion fidelity, prompt adherence, and cinematic lighting. It is tuned for short, emotion-driven videos (joy, calm, healing, inspiration) where mood, motion, and atmosphere matter more than long-form narrative. Use clear visual details and atmosphere words (warm, gentle, cinematic) for best results.

Text-to-Video and Image-to-Video

Available Resolutions:

480p

720p

Available Durations:

5 seconds

Credit Costs:

480p: 5s = 4 credits

720p: 5s = 8 credits

Alibaba (Wan)

Wan 2.5

All paid plans

Detailed Scenes

Storytelling

Wan 2.5 is a fast text-to-video and image-to-video model aimed at beginners and high-frequency creators. It produces short-form videos suited to social media, marketing, and educational content. Keep prompts simple and descriptive. Best for TikTok, Instagram, and YouTube Shorts. Add narration or music in post-production.

Text-to-Video and Image-to-Video

Available Resolutions:

720p

1080p

Available Durations:

5 seconds

10 seconds

Credit Costs:

720p: 5s = 6 credits, 10s = 12 credits

1080p: 5s = 10 credits, 10s = 20 credits

Alibaba (Wan)

Wan 2.6

All paid plans

Extended Duration

Storytelling

Text-to-Video and Image-to-Video

Available Resolutions:

720p

1080p

Available Durations:

5 seconds

10 seconds

15 seconds

Credit Costs:

720p: 5s = 7 credits, 10s = 14 credits, 15s = 21 credits

1080p: 5s = 11 credits, 10s = 22 credits, 15s = 33 credits

Alibaba (Wan)

Wan 2.6 Video-to-Video

All paid plans

Style Transfer

Reference Video

Available Resolutions:

720p

1080p

Available Durations:

5 seconds

10 seconds

Credit Costs:

720p: 5s = 7 credits, 10s = 14 credits

1080p: 5s = 11 credits, 10s = 22 credits

Alibaba (Wan)

Wan 2.7

All paid plans

Long-form

Reference-Ready

Storytelling

Wan 2.7 is Alibaba's latest text-to-video and image-to-video model with extended duration support and improved prompt adherence. It suits longer-form, structured storytelling and benefits from clear visual detail and atmosphere words. Audio is added in post-production.

Text-to-Video and Image-to-Video

Available Resolutions:

720p

1080p

Available Durations:

5 seconds

10 seconds

15 seconds

Credit Costs:

720p: 5s = 8 credits, 10s = 16 credits, 15s = 24 credits

1080p: 5s = 12 credits, 10s = 24 credits, 15s = 36 credits
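
The Wan 2.5, 2.6, and 2.7 prices listed above scale linearly in 5-second blocks: a 10 second clip costs twice the 5 second price, and a 15 second clip costs three times as much. The sketch below encodes that pattern; `WAN_PRICING` and `wan_cost` are hypothetical names, and the per-block rates are read off the listings on this page rather than from any published formula.

```python
# Credits per 5-second block and the listed durations for each Wan model,
# transcribed from the listings above.
WAN_PRICING = {
    "Wan 2.5": {"rates": {"720p": 6, "1080p": 10}, "durations": (5, 10)},
    "Wan 2.6": {"rates": {"720p": 7, "1080p": 11}, "durations": (5, 10, 15)},
    "Wan 2.7": {"rates": {"720p": 8, "1080p": 12}, "durations": (5, 10, 15)},
}


def wan_cost(model: str, resolution: str, seconds: int) -> int:
    """Return the credit cost as (rate per 5-second block) x (number of blocks)."""
    try:
        entry = WAN_PRICING[model]
        rate = entry["rates"][resolution]
    except KeyError:
        raise ValueError(f"no listed price for {model} at {resolution}") from None
    if seconds not in entry["durations"]:
        raise ValueError(f"{seconds}s is not a listed duration for {model}")
    return rate * (seconds // 5)


print(wan_cost("Wan 2.6", "1080p", 15))  # 11 credits x 3 blocks
print(wan_cost("Wan 2.7", "720p", 10))   # 8 credits x 2 blocks
```

The duration check matters because the models differ: Wan 2.5 is listed only at 5 and 10 seconds, so `wan_cost("Wan 2.5", "720p", 15)` raises `ValueError` instead of extrapolating a price the page does not show.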
