18+ AI Video Models

      AI Model Specifications

      ClipStudios includes 18+ AI video generation models — all available on every paid plan.

      • Every paid plan includes the full model catalog. Plans differ by monthly credits, not which models you can use.
      • Starter (€20/mo): 100 credits.
      • Plus (€40/mo): 250 credits, plus AI Effects Studio and Motion Control.
      • Pro (from €99/mo): 500 credits, plus Voice Studio Expressive V3 and the agentic Nova assistant.

      All paid plans include text-to-video, image-to-video, lip-sync, and commercial licensing with zero watermarks.
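
      As a rough illustration of how monthly credits translate into clips, the sketch below divides each plan's credit allowance by a per-clip cost. The plan credits come from the list above; the name `clips_per_month` and the sample cost of 2 credits per clip are illustrative assumptions, and actual costs vary by model, resolution, and duration.

```python
# Monthly credits per paid plan, as listed above.
PLAN_CREDITS = {"Starter": 100, "Plus": 250, "Pro": 500}


def clips_per_month(plan: str, credits_per_clip: int) -> int:
    """Return how many whole clips a plan's monthly credits cover."""
    if credits_per_clip <= 0:
        raise ValueError("credits_per_clip must be positive")
    return PLAN_CREDITS[plan] // credits_per_clip


if __name__ == "__main__":
    # Hypothetical example: a clip that costs 2 credits.
    for plan in PLAN_CREDITS:
        print(plan, clips_per_month(plan, 2))
```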

      Quick model facts

      Note: the credit costs shown below are examples.
      ByteDance

      ByteDance Seedance Lite

      All paid plans
      Budget-Friendly
      Simple Videos

      Visual Storytelling Powered by Intelligent Video Generation

      Text-to-Video and Image-to-Video

      Seedance 1.0 transforms your creative concepts into dynamic visual narratives with an intuitive understanding of story structure and motion choreography. Designed for creators who think in stories rather than technical specifications, this model generates cohesive video content that flows naturally from scene to scene, bridging the gap between imagination and polished visual output.

      Available Resolutions:

      480p
      720p
      1080p

      Available Durations:

      5 seconds
      10 seconds
      Credit Costs:
      480p: 5s = 1 credit, 10s = 2 credits
      720p: 5s = 1 credit, 10s = 2 credits
      ByteDance

      ByteDance Seedance Pro

      All paid plans
      Quality Balance
      Marketing

      Visual Storytelling Powered by Intelligent Video Generation

      Text-to-Video and Image-to-Video

      Seedance 1.0 transforms your creative concepts into dynamic visual narratives with an intuitive understanding of story structure and motion choreography. Designed for creators who think in stories rather than technical specifications, this model generates cohesive video content that flows naturally from scene to scene, bridging the gap between imagination and polished visual output.

      Available Resolutions:

      480p
      720p
      1080p

      Available Durations:

      5 seconds
      10 seconds
      Credit Costs:
      480p: 5s = 2 credits, 10s = 4 credits
      720p: 5s = 2 credits, 10s = 4 credits
      ByteDance

      ByteDance Seedance Pro Fast

      All paid plans
      Fast Premium
      Agencies

      Visual Storytelling Powered by Intelligent Video Generation

      Image-to-Video

      Seedance 1.0 transforms your creative concepts into dynamic visual narratives with an intuitive understanding of story structure and motion choreography. Designed for creators who think in stories rather than technical specifications, this model generates cohesive video content that flows naturally from scene to scene, bridging the gap between imagination and polished visual output.

      Available Resolutions:

      720p
      1080p

      Available Durations:

      5 seconds
      10 seconds
      Credit Costs:
      720p: 5s = 2 credits, 10s = 4 credits
      1080p: 5s = 2 credits, 10s = 4 credits
      Kling (Kuaishou)

      Kling 2.6

      All paid plans
      Native audio
      AI Audio
      Expressive
      Text-to-Video and Image-to-Video

      Available Resolutions:

      1080p

      Available Durations:

      5 seconds
      10 seconds
      Credit Costs:
      1080p: 5s = 6 credits, 10s = 12 credits
      Kling (Kuaishou)

      Kling 3.0

      All paid plans
      Native audio
      AI Audio
      Action
      High Fidelity

      Kling 3.0 is Kuaishou's latest video model with native audio generation, stronger action physics, and improved temporal consistency. It excels at dynamic action sequences and dialogue-driven scenes with synchronized sound in a single pass. Use natural-language prompts describing subject, motion, scene, and camera.

      Text-to-Video and Image-to-Video

      Available Resolutions:

      720p
      1080p
      2160p

      Available Durations:

      3 seconds
      4 seconds
      5 seconds
      6 seconds
      7 seconds
      8 seconds
      9 seconds
      10 seconds
      11 seconds
      12 seconds
      13 seconds
      14 seconds
      15 seconds
      Credit Costs:
      720p: 3s = 4 credits, 4s = 6 credits, 5s = 7 credits, 6s = 8 credits, 7s = 10 credits, 8s = 11 credits, 9s = 13 credits, 10s = 14 credits, 11s = 15 credits, 12s = 17 credits, 13s = 18 credits, 14s = 20 credits, 15s = 21 credits
      1080p: 3s = 5 credits, 4s = 7 credits, 5s = 9 credits, 6s = 11 credits, 7s = 13 credits, 8s = 14 credits, 9s = 16 credits, 10s = 18 credits, 11s = 20 credits, 12s = 22 credits, 13s = 23 credits, 14s = 25 credits, 15s = 27 credits
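
      Because the Kling 3.0 costs do not grow by a fixed amount per second, a lookup table is a simple way to work with them. The sketch below copies the example 720p and 1080p costs from the listing above; the names `KLING_3_COSTS` and `kling_3_cost` are illustrative, and 2160p is left out because no example cost is listed for it.

```python
# Kling 3.0 example credit costs for durations of 3 to 15 seconds,
# copied from the listing above (index 0 corresponds to 3 seconds).
KLING_3_COSTS = {
    "720p": [4, 6, 7, 8, 10, 11, 13, 14, 15, 17, 18, 20, 21],
    "1080p": [5, 7, 9, 11, 13, 14, 16, 18, 20, 22, 23, 25, 27],
}
MIN_SECONDS = 3


def kling_3_cost(resolution: str, seconds: int) -> int:
    """Look up the example credit cost for one clip."""
    costs = KLING_3_COSTS.get(resolution)
    if costs is None:
        raise ValueError(f"no example cost listed for {resolution}")
    index = seconds - MIN_SECONDS
    if not 0 <= index < len(costs):
        raise ValueError("duration must be between 3 and 15 seconds")
    return costs[index]


if __name__ == "__main__":
    print(kling_3_cost("1080p", 10))
```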
      Kling (Kuaishou)

      Kling v2.1 Master

      All paid plans
      Studio Quality
      Enterprise

      Kling 2.1 Master is the enhanced Kling 2.1 variant with stronger prompt adherence and higher visual consistency. It outputs 1080p only and is best for final-quality outputs. It excels at realistic motion, scene consistency, and cinematic camera control. Use natural language and cinematic camera terms. Negative prompts are supported. You can generate audio and merge it with the video in post-production.

      Text-to-Video and Image-to-Video

      Available Resolutions:

      1080p

      Available Durations:

      5 seconds
      10 seconds
      Credit Costs:
      1080p: 5s = 16 credits, 10s = 32 credits
      Kling (Kuaishou)

      Kling v2.1 Pro

      All paid plans
      Professional
      Commercial

      Kling 2.1 Pro is a high-quality text-to-video and image-to-video model known for realistic motion, scene consistency, and strong camera control. It excels at action sequences, nature and landscape videos, and scenes with natural physics and movement. Structure prompts as: Subject → Movement → Scene → Camera / Lighting / Atmosphere. Use natural language and cinematic camera terms (close-up, tracking shot, dolly). Supports prompts up to 2,500 characters. Negative prompts are supported. Audio must be added in post-production.
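
      The recommended prompt order (Subject, Movement, Scene, then Camera / Lighting / Atmosphere) and the 2,500-character limit can be sketched as a small helper. This is only an illustration: `build_prompt` and its parameters are hypothetical names, not part of any ClipStudios or Kling API, and the way the parts are joined into sentences is an assumption.

```python
MAX_PROMPT_CHARS = 2500  # limit stated in the description above


def build_prompt(subject: str, movement: str, scene: str, camera: str) -> str:
    """Assemble a prompt in the order Subject, Movement, Scene, Camera."""
    parts = [p.strip() for p in (subject, movement, scene, camera)]
    if not all(parts):
        raise ValueError("all four prompt parts are required")
    subject, movement, scene, camera = parts
    # First sentence: who does what, and where. Second: camera and lighting.
    prompt = f"{subject} {movement} {scene}. {camera}."
    if len(prompt) > MAX_PROMPT_CHARS:
        raise ValueError(
            f"prompt is {len(prompt)} characters; limit is {MAX_PROMPT_CHARS}"
        )
    return prompt


if __name__ == "__main__":
    print(
        build_prompt(
            "A red fox",
            "leaps over a fallen log",
            "in a misty pine forest at dawn",
            "Tracking shot, soft golden light",
        )
    )
```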

      Image-to-Video

      Available Resolutions:

      1080p

      Available Durations:

      5 seconds
      10 seconds
      Credit Costs:
      1080p: 5s = 5 credits, 10s = 10 credits
      Kling (Kuaishou)

      Kling v2.1 Standard

      All paid plans
      Social Media
      Quick Content

      Kling 2.1 is a high-quality text-to-video and image-to-video model focused on realistic motion, scene consistency, and strong camera control. It produces visually coherent videos with natural physics and movement, and works well for action sequences, nature and landscape content, and scenes that need smooth motion. It supports cinematic camera language (pan, dolly, tracking, handheld) and negative prompts, and works best with clear, natural-language prompts that describe subject, movement, scene, and lighting. Kling 2.1 does not generate audio; add music or voiceover in post-production.

      Image-to-Video

      Available Resolutions:

      720p

      Available Durations:

      5 seconds
      10 seconds
      Credit Costs:
      720p: 5s = 3 credits, 10s = 6 credits
      Kling (Kuaishou)

      Kling v2.5 Turbo Pro

      All paid plans
      Premium Quality
      Advertising

      Next-Generation Video Production with Fluid Motion Control

      Text-to-Video and Image-to-Video

      Kling 2.5 Turbo represents the latest innovation from Kuaishou, bringing refined text-to-video and image-to-video capabilities to your creative workflow. This iteration emphasises a better understanding of creative prompts, smoother motion transitions, and rock-solid consistency.

      Available Resolutions:

      1080p

      Available Durations:

      5 seconds
      10 seconds
      Credit Costs:
      1080p: 5s = 5 credits, 10s = 10 credits
      Runway

      Runway Gen-3 Alpha

      All paid plans
      Creative
      Artistic

      Runway Gen-3 Alpha is a high-fidelity AI video model for cinematic motion, smooth transitions, and precise camera control. It supports text-to-video and image-to-video and excels at realistic movement, stylized visuals, and seamless scene continuity. Best for product reveals, atmospheric scenes, FPV fly-throughs, and tracking shots. Use direct, descriptive prompts that define motion and camera behavior. Shorter clips (4–6 seconds) work best. Does not generate audio.

      Text-to-Video and Image-to-Video

      Available Resolutions:

      720p
      1080p

      Available Durations:

      5 seconds
      10 seconds
      Credit Costs:
      720p: 5s = 2 credits, 10s = 4 credits
      1080p: 5s = 3 credits
      ByteDance

      Seedance 1.5 Pro

      All paid plans
      Native audio
      AI Audio
      Complete Output
      Best Value

      Seedance 1.5 Pro is a motion-focused text-to-video and image-to-video model built for dynamic scenes, camera movement, and action. It supports multi-shot sequences with "Shot Switch" and optional AI-generated audio (sound effects and ambience) in one pass. Use motion-first prompts: subject, action, and camera. Adverbs like "slowly" or "quickly" control motion intensity. Negative prompts are not supported.

      Text-to-Video and Image-to-Video

      Available Resolutions:

      480p
      720p
      1080p

      Available Durations:

      4 seconds
      8 seconds
      12 seconds
      Credit Costs:
      480p: 4s = 1 credit, 8s = 2 credits, 12s = 3 credits
      720p: 4s = 2 credits, 8s = 4 credits, 12s = 6 credits
      Google

      Veo 3.1 Fast

      All paid plans
      Native audio
      High Quality
      Fast Turnaround

      Veo 3.1 Fast is a cinematic AI video model built for high-fidelity motion, precise camera control, and structured storytelling. It suits cinematic storytelling, product demos, educational content, and scene-based narratives. Describe the scene and environment, specify actions, and add camera movement (pan, dolly, static). Think in scenes, not single frames. More detail generally improves results.

      Text-to-Video and Image-to-Video

      Available Resolutions:

      720p
      1080p

      Available Durations:

      4 seconds
      6 seconds
      8 seconds
      Credit Costs:
      720p: 4s = 6 credits, 6s = 6 credits, 8s = 6 credits
      1080p: 4s = 7 credits, 6s = 7 credits, 8s = 7 credits
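
      In the example costs above, Veo 3.1 Fast charges the same amount for every listed duration at a given resolution, so longer clips cost less per second. A minimal sketch of that arithmetic follows; the names `VEO_FAST_FLAT_COST` and `credits_per_second` are illustrative, and the figures are the example costs listed above rather than guaranteed prices.

```python
# Veo 3.1 Fast example costs from the listing above: one price per
# resolution, regardless of which listed duration is chosen.
VEO_FAST_FLAT_COST = {"720p": 6, "1080p": 7}
DURATIONS = (4, 6, 8)


def credits_per_second(resolution: str, seconds: int) -> float:
    """Return the effective credits spent per second of video."""
    if seconds not in DURATIONS:
        raise ValueError("duration must be 4, 6, or 8 seconds")
    return VEO_FAST_FLAT_COST[resolution] / seconds


if __name__ == "__main__":
    for seconds in DURATIONS:
        print(seconds, credits_per_second("720p", seconds))
```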
      Google

      Veo 3.1 Lite

      Free & all paid plans
      Native audio
      Free Plan
      Quick Drafts

      Veo 3.1 Lite is Google's efficient Veo variant and the model included on the Free plan. It generates cinematic motion with native audio at 720p and 1080p in 4-, 6-, or 8-second clips, making it ideal for quick drafts, social clips, and trying ideas before committing credits to Veo 3.1 Fast or Veo 3.1 Quality.

      Text-to-Video and Image-to-Video

      Available Resolutions:

      720p
      1080p
      4K

      Available Durations:

      4 seconds
      6 seconds
      8 seconds
      Credit Costs:
      720p: 4s = 3 credits, 6s = 3 credits, 8s = 3 credits
      1080p: 4s = 4 credits, 6s = 4 credits, 8s = 4 credits
      Google

      Veo 3.1 Quality

      All paid plans
      Native audio
      Cinematic
      Premium Content

      Veo 3.1 Quality is a cinematic AI video model for high-fidelity motion, precise camera control, and structured visual storytelling. It suits cinematic storytelling, product demos, educational content, and scene-based narratives. Describe the scene and environment, specify actions, and add camera movement (pan, dolly, static). Use temporal language (slow, deliberate movement) for more natural motion. Clarity and intent matter more than length.

      Text-to-Video and Image-to-Video

      Available Resolutions:

      720p
      1080p

      Available Durations:

      4 seconds
      6 seconds
      8 seconds
      Credit Costs:
      720p: 4s = 25 credits, 6s = 25 credits, 8s = 25 credits
      1080p: 4s = 26 credits, 6s = 26 credits, 8s = 26 credits
      Alibaba (Wan)

      Wan 2.2 A14B Turbo

      All paid plans
      Fast Generation
      Prototyping

      Wan 2.2 is a text-to-video and image-to-video model with improved motion fidelity, prompt adherence, and cinematic lighting. It is tuned for short, emotion-driven videos (joy, calm, healing, inspiration) where mood, motion, and atmosphere matter more than long-form narrative. Use clear visual details and atmosphere words (warm, gentle, cinematic) for best results.

      Text-to-Video and Image-to-Video

      Available Resolutions:

      480p
      720p

      Available Durations:

      5 seconds
      Credit Costs:
      480p: 5s = 4 credits
      720p: 5s = 8 credits
      Alibaba (Wan)

      Wan 2.5

      All paid plans
      Detailed Scenes
      Storytelling

      Wan 2.5 is a fast text-to-video and image-to-video model aimed at beginners and high-frequency creators. It produces short-form videos suited for social media, marketing, and educational content. Keep prompts simple and descriptive. Best for TikTok, Instagram, and YouTube Shorts. Add narration or music in post-production.

      Text-to-Video and Image-to-Video

      Available Resolutions:

      720p
      1080p

      Available Durations:

      5 seconds
      10 seconds
      Credit Costs:
      720p: 5s = 6 credits, 10s = 12 credits
      1080p: 5s = 10 credits, 10s = 20 credits
      Alibaba (Wan)

      Wan 2.6

      All paid plans
      Extended Duration
      Storytelling
      Text-to-Video and Image-to-Video

      Available Resolutions:

      720p
      1080p

      Available Durations:

      5 seconds
      10 seconds
      15 seconds
      Credit Costs:
      720p: 5s = 7 credits, 10s = 14 credits, 15s = 21 credits
      1080p: 5s = 11 credits, 10s = 22 credits, 15s = 33 credits
      Alibaba (Wan)

      Wan 2.6 Video-to-Video

      All paid plans
      Style Transfer
      Reference Video

      Available Resolutions:

      720p
      1080p

      Available Durations:

      5 seconds
      10 seconds
      Credit Costs:
      720p: 5s = 7 credits, 10s = 14 credits
      1080p: 5s = 11 credits, 10s = 22 credits
      Alibaba (Wan)

      Wan 2.7

      All paid plans
      Long-form
      Reference-Ready
      Storytelling

      Wan 2.7 is Alibaba's latest text-to-video and image-to-video model with extended duration support and improved prompt adherence. It suits longer-form, structured storytelling and benefits from clear visual detail and atmosphere words. Audio is added in post-production.

      Text-to-Video and Image-to-Video

      Available Resolutions:

      720p
      1080p

      Available Durations:

      5 seconds
      10 seconds
      15 seconds
      Credit Costs:
      720p: 5s = 8 credits, 10s = 16 credits, 15s = 24 credits
      1080p: 5s = 12 credits, 10s = 24 credits, 15s = 36 credits

      Not Sure Which Model to Choose?

      Take our quick quiz to find the perfect model and plan for your needs.
