PixVerse Raises $300M: You Can "Direct" AI Video While It's Being Generated

While AI video tools like Sora 2, Seedance 2.0, and Kling 3.0 race on quality and length, a Chinese startup is redefining the game: PixVerse — a tool that lets you control a video as it’s being generated, like a real film director. On March 11, 2026, PixVerse announced a $300M Series C led by CDH Investments, surpassing a $1B valuation to become a unicorn. With Alibaba backing and proprietary real-time generation tech, PixVerse is opening a new paradigm: interactive AI video — where you don’t just create videos, you "live" inside them as they’re made.

Tags: PixVerse, AI video, real-time generation, unicorn

Trung Vũ Hoàng

Author

23/3/2026 · 15 min read

What Is PixVerse? Real-Time AI Video Generation

Definition

PixVerse is an AI video generation platform developed by Aishi Technology—a Beijing startup founded in 2023. The biggest difference: PixVerse doesn’t generate the video and then show you the result. Instead, you watch the video being created in real time and can control it during generation.

Example workflow:

Traditional AI video (Sora, Seedance):
1. Enter prompt: "A woman walking in the park"
2. Wait 60–120 seconds
3. Review the result
4. If you don’t like it → Start over

PixVerse real-time:
1. Enter prompt: "A woman walking in the park"
2. Video starts generating instantly
3. At 2s: You see the woman appear
4. At 3s: You command "smile" → She smiles
5. At 5s: You command "wave" → She waves
6. At 8s: You command "sit down" → She sits down
7. Video finishes with exactly what you want
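The real-time workflow above can be replayed as code. PixVerse has not published a client SDK, so the session object below is a purely illustrative simulation: frames are produced one at a time, and commands issued mid-stream change what the remaining frames depict.

```python
from dataclasses import dataclass

@dataclass
class Frame:
    t: float      # timestamp in seconds
    action: str   # what the character is doing in this frame

class InteractiveSession:
    """Toy stand-in for a real-time video session."""
    def __init__(self, prompt: str, duration: float = 10.0, fps: int = 4):
        self.prompt = prompt
        self.duration = duration
        self.fps = fps
        self.current_action = "walking"      # derived from the prompt
        self.pending: dict[float, str] = {}  # timestamp -> scheduled command

    def command(self, at: float, action: str) -> None:
        """Schedule a director command to take effect at `at` seconds."""
        self.pending[at] = action

    def stream(self):
        """Yield frames in order, applying any command whose time has come."""
        for i in range(int(self.duration * self.fps)):
            t = i / self.fps
            for at in sorted(self.pending):
                if at <= t:
                    self.current_action = self.pending.pop(at)
            yield Frame(t=t, action=self.current_action)

# The workflow above, replayed against the toy session:
session = InteractiveSession("A woman walking in the park")
session.command(3.0, "smiling")
session.command(5.0, "waving")
session.command(8.0, "sitting down")
timeline = list(session.stream())
print(timeline[0].action)    # walking
print(timeline[-1].action)   # sitting down
```

The point of the sketch is the control flow: because frames are emitted incrementally, a command arriving at 3 seconds can still influence everything generated after it.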

Real-Time Generation Technology

PixVerse uses an autoregressive diffusion architecture optimized for streaming generation. Instead of creating the entire video at once, the model generates frame by frame and streams it immediately.

Latency comparison:

| Platform | Time to first frame | Total generation time (10s video) | Interactive? |
|---|---|---|---|
| Sora 2 | ~120 seconds | ~120 seconds | ❌ |
| Seedance 2.0 | ~60 seconds | ~60 seconds | ❌ |
| Kling 3.0 | ~45 seconds | ~45 seconds | ❌ |
| PixVerse v5.6 | ~0.5 seconds | ~12 seconds | ✅ |

Trade-off: PixVerse is much faster and interactive, but image quality is lower than Kling 3.0 (1080p vs 4K) and it lacks native audio like Seedance 2.0.

Exclusive Feature: Interactive Commands

Supported Commands

While the video is being generated, you can type commands to control the character:

Emotion commands: "smile", "cry", "angry", "surprised", "sad"

Action commands: "wave", "dance", "sit down", "stand up", "walk forward", "turn around"

Camera commands: "zoom in", "zoom out", "pan left", "pan right", "close-up"
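A client consuming these commands would need to route each one to the right subsystem. The command names come from the lists above; the grouping and routing logic below are illustrative, not PixVerse's actual implementation.

```python
# The three command groups, as a small validator/dispatcher.
EMOTIONS = {"smile", "cry", "angry", "surprised", "sad"}
ACTIONS = {"wave", "dance", "sit down", "stand up", "walk forward", "turn around"}
CAMERA = {"zoom in", "zoom out", "pan left", "pan right", "close-up"}

def classify(command: str) -> str:
    """Return which subsystem should handle a typed command."""
    cmd = command.strip().lower()
    if cmd in EMOTIONS:
        return "emotion"   # routed to facial-expression control
    if cmd in ACTIONS:
        return "action"    # routed to body/pose control
    if cmd in CAMERA:
        return "camera"    # routed to virtual-camera control
    raise ValueError(f"unsupported command: {command!r}")

print(classify("Smile"))     # emotion
print(classify("pan left"))  # camera
```

Rejecting unknown strings up front matters in a real-time setting: an unsupported command should fail immediately rather than silently degrade the stream.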

Real-World Use Cases

1. Micro-dramas (interactive short films):

Jaden Xie, co-founder of PixVerse, explains: "Real-time generation can enable micro-dramas that users can steer—like 'choose your own adventure' books but with video."

Example:

Scene: Character stands before two doors
User command: "open left door"
→ Video continues with the character opening the left door
→ Finds treasure
User command: "pick up treasure"
→ Video continues with the character picking up the treasure
→ Monster appears
User command: "run away"
→ Video continues with a chase scene
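The branching scene above is, structurally, a small story graph: nodes are scenes to render, edges are user commands. A minimal sketch, with scene and command names mirroring the example (plus a couple of invented alternative branches for illustration):

```python
# The micro-drama as a tiny story graph.
STORY = {
    "two_doors": {"open left door": "treasure_room",
                  "open right door": "empty_room"},
    "treasure_room": {"pick up treasure": "monster_appears"},
    "monster_appears": {"run away": "chase_scene",
                        "fight": "battle_scene"},
}

def play(start: str, commands: list[str]) -> list[str]:
    """Follow user commands through the graph, returning the sequence
    of scenes a real-time generator would have to render."""
    scenes = [start]
    current = start
    for cmd in commands:
        current = STORY[current][cmd]
        scenes.append(current)
    return scenes

path = play("two_doors", ["open left door", "pick up treasure", "run away"])
print(path)  # ['two_doors', 'treasure_room', 'monster_appears', 'chase_scene']
```

The graph is authored; what real-time generation changes is that each node's video can be rendered on demand instead of pre-produced for every branch.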

2. Infinite games:

Games without a fixed storyline—AI creates content in real time based on player actions.

3. Interactive ads:

Ads that let viewers control the character, boosting engagement.

PixVerse v5.6: Latest Features

End-Frame Control

Version 5.6 (released January 2026) adds end-frame control—letting you specify the final frame of the video.

Workflow:

  1. Upload start image: Character standing

  2. Upload end image: Character sitting

  3. Prompt: "Smooth transition"

  4. PixVerse generates a transition video from standing to sitting

Use cases:

  • Animation: Create smooth transitions between keyframes

  • Product demos: Product moving from angle A to angle B

  • Character animation: Character from pose A to pose B
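For developers, the end-frame workflow maps naturally onto a request with two keyframes and a prompt. PixVerse's actual API schema is not public, so every field name below is invented for illustration; only the shape of the workflow comes from the article.

```python
import base64
import json

def build_endframe_request(start_frame: bytes, end_frame: bytes,
                           prompt: str = "Smooth transition",
                           duration_s: int = 5) -> str:
    """Bundle the two keyframes plus a prompt into one JSON request body."""
    return json.dumps({
        "mode": "end_frame_control",  # hypothetical field name
        "prompt": prompt,
        "duration_s": duration_s,
        "start_frame": base64.b64encode(start_frame).decode("ascii"),
        "end_frame": base64.b64encode(end_frame).decode("ascii"),
    })

# In practice the bytes would be the uploaded start/end images.
parsed = json.loads(build_endframe_request(b"<standing png>", b"<sitting png>"))
print(parsed["mode"])  # end_frame_control
```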

Portrait Mode Support

v5.6 natively supports portrait mode (1080×1920) — perfect for TikTok, Instagram Reels, YouTube Shorts.

Specs:

| Aspect ratio | Resolution | FPS | Duration |
|---|---|---|---|
| 16:9 (landscape) | 1920×1080 | 24–30 | 5–20s |
| 9:16 (portrait) | 1080×1920 | 24–30 | 5–20s |
| 1:1 (square) | 1080×1080 | 24–30 | 5–20s |

Comparison With Competitors

Detailed Comparison Table

| Feature | PixVerse v5.6 | Sora 2 | Seedance 2.0 | Kling 3.0 |
|---|---|---|---|---|
| Real-time generation | ✅ | ❌ | ❌ | ❌ |
| Interactive control | ✅ (cry, dance, pose) | ❌ | ❌ | ❌ |
| Time to first frame | 0.5s | 120s | 60s | 45s |
| Max resolution | 1080p | 1080p | 2K | 4K |
| Max duration | 20s | 25s | 15s | 15s (stitched 60s+) |
| Native audio | ❌ | Limited | ✅ | Partial |
| End-frame control | ✅ | — | — | — |
| Multi-shot | ❌ | ❌ | ✅ (6 shots) | ✅ |
| Pricing | $9.99–$29.99/month | $20–$200/month | $19.90–$99/month | Free–$92/month |

Who Wins Each Category?

  • Real-time generation: PixVerse (exclusive)

  • Interactive control: PixVerse (exclusive)

  • Image quality: Kling 3.0 (4K/60fps)

  • Native audio: Seedance 2.0

  • Multi-shot storytelling: Seedance 2.0 and Kling 3.0

  • Lowest price: Kling 3.0 (has a free tier)

  • Easiest to use: PixVerse (real-time feedback)

$300M Funding Round: Deal Breakdown

Deal Terms

| Term | Details |
|---|---|
| Round | Series C |
| Amount | $300 million USD |
| Valuation | $1B+ (unicorn status) |
| Lead investor | CDH Investments |
| Strategic investor | Alibaba |
| Announcement date | March 11, 2026 |
| Use of funds | R&D, US expansion, hiring |

Why Did Alibaba Invest?

Alibaba has a strategic interest in AI video:

  • Taobao/Tmall: E-commerce platforms need product videos

  • Youku: Video platform needs content generation tools

  • AliExpress: International e-commerce needs localized videos

  • Cloud business: Alibaba Cloud could offer a PixVerse API

Synergies:

  • PixVerse can integrate into Taobao so sellers can self-produce product videos

  • Alibaba Cloud can host PixVerse infrastructure

  • Cross-promotion across the Alibaba ecosystem (500M+ users)

Technology Deep Dive

Autoregressive Diffusion Model

PixVerse uses a hybrid architecture combining autoregressive and diffusion:

Autoregressive component:

  • Generates each frame based on previous frames

  • Enables streaming generation (no need to wait for the entire video)

  • Enables interactive control (can change direction mid-generation)

Diffusion component:

  • Ensures high image quality

  • Smooth transitions between frames

  • Consistent character appearance
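The interaction between the two components can be sketched as a loop: each frame is predicted from the frames before it (the autoregressive part), then refined (standing in for the diffusion denoising pass), then streamed immediately. Real models operate on latent tensors; plain integers are used here so the control flow stays visible, and `steer` stands in for an interactive command arriving mid-stream.

```python
def predict_next(history: list[int], steer: int) -> int:
    """Autoregressive step: next frame depends on prior frames plus steering."""
    return history[-1] + steer

def refine(frame: int) -> int:
    """Stand-in for the diffusion denoising pass on one frame."""
    return frame  # a real model would iteratively denoise here

def generate(n_frames: int, get_steer) -> list[int]:
    frames = [0]  # initial frame conditioned on the text prompt
    for i in range(1, n_frames):
        steer = get_steer(i)                       # interactive command, if any
        frames.append(refine(predict_next(frames, steer)))
        # ...frame is streamed to the viewer here, before the next one exists
    return frames

# Steer +1 normally; a mid-stream "command" flips it to -1 after frame 5.
frames = generate(10, lambda i: 1 if i <= 5 else -1)
print(frames)  # [0, 1, 2, 3, 4, 5, 4, 3, 2, 1]
```

The flip at frame 6 is the whole trick: because generation is sequential, a command that arrives mid-stream changes every frame after it, which batch-mode models structurally cannot do.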

Latency Optimization

To achieve real-time generation (< 1s latency), PixVerse optimized several aspects:

| Optimization | Technique | Latency reduction |
|---|---|---|
| Model size | Distillation (14B → 7B params) | −40% |
| Inference | TensorRT optimization | −30% |
| Batching | Dynamic batching | −20% |
| Caching | KV-cache reuse | −25% |
| Hardware | Nvidia H100 GPUs | Baseline |

Result: From ~5s latency (original model) down to ~0.5s (production).

Use Cases and Target Markets

1. Social Media Content Creators

Pain point: TikTokers and YouTubers need to produce 5–10 videos/day. Shooting and editing takes 2–3 hours/video.

Solution with PixVerse:

  • Create a 10-second video in 15 seconds

  • Interactive control to adjust content in real time

  • Native portrait mode for TikTok/Reels

  • Cost: $29.99/month (unlimited videos)

ROI:

Before PixVerse:
- 5 videos/day × 2 hours = 10 hours/day
- Opportunity cost: 10 hours × $50/hour = $500/day

With PixVerse:
- 5 videos/day × 5 minutes = 25 minutes/day
- Time saved: 9.5 hours × $50 = $475/day
- PixVerse cost: $29.99/month = ~$1/day
- Net benefit: $474/day = $14,220/month
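The ROI arithmetic above, reproduced so its assumptions ($50/hour opportunity cost, the article's rounding of 10h − 25min down to 9.5h saved, a 30-day month) are explicit:

```python
HOURLY_RATE = 50          # opportunity cost per hour, from the example
PIXVERSE_MONTHLY = 29.99  # Pro plan price

hours_saved_per_day = 9.5                                  # article's rounding
value_saved_per_day = hours_saved_per_day * HOURLY_RATE    # $475
tool_cost_per_day = round(PIXVERSE_MONTHLY / 30)           # ~$1/day
net_per_day = value_saved_per_day - tool_cost_per_day      # $474
net_per_month = net_per_day * 30
print(net_per_month)  # 14220.0
```

The result is sensitive to the hourly-rate assumption: a creator whose time is worth $20/hour instead of $50 still nets roughly $5,600/month under the same workflow.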

2. Gaming: Procedural Cutscenes

Use case: Games with branching storylines need many different cutscenes. Handcrafting them is expensive.

Solution:

  • PixVerse API generates cutscenes in real time based on player choices

  • Each playthrough features different cutscenes

  • Infinite replayability

Game example:

Player chooses: "Save the princess"
→ PixVerse generates cutscene: Hero rescues the princess
→ Princess says: "Thank you!"

Player chooses: "Join the villain"
→ PixVerse generates cutscene: Hero joins the dark side
→ Villain says: "Welcome to the team!"

3. E-commerce: Product Videos At Scale

Pain point: E-commerce stores with 1,000+ products need product videos. Manual shooting isn’t feasible.

Solution:

  • Upload product images

  • PixVerse generates 360° rotation videos

  • Interactive: Viewers can command "zoom in", "rotate"

  • Produce 1,000 videos in a day
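At catalog scale, the bottleneck becomes job orchestration rather than any single render. A minimal fan-out sketch, with `render_video` mocked because PixVerse's batch API is not public; a real integration would submit each job to the vendor endpoint instead.

```python
from concurrent.futures import ThreadPoolExecutor

def render_video(product_id: str) -> str:
    """Mock renderer: pretend to submit one 360-degree rotation job."""
    return f"{product_id}.mp4"

def render_catalog(product_ids: list[str], workers: int = 8) -> list[str]:
    """Fan render jobs out over a worker pool, preserving input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(render_video, product_ids))

videos = render_catalog([f"sku-{i:04d}" for i in range(1000)])
print(len(videos))  # 1000
print(videos[0])    # sku-0000.mp4
```

A thread pool suits this shape because each job is I/O-bound (waiting on a remote render), so even modest worker counts keep the pipeline saturated.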

AI Video Market: $15B by 2030

Market Size

| Year | Market size | Growth | Key segment |
|---|---|---|---|
| 2024 | $2.5B | - | Early adopters |
| 2025 | $4.8B | +92% | Content creators |
| 2026 | $8.2B | +71% | E-commerce, ads |
| 2027 | $11.5B | +40% | Gaming, education |
| 2030 | $15.0B | CAGR 43% | Mainstream |

Competitors and Market Share

| Company | Valuation | Funding | Market share | Differentiation |
|---|---|---|---|---|
| PixVerse | $1B+ | $300M | 8% | Real-time interactive |
| Runway | $1.5B | $237M | 12% | Professional tools |
| Pika Labs | $500M | $135M | 10% | Ease of use |
| Stability AI | $1B | $200M | 5% | Open source |
| OpenAI (Sora) | $110B | N/A | 15% | Quality, brand |
| ByteDance (Seedance) | N/A | N/A | 20% | TikTok integration |
| Kuaishou (Kling) | N/A | N/A | 18% | 4K quality, price |
| Others | - | - | 12% | - |

Business Model and Pricing

Subscriptions

| Plan | Price/month | Credits | Features |
|---|---|---|---|
| Free | $0 | 50 credits | 720p, watermark, 5s max |
| Basic | $9.99 | 500 credits | 1080p, no watermark, 10s max |
| Pro | $29.99 | Unlimited | 1080p, priority queue, 20s max, API access |

Credit system:

  • 1 video 5s = 10 credits

  • 1 video 10s = 20 credits

  • 1 video 20s = 40 credits

  • Interactive commands: +2 credits per command
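The credit schedule above reduces to a small cost function. A sketch, using only the rates stated in the list:

```python
def credits_for(duration_s: int, n_commands: int = 0) -> int:
    """Credits for one video: 10/20/40 by duration, plus 2 per interactive command."""
    table = {5: 10, 10: 20, 20: 40}
    if duration_s not in table:
        raise ValueError("supported durations: 5s, 10s, 20s")
    return table[duration_s] + 2 * n_commands

# A 10-second video with three interactive commands:
print(credits_for(10, n_commands=3))  # 26 = 20 base + 3 commands x 2
```

At these rates, a Basic subscriber's 500 credits cover roughly 25 ten-second videos per month before interactive commands are counted.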

API Pricing (For Developers)

| Tier | Price | Quota |
|---|---|---|
| Starter | $99/month | 1,000 videos |
| Growth | $499/month | 10,000 videos |
| Enterprise | Custom | Unlimited |

Case Study: TikTok Creator Boosts Views by 300%

Background

A TikTok creator in Vietnam (200K followers) specializing in comedy skits. Previously shot and edited 3 videos/day, each taking 2 hours.

Workflow With PixVerse

Before (live-action):

  1. Write script: 30 minutes

  2. Set up camera and lighting: 20 minutes

  3. Shoot (5–10 takes): 40 minutes

  4. Edit in CapCut: 30 minutes

  5. Total: 2 hours/video

After (PixVerse):

  1. Write script: 30 minutes

  2. Create video with PixVerse: 5 minutes

  3. Interactive adjustments: 5 minutes

  4. Export and upload: 2 minutes

  5. Total: 42 minutes/video

Results After 2 Months

| Metric | Before | After | Change |
|---|---|---|---|
| Videos/day | 3 | 8 | +167% |
| Average views/video | 50K | 150K | +200% |
| Followers | 200K | 580K | +190% |
| Monthly income | $2,000 | $8,500 | +325% |
| PixVerse cost | - | $29.99 | - |

Why did views increase?

  • Consistency: 8 videos/day instead of 3 → favored by the algorithm

  • Quality: AI-generated videos offer better visual effects

  • Variety: Able to test many different styles

  • Speed: Can capitalize on trending topics faster

Challenges and Limitations

1. Image Quality

PixVerse caps at 1080p, lower than Kling 3.0 (4K) and Seedance 2.0 (2K). This limits professional use cases.

Roadmap:

  • v6.0 (Q3 2026): 2K support

  • v7.0 (Q1 2027): 4K support

2. No Native Audio

PixVerse produces silent videos; users must add audio afterward. This adds a step and reduces convenience.

Workaround:

  • Integrate with ElevenLabs for AI voice

  • Integrate with Suno for AI music

  • Roadmap: Native audio in v6.0

3. Limited Control vs. Traditional Editing

Interactive commands are good but still limited compared to frame-by-frame editing in After Effects or Premiere Pro.

Trade-off:

  • PixVerse: Fast (5 minutes), easy (no skills needed), but less control

  • Traditional: Slow (2 hours), hard (requires skills), but full control

The Future: Interactive Video Everywhere

PixVerse’s Vision

Jaden Xie, co-founder, shares the vision:

"We believe the future of video isn’t passive consumption—it’s interactive experiences. Every video will be controllable, every story will branch, and every character will respond to user input."

Roadmap 2026-2027

Q2 2026:

  • Voice commands (speak instead of type)

  • Multi-character control (control multiple characters simultaneously)

  • Scene transitions (smooth transitions)

Q4 2026:

  • Native audio generation

  • 2K resolution support

  • Longer duration (up to 60s)

2027:

  • VR/AR support (interactive 360° videos)

  • Real-time collaboration (multiple users co-direct a video)

  • AI director mode (AI suggests commands based on the story)

Conclusion: Real-Time Is the Future

PixVerse has proved an important point: For many use cases, speed and interactivity matter more than quality. TikTok creators don’t need 4K—they need to create fast. Gamers don’t need perfect cinematography—they need dynamic cutscenes.

Lessons for the AI video industry:

  • Latency matters: 0.5s vs 60s is the difference between interactive and batch processing

  • Control matters: Users want to steer AI, not just prompt and hope

  • Integration matters: APIs for developers are as important as the UI for end users

Prediction: By 2027, every AI video platform will have a real-time mode. PixVerse has a first-mover advantage, but Sora, Seedance, and Kling will catch up. The race will be about who has the lowest latency and the best control.
