PixVerse Raises $300M: You Can "Direct" AI Video While It's Being Generated
While AI video tools like Sora 2, Seedance 2.0, and Kling 3.0 race on quality and length, a Chinese startup is redefining the game: PixVerse — a tool that lets you control a video as it’s being generated, like a real film director. On March 11, 2026, PixVerse announced a $300M Series C led by CDH Investments, surpassing a $1B valuation to become a unicorn. With Alibaba backing and proprietary real-time generation tech, PixVerse is opening a new paradigm: interactive AI video — where you don’t just create videos, you "live" inside them as they’re made.

Trung Vũ Hoàng
Author
What Is PixVerse? Real-Time AI Video Generation
Definition
PixVerse is an AI video generation platform developed by Aishi Technology—a Beijing startup founded in 2023. The biggest difference: PixVerse doesn’t generate the video and then show you the result. Instead, you watch the video being created in real time and can control it during generation.
Example workflow:
Traditional AI video (Sora, Seedance):
1. Enter prompt: "A woman walking in the park"
2. Wait 60–120 seconds
3. Review the result
4. If you don’t like it → Start over
PixVerse real-time:
1. Enter prompt: "A woman walking in the park"
2. Video starts generating instantly
3. At 2s: You see the woman appear
4. At 3s: You command "smile" → She smiles
5. At 5s: You command "wave" → She waves
6. At 8s: You command "sit down" → She sits down
7. Video finishes with exactly what you want
Real-Time Generation Technology
PixVerse uses an autoregressive diffusion architecture optimized for streaming generation. Instead of creating the entire video at once, the model generates frame by frame and streams it immediately.
Latency comparison:
| Platform | Time to first frame | Total generation time (10s video) | Interactive? |
|---|---|---|---|
| Sora 2 | ~120 seconds | ~120 seconds | ❌ |
| Seedance 2.0 | ~60 seconds | ~60 seconds | ❌ |
| Kling 3.0 | ~45 seconds | ~45 seconds | ❌ |
| PixVerse v5.6 | ~0.5 seconds | ~12 seconds | ✅ |
Trade-off: PixVerse is much faster and interactive, but its image quality tops out at 1080p versus Kling 3.0's 4K, and unlike Seedance 2.0 it has no native audio.
Exclusive Feature: Interactive Commands
Supported Commands
While the video is being generated, you can type commands to control the character:
Emotion commands: "smile", "cry", "angry", "surprised", "sad"
Action commands: "wave", "dance", "sit down", "stand up", "walk forward", "turn around"
Camera commands: "zoom in", "zoom out", "pan left", "pan right", "close-up"
Real-World Use Cases
1. Micro-dramas (interactive short films):
Jaden Xie, co-founder of PixVerse, explains: "Real-time generation can enable micro-dramas that users can steer—like 'choose your own adventure' books but with video."
Example:
Scene: Character stands before two doors
User command: "open left door"
→ Video continues with the character opening the left door
→ Finds treasure
User command: "pick up treasure"
→ Video continues with the character picking up the treasure
→ Monster appears
User command: "run away"
→ Video continues with a chase scene
2. Infinite games:
Games without a fixed storyline—AI creates content in real time based on player actions.
3. Interactive ads:
Ads that let viewers control the character, boosting engagement.
PixVerse v5.6: Latest Features
End-Frame Control
Version 5.6 (released January 2026) adds end-frame control—letting you specify the final frame of the video.
Workflow:
Upload start image: Character standing
Upload end image: Character sitting
Prompt: "Smooth transition"
PixVerse generates a transition video from standing to sitting
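In code, an end-frame job boils down to submitting both anchor frames alongside the prompt. The payload below is a sketch: the field names and the `build_endframe_request` helper are assumptions for illustration, not PixVerse's documented schema.

```python
# Sketch of an end-frame control request. Field names and the helper
# are invented for illustration; consult the real API docs for the
# actual schema.

def build_endframe_request(start_image: str, end_image: str, prompt: str,
                           duration_s: int = 5) -> dict:
    """Assemble the JSON payload for a start-frame → end-frame job."""
    return {
        "model": "pixverse-v5.6",
        "prompt": prompt,
        "first_frame": start_image,   # e.g. an uploaded asset ID
        "last_frame": end_image,
        "duration": duration_s,
    }

payload = build_endframe_request(
    start_image="character_standing.png",
    end_image="character_sitting.png",
    prompt="Smooth transition",
)
```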
Use cases:
Animation: Create smooth transitions between keyframes
Product demos: Product moving from angle A to angle B
Character animation: Character from pose A to pose B
Portrait Mode Support
v5.6 natively supports portrait mode (1080×1920) — perfect for TikTok, Instagram Reels, YouTube Shorts.
Specs:
| Aspect ratio | Resolution | FPS | Duration |
|---|---|---|---|
| 16:9 (landscape) | 1920×1080 | 24-30 | 5-20s |
| 9:16 (portrait) | 1080×1920 | 24-30 | 5-20s |
| 1:1 (square) | 1080×1080 | 24-30 | 5-20s |
Comparison With Competitors
Detailed Comparison Table
| Feature | PixVerse v5.6 | Sora 2 | Seedance 2.0 | Kling 3.0 |
|---|---|---|---|---|
| Real-time generation | ✅ | ❌ | ❌ | ❌ |
| Interactive control | ✅ (cry, dance, pose) | ❌ | ❌ | ❌ |
| Time to first frame | 0.5s | 120s | 60s | 45s |
| Max resolution | 1080p | 1080p | 2K | 4K |
| Max duration | 20s | 25s | 15s | 15s (stitched 60s+) |
| Native audio | ❌ | Limited | ✅ | Partial |
| End-frame control | ✅ | ❌ | ❌ | ❌ |
| Multi-shot | ❌ | ❌ | ✅ | ✅ (6 shots) |
| Pricing | $9.99-$29.99/month | $20-$200/month | $19.90-$99/month | Free-$92/month |
Who Wins Each Category?
Real-time generation: PixVerse (exclusive)
Interactive control: PixVerse (exclusive)
Image quality: Kling 3.0 (4K/60fps)
Native audio: Seedance 2.0
Multi-shot storytelling: Seedance 2.0 and Kling 3.0
Lowest price: Kling 3.0 (has a free tier)
Easiest to use: PixVerse (real-time feedback)
$300M Funding Round: Deal Breakdown
Deal Terms
| Info | Details |
|---|---|
| Round | Series C |
| Amount | $300 million USD |
| Valuation | $1B+ (unicorn status) |
| Lead investor | CDH Investments |
| Strategic investor | Alibaba |
| Announcement date | March 11, 2026 |
| Use of funds | R&D, US expansion, hiring |
Why Did Alibaba Invest?
Alibaba has a strategic interest in AI video:
Taobao/Tmall: E-commerce platforms need product videos
Youku: Video platform needs content generation tools
AliExpress: International e-commerce needs localized videos
Cloud business: Alibaba Cloud could offer a PixVerse API
Synergies:
PixVerse can integrate into Taobao so sellers can self-produce product videos
Alibaba Cloud can host PixVerse infrastructure
Cross-promotion across the Alibaba ecosystem (500M+ users)
Technology Deep Dive
Autoregressive Diffusion Model
PixVerse uses a hybrid architecture combining autoregressive and diffusion:
Autoregressive component:
Generates each frame based on previous frames
Enables streaming generation (no need to wait for the entire video)
Enables interactive control (can change direction mid-generation)
Diffusion component:
Ensures high image quality
Smooth transitions between frames
Consistent character appearance
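The interplay of the two components can be caricatured numerically. In this toy sketch a "frame" is a single float: the autoregressive step extrapolates from the last two frames, and a stand-in `refine` function plays the role of diffusion denoising by pulling the proposal back toward the previous frame. Real models of course operate on image latents, not scalars.

```python
# Toy caricature of the hybrid architecture described above.
# A "frame" here is one float; real models work on image latents.

def propose_next(history: list[float]) -> float:
    """Autoregressive step: extrapolate from the last two frames."""
    return history[-1] + (history[-1] - history[-2])

def refine(raw: float, anchor: float, strength: float = 0.5) -> float:
    """Stand-in for the diffusion step: pull the raw proposal back
    toward the previous frame so motion stays smooth."""
    return anchor + strength * (raw - anchor)

def stream_frames(seed: list[float], n: int) -> list[float]:
    """Generate frames one at a time, as a streaming model would,
    so each frame can be emitted (and steered) immediately."""
    frames = list(seed)
    while len(frames) < n:
        raw = propose_next(frames)
        frames.append(refine(raw, anchor=frames[-1]))
    return frames

frames = stream_frames([0.0, 1.0], 5)
```

Because each frame depends only on frames already emitted, the loop can be interrupted between iterations, which is what makes mid-generation commands possible at all.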
Latency Optimization
To achieve real-time generation (< 1s latency), PixVerse optimized several aspects:
| Optimization | Technique | Latency reduction |
|---|---|---|
| Model size | Distillation (14B → 7B params) | -40% |
| Inference | TensorRT optimization | -30% |
| Batching | Dynamic batching | -20% |
| Caching | KV-cache reuse | -25% |
| Hardware | Nvidia H100 GPUs | Baseline |
Result: From ~5s latency (original model) down to ~0.5s (production).
Use Cases and Target Markets
1. Social Media Content Creators
Pain point: TikTokers and YouTubers need to produce 5–10 videos/day. Shooting and editing takes 2–3 hours/video.
Solution with PixVerse:
Create a 10-second video in 15 seconds
Interactive control to adjust content in real time
Native portrait mode for TikTok/Reels
Cost: $29.99/month (unlimited videos)
ROI:
Before PixVerse:
- 5 videos/day × 2 hours = 10 hours/day
- Opportunity cost: 10 hours × $50/hour = $500/day
With PixVerse:
- 5 videos/day × 5 minutes = 25 minutes/day
- Time saved: 9.5 hours × $50 = $475/day
- PixVerse cost: $29.99/month = ~$1/day
- Net benefit: $474/day = $14,220/month
2. Gaming: Procedural Cutscenes
Use case: Games with branching storylines need many different cutscenes. Handcrafting them is expensive.
Solution:
PixVerse API generates cutscenes in real time based on player choices
Each playthrough features different cutscenes
Infinite replayability
Game example:
Player chooses: "Save the princess"
→ PixVerse generates cutscene: Hero rescues the princess
→ Princess says: "Thank you!"
Player chooses: "Join the villain"
→ PixVerse generates cutscene: Hero joins the dark side
→ Villain says: "Welcome to the team!"
3. E-commerce: Product Videos At Scale
Pain point: E-commerce stores with 1,000+ products need product videos. Manual shooting isn’t feasible.
Solution:
Upload product images
PixVerse generates 360° rotation videos
Interactive: Viewers can command "zoom in", "rotate"
Produce 1,000 videos in a day
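A batch pipeline for this workflow is little more than a loop over the catalog. The `submit_rotation_job` helper and its parameters below are invented for illustration; a real integration would send each payload to the PixVerse API rather than just building it.

```python
# Batch sketch for the e-commerce workflow above. The helper and its
# parameters are invented; a real pipeline would POST each payload
# to the PixVerse API.

def submit_rotation_job(product_image: str) -> dict:
    """Build one 360° product-rotation job from a product image."""
    return {
        "input_image": product_image,
        "prompt": "360-degree product rotation, studio lighting",
        "duration": 5,
        "aspect_ratio": "1:1",   # square, per the specs table above
    }

# A 1,000-product catalog becomes 1,000 queued jobs in one pass.
catalog = [f"product_{i:04d}.jpg" for i in range(1, 1001)]
jobs = [submit_rotation_job(img) for img in catalog]
```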
AI Video Market: $15B by 2030
Market Size
| Year | Market size | Growth | Key segment |
|---|---|---|---|
| 2024 | $2.5B | - | Early adopters |
| 2025 | $4.8B | +92% | Content creators |
| 2026 | $8.2B | +71% | E-commerce, ads |
| 2027 | $11.5B | +40% | Gaming, education |
| 2030 | $15.0B | CAGR 43% | Mainstream |
Competitors and Market Share
| Company | Valuation | Funding | Market share | Differentiation |
|---|---|---|---|---|
| PixVerse | $1B+ | $300M | 8% | Real-time interactive |
| Runway | $1.5B | $237M | 12% | Professional tools |
| Pika Labs | $500M | $135M | 10% | Ease of use |
| Stability AI | $1B | $200M | 5% | Open source |
| OpenAI (Sora) | $110B | N/A | 15% | Quality, brand |
| ByteDance (Seedance) | N/A | N/A | 20% | TikTok integration |
| Kuaishou (Kling) | N/A | N/A | 18% | 4K quality, price |
| Others | - | - | 12% | - |
Business Model and Pricing
Subscriptions
| Plan | Price/month | Credits | Features |
|---|---|---|---|
| Free | $0 | 50 credits | 720p, watermark, 5s max |
| Basic | $9.99 | 500 credits | 1080p, no watermark, 10s max |
| Pro | $29.99 | Unlimited | 1080p, priority queue, 20s max, API access |
Credit system:
5-second video: 10 credits
10-second video: 20 credits
20-second video: 40 credits
Interactive commands: +2 credits per command
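The schedule above reduces to simple arithmetic: 2 credits per second of video, plus 2 credits per interactive command. The helper below is illustrative, not part of any PixVerse SDK.

```python
# Credit arithmetic from the published schedule: the listed prices
# work out to 2 credits per second of video, plus 2 credits for each
# interactive command. The helper function itself is illustrative.

def video_cost(duration_s: int, commands: int = 0) -> int:
    if duration_s not in (5, 10, 20):
        raise ValueError("supported durations: 5, 10, or 20 seconds")
    return 2 * duration_s + 2 * commands

cost = video_cost(10, commands=3)   # 20 + 6 = 26 credits
```

On those numbers, the Basic plan's 500 credits cover 25 plain 10-second videos a month, or fewer once interactive commands are used.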
API Pricing (For Developers)
| Tier | Price | Quota |
|---|---|---|
| Starter | $99/month | 1,000 videos |
| Growth | $499/month | 10,000 videos |
| Enterprise | Custom | Unlimited |
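For developers weighing the tiers, the list prices in the table reduce to a per-video unit cost (Enterprise is custom-quoted, so it is excluded):

```python
# Effective per-video list price of the metered API tiers above.
tiers = {"Starter": (99, 1_000), "Growth": (499, 10_000)}
per_video = {name: price / quota for name, (price, quota) in tiers.items()}
```

At list price, Growth works out to roughly $0.05 per video versus about $0.10 on Starter, so heavy integrations halve their unit cost by moving up a tier.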
Case Study: TikTok Creator Boosts Views by 300%
Background
This case follows a TikTok creator in Vietnam (200K followers) who specializes in comedy skits. They previously shot and edited 3 videos/day, each taking 2 hours.
Workflow With PixVerse
Before (live-action):
Write script: 30 minutes
Set up camera and lighting: 20 minutes
Shoot (5–10 takes): 40 minutes
Edit in CapCut: 30 minutes
Total: 2 hours/video
After (PixVerse):
Write script: 30 minutes
Create video with PixVerse: 5 minutes
Interactive adjustments: 5 minutes
Export and upload: 2 minutes
Total: 42 minutes/video
Results After 2 Months
| Metric | Before | After | Change |
|---|---|---|---|
| Videos/day | 3 | 8 | +167% |
| Average views/video | 50K | 150K | +200% |
| Followers | 200K | 580K | +190% |
| Monthly income | $2,000 | $8,500 | +325% |
| PixVerse cost | - | $29.99 | - |
Why did views increase?
Consistency: 8 videos/day instead of 3 → favored by the algorithm
Quality: AI-generated videos offer better visual effects
Variety: Able to test many different styles
Speed: Can capitalize on trending topics faster
Challenges and Limitations
1. Image Quality
PixVerse caps at 1080p, lower than Kling 3.0 (4K) and Seedance 2.0 (2K). This limits professional use cases.
Roadmap:
v6.0 (Q3 2026): 2K support
v7.0 (Q1 2027): 4K support
2. No Native Audio
PixVerse produces silent videos; users must add audio afterward. This adds a step and reduces convenience.
Workaround:
Integrate with ElevenLabs for AI voice
Integrate with Suno for AI music
Roadmap: Native audio in v6.0
3. Limited Control vs. Traditional Editing
Interactive commands are good but still limited compared to frame-by-frame editing in After Effects or Premiere Pro.
Trade-off:
PixVerse: Fast (5 minutes), easy (no skills needed), but less control
Traditional: Slow (2 hours), hard (requires skills), but full control
The Future: Interactive Video Everywhere
PixVerse’s Vision
Jaden Xie, co-founder, shares the vision:
"We believe the future of video isn’t passive consumption—it’s interactive experiences. Every video will be controllable, every story will branch, and every character will respond to user input."
Roadmap 2026-2027
Q2 2026:
Voice commands (speak instead of type)
Multi-character control (control multiple characters simultaneously)
Scene transitions (smooth transitions)
Q4 2026:
Native audio generation
2K resolution support
Longer duration (up to 60s)
2027:
VR/AR support (interactive 360° videos)
Real-time collaboration (multiple users co-direct a video)
AI director mode (AI suggests commands based on the story)
Conclusion: Real-Time Is the Future
PixVerse has proved an important point: For many use cases, speed and interactivity matter more than quality. TikTok creators don’t need 4K—they need to create fast. Gamers don’t need perfect cinematography—they need dynamic cutscenes.
Lessons for the AI video industry:
Latency matters: 0.5s vs 60s is the difference between interactive and batch processing
Control matters: Users want to steer AI, not just prompt and hope
Integration matters: APIs for developers are as important as the UI for end users
Prediction: By 2027, every AI video platform will have a real-time mode. PixVerse has a first-mover advantage, but Sora, Seedance, and Kling will catch up. The race will be about who has the lowest latency and the best control.