Case 02

Motion Control workflow — 84% cost reduction versus premium video AI

We replaced premium motion control services with an open-source ComfyUI workflow. About $12,000 in annual savings per client at production scale — plus capabilities premium services don't have.

Role: RTP Agency · Timeline: 4–5 months in production · Status: Live with 2 commercial clients

Kling 2.6 (premium): $1.20 / video

Custom Wan 2.2 workflow: $0.19 / video

−84% on costs

The business problem

A digital content agency needed to produce video at industrial scale: hundreds to thousands of clips per month. They were considering premium video AI services (Kling 2.6 and similar) for motion control video generation, in which the movements in a source video are transferred onto a target character.

The economics were brutal:

Premium services charge $0.21–$1.60 per motion control generation (3.5–20 credits at ~$0.06–0.08 per credit)
At their volume (1,000+ videos per month, with a target of 100 videos per hour during production sprints), this added up to $1,200 or more per month for AI generation alone (1,000 videos at ~$1.20 each)
Credit limits prevented scaling to real output needs
Premium services' content policies restricted what could be generated at all

They needed industrial-scale video generation that was both radically cheaper and operationally flexible.

What was hard

Motion control is non-trivial to reproduce. The technology requires:

Skeleton/pose detection from the source video
Character segmentation that works accurately with complex motion
Motion transfer that preserves both the action and visual coherence
Background and context handling so the result looks natural

Most premium services (Kling, Hailuo, RunwayML) have made motion control a proprietary feature and charge accordingly. Open-source equivalents existed, but they were broken, hard to find, or required deep ComfyUI expertise to bring to production.

Our approach

After extensive research and testing, we determined that Wan 2.2 — an older but underrated open-source model — was capable of matching premium motion control quality with the right ComfyUI workflow architecture.

The challenge: existing workflows were either broken or relied on manual segmentation (marking by hand where the character is on every frame), which is completely impractical at scale.

Iteration 1

We started from a broken workflow stuffed with obscure models and unused LoRAs. We trimmed it down to the working components, but segmentation still required manual frame-by-frame labeling, which made it non-viable at production scale.

Iteration 2

After additional research we found a better workflow built on automatic segmentation models. We customized and stabilized it, and it became the production version.

Current refinements

Built in a video upscaling sub-workflow to improve quality
Added frame interpolation (smooth 30fps → 60fps output)
Built around the RunningHub API with parallel processing across multiple keys
Handled edge cases (object discrepancies between the source motion and the target character)

Production architecture

ComfyUI workflow on RunningHub GPU compute
An RTX 5080-class GPU is enough for the load (premium hardware isn't needed)
5 parallel jobs per API key, a multi-key scheme to scale beyond a single account's limits
Generation time: ~20 minutes of compute per video on the standard plan
Built into a broader content pipeline (as a module inside a large automated content production system)
Accessible through several interfaces — Telegram bots, a web interface, or ComfyUI directly for advanced users
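The multi-key scheme can be sketched roughly as follows. This is not the production code: `submit` is a hypothetical coroutine standing in for a call to the RunningHub API, and the round-robin assignment is one simple choice among several. Only the two constants (5 parallel jobs per key, ~20 minutes per video) come from the figures above.

```python
import asyncio
import math
from typing import Any, Awaitable, Callable, List, Sequence

JOBS_PER_KEY = 5        # from the text: 5 parallel jobs per API key
MINUTES_PER_VIDEO = 20  # from the text: ~20 minutes of compute per video


def keys_needed(videos_per_hour: int) -> int:
    """API keys required to sustain a target hourly throughput."""
    per_key_per_hour = JOBS_PER_KEY * 60 / MINUTES_PER_VIDEO  # 15 videos/hour
    return math.ceil(videos_per_hour / per_key_per_hour)


async def run_all(
    jobs: Sequence[Any],
    api_keys: Sequence[str],
    submit: Callable[[str, Any], Awaitable[Any]],
) -> List[Any]:
    """Spread jobs over keys round-robin, capping in-flight jobs per key."""
    slots = {key: asyncio.Semaphore(JOBS_PER_KEY) for key in api_keys}

    async def run_one(index: int, job: Any) -> Any:
        key = api_keys[index % len(api_keys)]
        async with slots[key]:  # waits while this key already has 5 jobs running
            return await submit(key, job)

    # gather() returns results in the same order as the submitted jobs.
    return await asyncio.gather(*(run_one(i, j) for i, j in enumerate(jobs)))


print(keys_needed(100))
```

Under these figures each key sustains about 15 videos per hour, so the 100 videos per hour target works out to 7 keys.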

Capability comparison: not just cheaper, but more capable

Beyond cost, premium services have hard technical limits that constrain commercial use:

Premium service limitations (Kling 2.6 Motion Control)

Maximum 30 seconds per single continuous generation
Credit consumption grows with duration (longer clips cost more credits)
Content-policy restrictions on a range of commercial scenarios

Our implementation

No hard duration limit — video length is constrained only by available GPU time
Videos of 1, 2, or 10+ minutes can be produced in a single continuous generation
Per-second cost stays the same, so total cost scales linearly with duration
No content-policy friction for legitimate commercial work

For long-form content this isn't an optimization — it's a capability gap that premium services simply don't close.
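As a rough illustration of that gap, the sketch below counts how many separate generations a 30-second cap would force for a given length, and estimates what one continuous run would cost. It assumes strictly linear scaling, as stated above, and reuses the RunningHub pricing and the 20 GPU-minutes per 30-second clip quoted in the cost section of this case. The function names are hypothetical.

```python
import math

PREMIUM_CAP_SECONDS = 30                 # per-generation cap quoted in the text
GPU_MINUTES_PER_OUTPUT_SECOND = 20 / 30  # 20 GPU-minutes per 30-second clip
DOLLARS_PER_GPU_MINUTE = 24 * 0.0004     # 24 coins/min at $0.0004 per coin


def segments_under_cap(duration_s: float, cap_s: float = PREMIUM_CAP_SECONDS) -> int:
    """Separate generations needed when a single run is capped at cap_s."""
    return math.ceil(duration_s / cap_s)


def single_run_cost(duration_s: float) -> float:
    """Estimated dollars for one continuous run, assuming linear scaling."""
    return duration_s * GPU_MINUTES_PER_OUTPUT_SECOND * DOLLARS_PER_GPU_MINUTE


for minutes in (1, 2, 10):
    seconds = minutes * 60
    print(f"{minutes:>2} min: {segments_under_cap(seconds)} capped segments, "
          f"or one run at ~${single_run_cost(seconds):.2f}")
```

These are estimates, not measured production costs; real GPU time per second of output may vary with resolution and workflow settings.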

Cost engineering — the math

RunningHub pricing structure

$0.0004 per coin
24 coins per minute of GPU time
~$0.01 per minute of compute

Cost per video for a typical 30-second clip

20 minutes of GPU time per video → 480 coins → ~$0.19 per video
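The same arithmetic as a small sketch, using only the pricing figures listed above. The helper name `compute_cost` is ours.

```python
PRICE_PER_COIN = 0.0004    # dollars per coin
COINS_PER_GPU_MINUTE = 24  # coins charged per minute of GPU time


def compute_cost(gpu_minutes: float):
    """Return (coins, dollars) for a given amount of GPU time."""
    coins = gpu_minutes * COINS_PER_GPU_MINUTE
    return coins, coins * PRICE_PER_COIN


coins, dollars = compute_cost(20)  # one typical 30-second clip
print(f"{coins} coins, ~${dollars:.2f} per video")
```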

Comparison with Kling 2.6 motion control (the same 30-second video)

15–20 credits per generation × $0.06–0.08 per credit → ~$0.90–$1.60 per video (we use ~$1.20 as the representative figure)
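A quick check of that range and of the resulting cost reduction, sketched with the figures above. The $0.19 figure is the per-video cost of the custom workflow, and the helper name `reduction` is ours.

```python
CREDITS = (15, 20)               # credits per 30-second generation
PRICE_PER_CREDIT = (0.06, 0.08)  # dollars per credit
CUSTOM_COST = 0.19               # dollars per video, custom Wan 2.2 workflow

low = CREDITS[0] * PRICE_PER_CREDIT[0]   # cheapest case
high = CREDITS[1] * PRICE_PER_CREDIT[1]  # most expensive case


def reduction(premium: float, custom: float = CUSTOM_COST) -> float:
    """Fractional saving of the custom workflow against a premium price."""
    return (premium - custom) / premium


print(f"premium range: ${low:.2f} to ${high:.2f}")
for price in (low, 1.20, high):
    print(f"vs ${price:.2f}: {reduction(price):.0%} cheaper")
```

Against the $1.20 reference price the saving is about 84%; across the full premium range it runs from roughly 79% to 88%.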

At the client's real production volume

The drop in cost per video is the headline, but the cumulative value comes from three mutually reinforcing factors: an 84% cost reduction; the removal of duration limits, which opens up content formats competitors can't produce; and operational flexibility from parallel processing across multiple keys.
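For the cost factor, the annual figure can be reproduced from numbers already given in this case: 1,000 videos per month, $1.20 per premium video, and $0.19 per video on the custom workflow. A minimal sketch, with a helper name of our own:

```python
def annual_savings(videos_per_month: int,
                   premium_cost: float = 1.20,
                   custom_cost: float = 0.19) -> float:
    """Dollars saved per year by switching from premium to custom generation."""
    return videos_per_month * (premium_cost - custom_cost) * 12


print(f"~${annual_savings(1000):,.0f} per year at 1,000 videos per month")
```

At 1,000 videos per month this comes to about $12,000 per year, and the figure grows in proportion to volume.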

Quality comparison

The honest answer: quality is on par with Kling for the production scenario, and better in places.

Where premium services win slightly: edge cases with unusual objects. For example, if the source video shows a person holding a box but the target character doesn't have one, both systems can produce artifacts; we resolve this by pre-editing the source image.

Where our implementation is on par or better: standard motion transfer scenarios, which are 95%+ of production volume.

Both hallucinate occasionally. This is expected behavior for the current generation of video AI — neither premium nor open source is free of hallucinations.

Expertise gained

On this project we built up deep expertise in:

ComfyUI workflow architecture — including debugging, library management, and the ComfyUI Manager ecosystem
The capabilities of open-source video models — in particular, the strengths and weaknesses of Wan 2.2 (excellent for motion transfer, weaker for generation from scratch)
GPU resource optimization — achieving production quality on consumer GPUs instead of enterprise hardware
Video post-processing integration — upscaling and frame interpolation built into the core generation workflow
Production stabilization — dealing with inevitable breakages when custom-node maintainers move repositories, model versions go stale, and so on

The result

84%

Cost reduction at production scale

~$12K

Annual savings per client

~$0.19

Cost per video for a 30-sec clip

100+/hr

Industrial throughput target

4–5 months of continuous operation in production with 2 commercial clients in active content production
Industrial output — supporting a target throughput of 100+ videos per hour
Capabilities beyond premium services — no 30-second limit on video length
An integrated foundation for a broader automated content production pipeline
Operational flexibility — no content-policy restrictions or credit limits beyond infrastructure capacity

Technology stack

AI model	Wan 2.2 (open source)
Workflow engine	ComfyUI
Segmentation	Automatic segmentation models
GPU compute	RunningHub (RTX 5080 class)
Video processing	FFmpeg
Post-processing	Upscaling · Frame interpolation

What this demonstrates

Deep open-source AI expertise — finding, debugging, and bringing to production workflows that are undocumented and little-known
Cost-arbitrage thinking — seeing when premium services charge dollars for capabilities open source delivers for cents
Capability-gap hunting — finding business value in what premium services don't provide at all (long-form motion control)
Production engineering — turning broken or impractical workflows into industrially reliable ones
Workflow architecture — chaining multiple processing stages (motion control + segmentation + upscaling + interpolation) into coherent production pipelines
GPU compute optimization — production results on consumer-grade hardware

Similar challenge?

Tell us what you're building — we'd be glad to talk it through.

