Case 02

Motion Control workflow — 84% cost reduction versus premium video AI

We replaced premium motion control services with an open-source ComfyUI workflow. About $12,000 in annual savings per client at production scale — plus capabilities premium services don't have.

Role: RTP Agency · Timeline: 4–5 months in production · Status: Live with 2 commercial clients

Kling 2.6 (premium): $1.20 / video
Custom Wan 2.2 workflow: $0.19 / video
Savings: 84% on costs

The business problem

A digital content agency needed to produce video at industrial scale: hundreds to thousands of clips per month. They were considering premium video AI services (Kling 2.6 and similar) for motion control video generation, where the movements of a source video are transferred onto a target character.

The economics were brutal:

  • Premium services charge $0.21–$1.20 per generation for motion control (3.5–20 credits at ~$0.06–0.08 per credit)
  • At their volume (1,000+ videos per month, with a target of 100 videos per hour during production sprints), this would add up to $1,200 or more per month at ~$1.20 per clip, just for AI generation
  • Credit limits prevented scaling to real output needs
  • Premium services' content policies restricted what could be generated at all

They needed industrial-scale video generation that was at once radically cheaper and operationally flexible.

What was hard

Motion control is non-trivial to reproduce. The technology requires:

  • Skeleton/pose detection from the source video
  • Character segmentation that works accurately with complex motion
  • Motion transfer that preserves both the action and visual coherence
  • Background and context handling so the result looks natural

Most premium services (Kling, Hailuo, RunwayML) have made motion control a proprietary feature and charge accordingly. Open-source equivalents existed, but they were broken, hard to find, or dependent on deep ComfyUI expertise to bring to production.

Our approach

After extensive research and testing, we determined that Wan 2.2 — an older but underrated open-source model — was capable of matching premium motion control quality with the right ComfyUI workflow architecture.

The challenge: existing workflows were either broken or required manual segmentation (manually marking where the character is on each frame — completely impractical at scale).

Iteration 1

We started from a broken workflow stuffed with obscure models and unused LoRAs. We trimmed it down to the working components, but segmentation still required manual frame-by-frame labeling, which is not viable at production scale.

Iteration 2

After additional research we found a better workflow with automatic segmentation models. We customized and stabilized it for production. This became the production version.

Current refinements

  • Built in a video upscaling sub-workflow to improve quality
  • Added frame interpolation (smooth 30fps → 60fps output)
  • Built around the RunningHub API with parallel processing across multiple keys
  • Handled edge cases (object discrepancies between the source motion and the target character)
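
As a rough illustration of the 30fps → 60fps step, here is a minimal sketch that builds an FFmpeg command using its `minterpolate` filter. This is a stand-in technique chosen for illustration: the production workflow does its interpolation inside ComfyUI, and the text does not say which method it uses. The function name and file names are hypothetical.

```python
from __future__ import annotations

import shlex


def build_interpolation_cmd(src: str, dst: str, target_fps: int = 60) -> list[str]:
    """Build an FFmpeg argv that motion-interpolates `src` up to `target_fps`."""
    if target_fps <= 0:
        raise ValueError("target_fps must be positive")
    # mi_mode=mci asks minterpolate to synthesize in-between frames
    # using motion-compensated interpolation.
    video_filter = f"minterpolate=fps={target_fps}:mi_mode=mci"
    # -y overwrites the output file; -c:a copy passes audio through untouched.
    return ["ffmpeg", "-y", "-i", src, "-vf", video_filter, "-c:a", "copy", dst]


# Hypothetical file names, for illustration only.
cmd = build_interpolation_cmd("clip_30fps.mp4", "clip_60fps.mp4")
print(shlex.join(cmd))
```

The command is returned as a list so it can be passed to `subprocess.run(cmd, check=True)` without shell quoting issues.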

Production architecture

  • ComfyUI workflow on RunningHub GPU compute
  • An RTX 5080-class GPU is enough for the load (premium hardware isn't needed)
  • 5 parallel jobs per API key, a multi-key scheme to scale beyond a single account's limits
  • Generation time: ~20 minutes of compute per video on the standard plan
  • Built into a broader content pipeline (as a module inside a large automated content production system)
  • Accessible through several interfaces — Telegram bots, a web interface, or ComfyUI directly for advanced users
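
The multi-key scheme above can be sketched roughly as follows. The two constants come from the figures in the list (5 parallel jobs per key, ~20 minutes per video). Everything else is an assumption for illustration: `submit` is a hypothetical callable standing in for whatever actually sends a job to the GPU provider, and the round-robin key assignment is one possible policy, not necessarily the one used in production.

```python
import math
import threading
from concurrent.futures import ThreadPoolExecutor

MINUTES_PER_VIDEO = 20  # from the text: ~20 minutes of compute per video
JOBS_PER_KEY = 5        # from the text: 5 parallel jobs per API key


def keys_needed(videos_per_hour: int) -> int:
    """Estimate how many API keys a steady-state hourly throughput requires."""
    per_key_per_hour = JOBS_PER_KEY * 60 / MINUTES_PER_VIDEO  # 15 videos/hour/key
    return math.ceil(videos_per_hour / per_key_per_hour)


def run_batch(jobs, api_keys, submit):
    """Spread jobs round-robin over keys, capping in-flight jobs per key."""
    slots = {key: threading.Semaphore(JOBS_PER_KEY) for key in api_keys}

    def worker(indexed_job):
        index, job = indexed_job
        key = api_keys[index % len(api_keys)]
        with slots[key]:  # blocks while this key already has 5 jobs running
            return submit(key, job)

    with ThreadPoolExecutor(max_workers=JOBS_PER_KEY * len(api_keys)) as pool:
        # pool.map returns results in the same order as the input jobs.
        return list(pool.map(worker, enumerate(jobs)))


print(keys_needed(100))  # -> 7
# Hypothetical submit function that only labels the job with its key.
print(run_batch(["a", "b", "c"], ["k1", "k2"], lambda key, job: f"{key}:{job}"))
```

Under these figures one key sustains about 15 videos per hour, so a 100-per-hour target needs 7 keys.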

Capability comparison: not just cheaper, but different capabilities

Beyond cost, premium services have hard technical limits that constrain commercial use:

Premium service limitations (Kling 2.6 Motion Control)

  • Maximum 30 seconds per single continuous generation
  • Credit consumption grows with duration (longer clips get sharply more expensive)
  • Content-policy restrictions on a range of commercial scenarios

Our implementation

  • No hard duration limit — video length is constrained only by available GPU time
  • You can generate videos of 1, 2, or 10+ minutes in a single continuous generation
  • The same per-second cost economics, scaling linearly with duration
  • No content-policy friction for legitimate commercial work

For long-form content this isn't an optimization; it's a capability gap that premium services simply don't close.

Cost engineering — the math

RunningHub pricing structure

  • $0.0004 per coin
  • 24 coins per minute of GPU time
  • ~$0.01 per minute of compute

Cost per video for a typical 30-second clip

20 minutes of GPU time per video → 480 coins → ~$0.19 per video
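
The arithmetic above can be checked with a few lines of Python. The three constants are taken from the text. The `clip_cost_usd` helper is a hypothetical extrapolation: it assumes GPU time scales linearly with clip length, as the capability comparison claims, so treat its result for longer clips as an estimate.

```python
USD_PER_COIN = 0.0004          # from the text
COINS_PER_GPU_MINUTE = 24      # from the text
GPU_MINUTES_PER_30S_CLIP = 20  # from the text


def compute_cost_usd(gpu_minutes: float) -> float:
    """Convert GPU minutes to dollars via coins."""
    coins = gpu_minutes * COINS_PER_GPU_MINUTE
    return coins * USD_PER_COIN


def clip_cost_usd(clip_seconds: float) -> float:
    """Estimate cost for a clip, assuming GPU time grows linearly with length."""
    gpu_minutes = GPU_MINUTES_PER_30S_CLIP * clip_seconds / 30
    return compute_cost_usd(gpu_minutes)


print(f"30 s clip: ${clip_cost_usd(30):.3f}")    # 30 s clip: $0.192
print(f"2 min clip: ${clip_cost_usd(120):.3f}")  # 2 min clip: $0.768
```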

Comparison with Kling 2.6 motion control (the same 30-second video)

15–20 credits per generation × $0.06–0.08 per credit → ~$0.90–$1.60 per video (midpoint ~$1.20)
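
A short sketch of the comparison, using only the figures quoted in this case study: the credit range and credit price for Kling, the ~$1.20 reference price, and the ~$0.19 workflow cost. The function names are illustrative.

```python
def kling_cost_range_usd(credits=(15, 20), usd_per_credit=(0.06, 0.08)):
    """Lowest and highest cost per generation from the quoted ranges."""
    low = credits[0] * usd_per_credit[0]
    high = credits[1] * usd_per_credit[1]
    return low, high


def reduction_percent(premium_usd: float, custom_usd: float) -> float:
    """Percentage saved when moving from the premium to the custom price."""
    return (premium_usd - custom_usd) / premium_usd * 100


low, high = kling_cost_range_usd()
print(f"Kling range: ${low:.2f} to ${high:.2f}")           # Kling range: $0.90 to $1.60
print(f"Reduction: {reduction_percent(1.20, 0.19):.0f}%")  # Reduction: 84%
```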

At the client's real production volume

The drop in cost per video is the headline, but the cumulative value comes from three mutually reinforcing factors: (1) an 84% cost reduction; (2) the removal of duration limits, which opens up content formats competitors can't produce; and (3) operational flexibility from parallel processing across multiple keys.
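
For the cost factor, the ~$12K annual figure can be reproduced from numbers stated elsewhere in this case study. The sketch below assumes a flat 1,000 videos per month (the text says "1000+") and the ~$1.20 and ~$0.19 per-video prices, so it is a lower-bound estimate rather than the client's actual bill.

```python
def annual_savings_usd(videos_per_month: int,
                       premium_usd: float = 1.20,
                       custom_usd: float = 0.19) -> float:
    """Yearly savings from the per-video price difference at a fixed volume."""
    return videos_per_month * (premium_usd - custom_usd) * 12


savings = annual_savings_usd(1000)
print(f"Monthly: ${savings / 12:,.0f}")  # Monthly: $1,010
print(f"Annual:  ${savings:,.0f}")       # Annual:  $12,120
```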

Quality comparison

The honest answer: quality is on par with Kling for the production scenario, and better in places.

Where premium services win slightly: edge cases with unusual objects (for example, the source video shows a person holding a box but the target character doesn't have one). Both systems can produce artifacts here; they are resolved by pre-editing the source image.

Where our implementation is on par or better: standard motion transfer scenarios, which are 95%+ of production volume.

Both hallucinate occasionally. This is expected behavior for the current generation of video AI — neither premium nor open source is free of hallucinations.

Expertise gained

On this project we built up deep expertise in:

  • ComfyUI workflow architecture — including debugging, library management, and the ComfyUI Manager ecosystem
  • The capabilities of open-source video models — in particular, the strengths and weaknesses of Wan 2.2 (excellent for motion transfer, weaker for generation from scratch)
  • GPU resource optimization — achieving production quality on consumer GPUs instead of enterprise hardware
  • Video post-processing integration — upscaling and frame interpolation built into the core generation workflow
  • Production stabilization — dealing with inevitable breakages when custom-node maintainers move repositories, model versions go stale, and so on

The result

84%: Cost reduction at production scale
~$12K: Annual savings per client
~$0.19: Cost per video for a 30-sec clip
100+/hr: Industrial throughput target

  • 4–5 months of continuous operation in production with 2 commercial clients in active content production
  • Industrial output — supporting a target throughput of 100+ videos per hour
  • Capabilities beyond premium services — no 30-second limit on video length
  • An integrated foundation for a broader automated content production pipeline
  • Operational flexibility — no content-policy restrictions or credit limits beyond infrastructure capacity

Technology stack

AI model: Wan 2.2 (open source)
Workflow engine: ComfyUI
Segmentation: Automatic segmentation models
GPU compute: RunningHub (RTX 5080 class)
Video processing: FFmpeg
Post-processing: Upscaling · Frame interpolation

What this demonstrates

  • Deep open-source AI expertise — finding, debugging, and bringing to production workflows that are undocumented and little-known
  • Cost-arbitrage thinking — seeing when premium services charge dollars for capabilities open source delivers for cents
  • Capability-gap hunting — finding business value in what premium services don't provide at all (long-form motion control)
  • Production engineering — turning broken or impractical workflows into industrially reliable ones
  • Workflow architecture — chaining multiple processing stages (motion control + segmentation + upscaling + interpolation) into coherent production pipelines
  • GPU compute optimization — production results on consumer-grade hardware

Similar challenge?

Tell us what you're building — we'd be glad to talk it through.