
🏄 SURF: Signature-Retained Fast Video Generation

CVPR 2026

Kaixin Ding1, Xi Chen1, Sihui Ji1, Yuan Gao2, Liang Hou2, Xin Tao2, Pengfei Wan2, Hengshuang Zhao1

1The University of Hong Kong
2Kling Team, Kuaishou Technology


TL;DR: High-resolution video generation is slow: for example, Wan 2.1 takes over 50 minutes to generate a 720p video. Existing acceleration methods often compromise model priors (layout, semantics, motion). We propose SURF, a two-stage framework: first, a fast low-resolution preview using a pretrained model; second, a Refiner to upscale while preserving priors. Key techniques include noise reshifting to reduce prior loss and shifting windows with careful training design. SURF is simple, efficient, and compatible with various base models, achieving 12.5× speedup for generating 5-second, 16fps, 720p Wan 2.1 videos and 8.7× speedup for generating 5-second, 24fps, 720p HunyuanVideo.
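
To put the reported numbers in perspective, here is a rough back-of-the-envelope calculation. It assumes the 12.5× speedup is measured against the same "over 50 minutes" Wan 2.1 baseline quoted above, which the text does not state explicitly. The helper names are illustrative only, and the frame counts are nominal (seconds × fps); the models' actual frame counts may differ.

```python
def accelerated_minutes(baseline_minutes: float, speedup: float) -> float:
    """Wall-clock time after applying a multiplicative speedup."""
    return baseline_minutes / speedup


def nominal_frames(seconds: int, fps: int) -> int:
    """Nominal frame count of a clip (assumption: exactly seconds * fps)."""
    return seconds * fps


# Assumption: 50 minutes is used as a lower bound on the baseline,
# so the result is a lower bound on the accelerated time as well.
wan_minutes = accelerated_minutes(50.0, 12.5)

wan_frames = nominal_frames(5, 16)      # 5-second, 16fps Wan 2.1 clip
hunyuan_frames = nominal_frames(5, 24)  # 5-second, 24fps HunyuanVideo clip

print(f"Wan 2.1 with 12.5x speedup: about {wan_minutes:.1f} minutes or more")
print(f"Nominal frames: Wan 2.1 = {wan_frames}, HunyuanVideo = {hunyuan_frames}")
```

Under these assumptions, a 720p Wan 2.1 video that takes over 50 minutes would take roughly 4 minutes or a little more with SURF.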

Teaser. SURF keeps high-resolution video generation fast while preserving the base model's layout, semantics, and motion priors. The examples show that the refined videos remain faithful to the prompt without paying the full inference cost.
Pipeline. The method first generates a low-resolution preview, then uses a Refiner to recover high-resolution details. Noise reshifting and shifting windows help the refinement stage scale efficiently without losing temporal consistency.
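
The control flow of the two stages can be sketched in a few lines of NumPy. This is a toy illustration under stated assumptions, not the SURF implementation: `toy_preview` and `toy_refiner` are hypothetical stand-ins for the pretrained base model and the Refiner, nearest-neighbour upsampling is an arbitrary choice, and `renoise` uses a generic linear blend between the upsampled preview and Gaussian noise rather than the paper's noise reshifting, whose exact form is not given here. Shifting windows are omitted entirely.

```python
import numpy as np


def upsample_nearest(video: np.ndarray, scale: int) -> np.ndarray:
    """Nearest-neighbour spatial upsampling of a (frames, H, W, C) array."""
    return video.repeat(scale, axis=1).repeat(scale, axis=2)


def renoise(video: np.ndarray, noise_level: float, rng: np.random.Generator) -> np.ndarray:
    """Assumed re-noising step: (1 - t) * video + t * noise, with t = noise_level."""
    noise = rng.standard_normal(video.shape).astype(video.dtype)
    return (1.0 - noise_level) * video + noise_level * noise


def two_stage_generate(preview_fn, refiner_fn, scale, noise_level, seed=0):
    """Stage 1: low-resolution preview. Stage 2: refine the upsampled, re-noised preview."""
    rng = np.random.default_rng(seed)
    preview = preview_fn(rng)
    noisy = renoise(upsample_nearest(preview, scale), noise_level, rng)
    return preview, refiner_fn(noisy, preview)


def toy_preview(rng: np.random.Generator) -> np.ndarray:
    """Hypothetical stand-in for the base model: 8 frames of 4x6 RGB noise."""
    return rng.standard_normal((8, 4, 6, 3)).astype(np.float32)


def toy_refiner(noisy: np.ndarray, preview: np.ndarray) -> np.ndarray:
    """Hypothetical stand-in for the Refiner: returns its input unchanged."""
    return noisy


preview, refined = two_stage_generate(toy_preview, toy_refiner, scale=2, noise_level=0.5)
print(preview.shape, refined.shape)
```

The point of the structure is that the expensive high-resolution stage starts from an upsampled preview instead of pure noise, so the layout, semantics, and motion already present in the preview can carry over to the refined video.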