Physics-Guided Motion Loss for Video Generation Model
Bowen Xue, Giuseppe Claudio Guarnera, Shuang Zhao, Zahra Montazeri
International Conference on Machine Learning (ICML), July 2026
Current video diffusion models generate visually compelling content but often violate basic laws of physics, producing subtle artifacts such as rubber-sheet deformations and inconsistent object motion. We introduce a frequency-domain physics prior that improves motion plausibility without modifying model architectures. Our method decomposes common rigid motions (translation, rotation, scaling) into lightweight spectral losses. Applied to Open-Sora, MVDIT, and Hunyuan, our approach improves both motion accuracy and action recognition by ~11% (relative) on average on OpenVID-1M while maintaining visual quality. It also reduces warping error by 22–37%, depending on the backbone, and improves temporal consistency scores. User studies show 74–83% preference for our physics-enhanced videos. These results indicate that simple, global spectral cues are an effective drop-in regularizer for physically plausible motion in video diffusion.
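To illustrate why a spectral cue can capture translation, here is a minimal sketch based on the Fourier shift theorem: a circular translation changes only the phase of a frame's 2-D spectrum, so the magnitude spectra of two frames related by a pure shift match. This is not the paper's loss; the function name `translation_spectral_loss`, the use of NumPy, and the mean-absolute-difference form are assumptions chosen for illustration only.

```python
import numpy as np

def translation_spectral_loss(frame_a, frame_b):
    """Hypothetical illustration, not the authors' formulation.

    Returns the mean absolute difference between the 2-D FFT magnitude
    spectra of two grayscale frames. For a pure circular translation the
    magnitudes are identical, so the value is (numerically) zero.
    """
    mag_a = np.abs(np.fft.fft2(frame_a))
    mag_b = np.abs(np.fft.fft2(frame_b))
    return float(np.mean(np.abs(mag_a - mag_b)))

rng = np.random.default_rng(0)
frame = rng.random((32, 32))

# A circular shift stands in for rigid translation between consecutive frames.
shifted = np.roll(frame, shift=(3, 5), axis=(0, 1))

# An unrelated frame stands in for physically implausible content change.
unrelated = rng.random((32, 32))

loss_shifted = translation_spectral_loss(frame, shifted)
loss_unrelated = translation_spectral_loss(frame, unrelated)
print(loss_shifted, loss_unrelated)
```

In this toy setting the shifted pair incurs essentially no penalty while the unrelated pair does. Real video frames translate non-circularly and contain multiple moving objects, so a practical loss would need windowing or local analysis beyond what this sketch shows.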