Paris 2.0: A Decentralized Diffusion Model for Video Generation

Author:
Published: 5/27/2026, 6:20:41 PM
Category: Research

Enables temporally coherent video generation with decentralized training, roughly halving FVD and improving CLIP text-video similarity and aesthetic score under a matched compute budget.

Paper

https://arxiv.org/abs/2605.26064

We present Paris 2.0, the first video generation model pre-trained through decentralized computation. Its training recipe builds upon Paris 1.0 [jiang2025paris], the first ever open-weight Decentralized Diffusion Model (DDM), which showed that image generation can be trained without a monolithic GPU cluster. However, temporally coherent video generation had remained an open problem under decentralized training, and Paris 2.0 closes it. In low-resolution text-to-video training, against a monolithic model trained on the same data under a matched total compute budget, Paris 2.0 cuts Fréchet Video Distance (FVD) from 561.04 to 279.01, a ∼2.0× improvement, and lifts CLIP text-video similarity and aesthetic score.
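As a quick check on the headline number, the short sketch below recomputes the improvement from the two FVD values quoted above. The function and variable names are illustrative assumptions, not taken from the paper, and the snippet only does the arithmetic; it does not compute FVD itself.

```python
# FVD values quoted in the text above (lower is better).
FVD_MONOLITHIC = 561.04  # monolithic baseline, matched compute
FVD_PARIS_2 = 279.01     # Paris 2.0


def improvement_factor(baseline: float, new: float) -> float:
    """Return how many times smaller `new` is than `baseline`."""
    return baseline / new


def relative_reduction(baseline: float, new: float) -> float:
    """Return the fractional drop from `baseline` to `new`."""
    return (baseline - new) / baseline


factor = improvement_factor(FVD_MONOLITHIC, FVD_PARIS_2)
reduction = relative_reduction(FVD_MONOLITHIC, FVD_PARIS_2)

print(f"improvement factor: {factor:.2f}x")    # improvement factor: 2.01x
print(f"relative reduction: {reduction:.1%}")  # relative reduction: 50.3%
```

Because FVD is a distance where lower is better, a ∼2.0× improvement here means the score was roughly halved.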

