MAVEN: A Multi-Agent Framework for Multicultural Text-to-Video Generation
Refines prompts with parallel or sequential expert agents to boost cultural fidelity in text-to-video generation across mono- and cross-cultural prompts.
Decouples fairness from guidance scale in diffusion models by equalizing guidance distributions or offsetting null embeddings.
BiDPO jointly optimizes image and text preferences with region-guided alignment to boost compositional fidelity in T2I generation.
Boosts text-to-video efficiency by combining hybrid full-sparse attention with Skiparse-2D and Sparse Sequence Parallelism to cut communication and maintain quality.
Directs narrative-aware video generation by conditioning on multiple keyframes, enabling single-shot, multi-shot, and extension scenarios.
Unifies layers in a 20B masked region diffusion model for multi-layer transparent image generation and editing.
Enables temporally coherent video generation with decentralized training, doubling FVD improvement and boosting CLIP similarity and aesthetics under equal compute.