VideoGigaGAN: Towards Detail-rich Video Super-Resolution
The paper introduces VideoGigaGAN, a generative video super-resolution (VSR) model that aims to produce high-resolution videos with both fine detail and temporal consistency, drawing inspiration from successful image upsampling techniques.
Built on the GigaGAN image upsampler, VideoGigaGAN extends the architecture to the video domain while retaining components such as adaptive kernel selection and self-attention layers. Naively applying the image model to video, however, produced temporal flickering and artifacts, so the researchers introduced several techniques to address these issues.
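One common way to extend an image model to video is to let each spatial location attend across frames. The following is a didactic numpy sketch of that idea (no learned projections, single head); the names and shapes are illustrative assumptions, not the paper's exact layer.

```python
import numpy as np

def temporal_self_attention(feats):
    """Self-attention over the time axis, applied independently at each
    spatial location. `feats` has shape (T, H, W, C): every pixel
    attends over its T temporal counterparts."""
    t, h, w, c = feats.shape
    x = feats.reshape(t, h * w, c).transpose(1, 0, 2)   # (HW, T, C)
    scores = x @ x.transpose(0, 2, 1) / np.sqrt(c)      # (HW, T, T)
    scores -= scores.max(axis=-1, keepdims=True)        # stable softmax
    attn = np.exp(scores)
    attn /= attn.sum(axis=-1, keepdims=True)
    out = attn @ x                                      # (HW, T, C)
    return out.transpose(1, 0, 2).reshape(t, h, w, c)
```

Note that when all frames are identical, the attention weights are uniform and the output equals the input, so the layer degrades gracefully on static content.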
Flow-guided feature propagation aligns features across frames using optical flow, improving temporal consistency. Anti-aliasing blocks mitigate the flickering caused by downsampling operations in the encoder. Finally, a high-frequency shuttle injects fine-grained details into the decoder blocks, compensating for detail lost during upsampling.
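The anti-aliasing and high-frequency ideas can be sketched with a simple low-pass filter: blur before subsampling to suppress aliasing, and keep the residual (input minus low-pass) as the high-frequency component to shuttle forward. This is a minimal numpy illustration under assumed 2D feature maps and a fixed [1, 2, 1]/4 kernel, not the paper's implementation.

```python
import numpy as np

def blur3(x):
    """Separable [1, 2, 1]/4 low-pass filter along H and W (edge-padded)."""
    k = np.array([1.0, 2.0, 1.0]) / 4.0
    pad = np.pad(x, ((1, 1), (0, 0)), mode="edge")
    x = k[0] * pad[:-2] + k[1] * pad[1:-1] + k[2] * pad[2:]
    pad = np.pad(x, ((0, 0), (1, 1)), mode="edge")
    return k[0] * pad[:, :-2] + k[1] * pad[:, 1:-1] + k[2] * pad[:, 2:]

def aa_downsample(x):
    """Anti-aliased 2x downsampling: low-pass blur first, then subsample."""
    return blur3(x)[::2, ::2]

def hf_split(x):
    """High-frequency shuttle idea: split features into a low-pass part
    and a high-frequency residual; the residual is what gets routed to
    the decoder to restore fine detail."""
    low = blur3(x)
    return low, x - low
```

Because the split is exact (low + residual reconstructs the input), no information is discarded; the model simply decides where each frequency band is reintroduced.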
During development, components such as temporal attention, flow-guided propagation, anti-aliasing blocks, and the high-frequency shuttle were integrated progressively. The model was trained on the REDS and Vimeo-90K datasets across multiple GPUs with a fixed learning rate, and ablation studies confirmed that each component contributes measurable gains in visual quality and temporal-consistency metrics.