File Download

  • Find it @ UNIST can give you direct access to the published full text of this article. (UNISTARs only)
Related Researcher

최진영

Choi, Jinyoung
Read More

Views & Downloads

Detailed Information

Cited time in webofscience Cited time in scopus
Metadata Downloads

FIFO-Diffusion: Generating Infinite Videos from Text without Training

Author(s)
Kim, JihwanKang, JunohChoi, JinyoungHan, Bohyung
Issued Date
2024-12-10
URI
https://scholarworks.unist.ac.kr/handle/201301/91122
Citation
Neural Information Processing Systems
Abstract
We propose a novel inference technique based on a pretrained diffusion model for text-conditional video generation. Our approach, called FIFO-Diffusion, is conceptually capable of generating infinitely long videos without additional training. This is achieved by iteratively performing diagonal denoising, which simultaneously processes a series of consecutive frames with increasing noise levels in a queue; our method dequeues a fully denoised frame at the head while enqueuing a new random noise frame at the tail. However, diagonal denoising is a double-edged sword as the frames near the tail can take advantage of cleaner frames by forward reference but such a strategy induces the discrepancy between training and inference. Hence, we introduce latent partitioning to reduce the training-inference gap and lookahead denoising to leverage the benefit of forward referencing. Practically, FIFO-Diffusion consumes a constant amount of memory regardless of the target video length given a baseline model, while well-suited for parallel inference on multiple GPUs. We have demonstrated the promising results and effectiveness of the proposed methods on existing text-to-video generation baselines. Generated video examples and source codes are available at our project page.
Publisher
Advances in Neural Information Processing Systems

qrcode

Items in Repository are protected by copyright, with all rights reserved, unless otherwise indicated.