
Bridging the Capacity Gap in Diffusion Models via Easy-to-Hard Knowledge Distillation

Author(s)
Han, HyunSoo
Advisor
Yoo, Jae Jun
Issued Date
2026-02
URI
https://scholarworks.unist.ac.kr/handle/201301/91054
http://unist.dcollection.net/common/orgView/200000966252
Abstract
Recent advances in diffusion models have dramatically improved image synthesis quality, but at the cost of large model size and heavy computation, making direct deployment of state-of-the-art models on consumer-grade GPUs increasingly impractical. This has motivated compression approaches such as structured pruning combined with knowledge distillation (KD), where a lightweight student is trained to mimic a large teacher within a pruning–KD framework. However, we empirically find that, for diffusion models, conventional KD objectives become unstable as the teacher–student capacity gap widens: under high compression ratios they fail to provide reliable guidance, leading to degraded or even collapsed students. To address this issue, we analyze the distillation error in diffusion models and observe that it naturally decomposes into simple low-order statistical discrepancies and complex fine residuals. Building on this, we propose a "Coarse-to-Fine" distillation framework with LInear FiTting-based distillation (LIFT) and Piecewise Local Adaptive Coefficient Estimation (PLACE). LIFT parameterizes the KD objective with a global linear regression module, explicitly separating a coarse alignment of low-order moments (Coarse-Easy errors) from a residual refinement term that focuses on the remaining Fine-Hard structure, and employs an adaptive schedule that gradually shifts emphasis from coarse to fine components during training. PLACE extends LIFT to spatially non-uniform errors by ranking residual magnitudes, partitioning outputs into difficulty-based groups, and applying LIFT independently within each group, yielding locally adaptive guidance without introducing additional parameters or inference-time overhead. Across pixel- and latent-space diffusion models, and for both U-Net and DiT backbones, our framework consistently improves over existing KD-based compression baselines under their original pruning–KD configurations. Notably, it achieves stable convergence and strong image quality even under aggressive pruning (e.g., 90% channel reduction), where conventional KD objectives fail, thereby enabling practical lightweight diffusion models on memory-limited hardware.
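The coarse/fine decomposition described above can be sketched in a few lines of NumPy. This is an illustrative reconstruction, not the thesis implementation: the function names `lift_loss` and `place_loss`, the use of mean/standard deviation as the low-order moments, and the equal-size difficulty split are all assumptions made here for clarity.

```python
import numpy as np

def lift_loss(student, teacher, alpha):
    """Hypothetical sketch of the LIFT objective.

    Coarse-Easy term: discrepancy of low-order moments (mean, std).
    Fine-Hard term: residual left after a global linear fit a*s + b ~ t.
    alpha schedules emphasis from coarse (alpha near 1, early training)
    to fine (alpha near 0, late training).
    """
    s, t = np.ravel(student), np.ravel(teacher)
    # Coarse alignment of first- and second-order moments.
    coarse = (s.mean() - t.mean()) ** 2 + (s.std() - t.std()) ** 2
    # Global linear regression of teacher on student (least squares).
    a, b = np.polyfit(s, t, deg=1)
    # Mean squared residual remaining after the linear fit.
    fine = np.mean((a * s + b - t) ** 2)
    return float(alpha * coarse + (1.0 - alpha) * fine)

def place_loss(student, teacher, alpha, num_groups=4):
    """Hypothetical PLACE sketch: rank per-element residual magnitudes,
    partition elements into difficulty-based groups, and apply LIFT
    independently within each group (no extra learned parameters)."""
    s, t = np.ravel(student), np.ravel(teacher)
    order = np.argsort(np.abs(s - t))            # easy -> hard
    groups = np.array_split(order, num_groups)
    return float(np.mean([lift_loss(s[g], t[g], alpha) for g in groups]))
```

Under this sketch, an adaptive schedule would simply decay `alpha` from 1 toward 0 over training steps, so early updates correct gross statistical mismatch while later updates refine the remaining structure.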
Publisher
Ulsan National Institute of Science and Technology
Degree
Master
Major
Artificial Intelligence, Graduate School of Artificial Intelligence