Full metadata record
| DC Field | Value | Language |
|---|---|---|
| dc.contributor.advisor | Yoo, Jaejun | - |
| dc.contributor.author | Kim, Pum Jun | - |
| dc.date.accessioned | 2026-03-26T22:15:06Z | - |
| dc.date.available | 2026-03-26T22:15:06Z | - |
| dc.date.issued | 2026-02 | - |
| dc.description.abstract | Recent advances in deep generative models in computer vision have extended their capabilities from image generation to diverse domains such as video and 3D object generation. At their core, these advances have been driven by the development of reliable and accurate evaluation metrics. These metrics assess generative models from a human perceptual perspective, measuring how closely the generated data resemble real-world data and effectively highlighting their differences. This thesis investigates recent advances in evaluation metrics by examining the key contributions of Article 1, Article 2, and Article 3. In addition, it identifies open challenges in evaluation that remain critical for the development of more powerful and reliable deep generative models. Article 1 introduces a novel evaluation metric for image generative models that measures realism along two key aspects: fidelity and diversity. Existing metrics typically estimate the distributions of real and generated data in model embedding spaces that reflect human perception and compute scores by comparing these distributions. However, generative models that are not properly trained often produce noisy data, and in the presence of such noise, existing metrics fail to provide reliable and accurate evaluations. To address this issue, this work proposes a robust evaluation approach that estimates statistically and topologically significant supports for both the real and generated data. This support estimation method is sensitive to subtle variations in the data distribution and yields more accurate and reliable evaluation results, even in the presence of noise. Article 2 introduces a novel evaluation metric for video generative models that measures realism along three aspects: fidelity, diversity, and temporal naturalness. Existing video metrics have largely relied on techniques developed for image generative models, which often fail to capture the temporal characteristics inherent in video data, resulting in incomplete or unreliable evaluations. To address this limitation, this work leverages the observation that frame-wise changes in typical videos exhibit amplitude distributions following a power law in the Fourier domain. By estimating this power-law distribution, the proposed metric quantitatively measures the deviation of generated videos from the natural distribution, providing the first principled evaluation of temporal consistency in video generation. Article 3 proposes a benchmark that enables comparison between object recognition models and humans and allows model analysis from a human visual perspective. The existing benchmark, which uses stylized images that blend shape and texture within a single image, suggests that humans primarily rely on shape, whereas models focus on texture. However, this prior work suffers from several limitations: (1) it does not use data representing pure shape and pure texture, (2) it does not consider images in which shape and texture are present in equal proportion (50:50), and (3) it employs evaluation measures that are not well suited to model analysis and comparison. To address these limitations, Article 3 generates disentangled datasets that contain pure shape and texture cues and proposes a new metric that enables reliable and precise evaluation of models. This benchmark provides a clear and unbiased assessment of current object recognition models, enabling accurate measurement of how closely their reliance on shape and texture aligns with human perception. | - |
| dc.description.degree | Doctor | - |
| dc.description | Graduate School of Artificial Intelligence (Artificial Intelligence) | - |
| dc.identifier.uri | https://scholarworks.unist.ac.kr/handle/201301/91046 | - |
| dc.identifier.uri | http://unist.dcollection.net/common/orgView/200000966325 | - |
| dc.language | ENG | - |
| dc.publisher | Ulsan National Institute of Science and Technology | - |
| dc.rights.embargoReleaseDate | 9999-12-31 | - |
| dc.rights.embargoReleaseTerms | 9999-12-31 | - |
| dc.subject | polyimide-glycol hybrid gel; semi-interpenetrating network; photo-thermal imidization; shape-deformable substrate; thermal-mechanical modulation | - |
| dc.title | Reliable and Interpretable Evaluation in Deep Representational Models | - |
| dc.type | Thesis | - |
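The temporal-naturalness idea summarized in the abstract (Article 2) rests on the observation that the amplitude spectrum of frame-wise changes in natural video follows a power law in the Fourier domain. The minimal sketch below illustrates that idea only: it estimates a power-law exponent by a log-log linear fit on the spectrum of mean frame differences. The function name, the reduction to a per-frame mean, and the fitting procedure are assumptions for illustration, not the metric defined in the thesis.

```python
import numpy as np

def temporal_power_law_exponent(frames):
    """Illustrative sketch: estimate the power-law exponent of the
    amplitude spectrum of frame-wise changes in a grayscale video.

    frames: ndarray of shape (T, H, W).
    Returns the slope alpha of log(amplitude) vs. log(frequency);
    a power law |A(f)| ~ f^alpha is linear in log-log coordinates.
    """
    diffs = np.diff(frames, axis=0)                           # frame-wise changes, (T-1, H, W)
    signal = diffs.reshape(diffs.shape[0], -1).mean(axis=1)   # mean change per time step
    amp = np.abs(np.fft.rfft(signal))[1:]                     # amplitude spectrum, DC bin dropped
    freqs = np.fft.rfftfreq(len(signal))[1:]
    # Least-squares line in log-log space; epsilon guards against log(0).
    alpha, _ = np.polyfit(np.log(freqs), np.log(amp + 1e-12), 1)
    return float(alpha)
```

A metric in this spirit would compare the exponent (or the full fitted spectrum) of generated videos against the value estimated from natural video, scoring larger deviations as less temporally natural.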