Full metadata record
DC Field | Value | Language |
---|---|---|
dc.contributor.advisor | Lim, Chiehyeon | - |
dc.contributor.author | Yoon, Kihyuk | - |
dc.date.accessioned | 2024-10-14T13:50:13Z | - |
dc.date.available | 2024-10-14T13:50:13Z | - |
dc.date.issued | 2024-08 | - |
dc.description.abstract | In this work, I present theoretical analyses of both a limitation and a benefit of Layer Normalization (LN). Previous studies have observed the homogeneity limitation of LN on image datasets, noting that LN produces similar outputs across samples in CNNs or tokens in Transformers. However, there has been no theoretical analysis of this limitation, nor empirical analysis on Natural Language Processing (NLP) datasets, where LN has been the dominant normalization method. Furthermore, there is a lack of investigation into the benefits of LN. To fill this gap between theoretical analysis and practical usage, I theoretically demonstrate that LN produces homogeneous output statistics in both CNNs and Transformers, regardless of network architecture. Additionally, I show that LN offers a benefit for activation robustness that can address the limitations of existing activation functions. Based on these theoretical analyses, I introduce Flexible Layer Normalization (FLN) to address the limitation and Layer-level Activation (LayerAct) to extend the benefit of LN. FLN uses a non-linear, flexible mean and variance for normalization, producing outputs with more diverse statistics than those of LN. Experiments comparing networks with LN and FLN demonstrate that networks with FLN converge faster and perform better than those with LN. Although LN has the potential to enhance activation robustness, it shows inferior performance in CNN-based networks compared to Batch Normalization (BN). To extend LN's benefit to CNNs, I introduce LayerAct, which utilizes layer-direction normalization in its activation mechanism without suffering from the homogeneity limitation. LayerAct provides additional benefits as an activation function, such as addressing the trade-off between two important properties of activation and smoothing the loss and gradient landscapes. Experimental analysis shows that networks with LayerAct functions not only perform better than or comparably to networks with other activation functions but are also more robust against out-of-distribution (OOD) corruptions. I applied FLN and LayerAct to two real-world applications: the sugar manufacturing process and medical image segmentation. Although deep learning methods have achieved outstanding performance on various tasks, they suffer when the distributions of datasets differ between training and inference. This is common in real-world applications, where many factors produce noise and differing distributions between samples. The manufacturing process collects time-series datasets using sensors, which are highly susceptible to corruption from factors such as machine or feed replacement, sensor issues, or human error. Considering these challenges and the widespread use of Transformer-based architectures with LN for time-series data, replacing LN and the activation function with FLN and a LayerAct function, respectively, can enhance Transformer performance. Experiments on a real-world manufacturing process show that FLN and LayerAct improve network performance, with networks using both FLN and LayerAct achieving the best performance in 4 out of 6 cases. Similarly, networks for medical image segmentation suffer from various types of noise due to camera or lighting conditions. On such datasets with diverse distributions, LayerAct, which considers layer-level statistics during activation, has the potential to outperform other activation functions. Experimental results indicate that networks with LayerAct outperform those with other activation functions. | - |
dc.description.degree | Doctor | - |
dc.description | Department of Industrial Engineering | - |
dc.identifier.uri | https://scholarworks.unist.ac.kr/handle/201301/84113 | - |
dc.identifier.uri | http://unist.dcollection.net/common/orgView/200000814063 | - |
dc.language | ENG | - |
dc.publisher | Ulsan National Institute of Science and Technology | - |
dc.title | On the Output Homogeneity Limitation and Activation Robustness Benefit of Layer-Direction Normalization (LN) in Neural Networks: Theoretical Analyses and Methods Development to Expand LN | - |
dc.type | Thesis | - |
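For context on the homogeneity limitation discussed in the abstract: layer-direction normalization standardizes each sample (or token) using the mean and variance computed over its own feature axis, so every normalized output has approximately zero mean and unit variance regardless of the input's scale. Below is a minimal sketch assuming a PyTorch-style tensor layout; the `layer_norm` helper and toy tensors are illustrative assumptions, not code from the thesis, and the thesis's FLN and LayerAct methods are not reproduced here.

```python
import torch

def layer_norm(x: torch.Tensor, eps: float = 1e-5) -> torch.Tensor:
    # Standard layer-direction normalization: statistics are computed
    # per sample (or per token) over the feature axis.
    mean = x.mean(dim=-1, keepdim=True)
    var = x.var(dim=-1, unbiased=False, keepdim=True)
    return (x - mean) / torch.sqrt(var + eps)

# Two inputs with very different scales...
a = torch.randn(1, 8) * 10.0   # large-scale sample
b = torch.randn(1, 8) * 0.1    # small-scale sample

# ...yield outputs with identical per-sample statistics
# (zero mean, unit variance), i.e. homogeneous output statistics.
for y in (layer_norm(a), layer_norm(b)):
    print(y.mean(dim=-1).item(), y.var(dim=-1, unbiased=False).item())
```

A learnable affine transform (gain and bias) is usually applied after this step, but it is shared across samples and tokens, so it does not by itself remove the homogeneity of the normalized statistics.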