Related Researcher

Lim, Dong-Young (임동영)

Full metadata record

DC Field Value Language
dc.citation.startPage drad038 -
dc.citation.title IMA JOURNAL OF NUMERICAL ANALYSIS -
dc.contributor.author Lim, Dong-Young -
dc.contributor.author Neufeld, Ariel -
dc.contributor.author Sabanis, Sotirios -
dc.contributor.author Zhang, Ying -
dc.date.accessioned 2023-12-21T12:37:07Z -
dc.date.available 2023-12-21T12:37:07Z -
dc.date.created 2023-07-05 -
dc.date.issued 2023-06 -
dc.description.abstract We consider nonconvex stochastic optimization problems where the objective functions have super-linearly growing and discontinuous stochastic gradients. In such a setting, we provide a nonasymptotic analysis for the tamed unadjusted stochastic Langevin algorithm (TUSLA) introduced in Lovas et al. (2020). In particular, we establish nonasymptotic error bounds for the TUSLA algorithm in Wasserstein-1 and Wasserstein-2 distances. The latter result enables us to further derive nonasymptotic estimates for the expected excess risk. To illustrate the applicability of the main results, we consider an example from transfer learning with ReLU neural networks, which represents a key paradigm in machine learning. Numerical experiments are presented for the aforementioned example, which support our theoretical findings. Hence, in this setting, we demonstrate both theoretically and numerically that the TUSLA algorithm can solve the optimization problem involving neural networks with ReLU activation function. Besides, we provide simulation results for synthetic examples where popular algorithms, e.g., ADAM, AMSGrad, RMSProp and (vanilla) stochastic gradient descent algorithm, may fail to find the minimizer of the objective functions due to the super-linear growth and the discontinuity of the corresponding stochastic gradient, while the TUSLA algorithm converges rapidly to the optimal solution. Moreover, we provide an empirical comparison of the performance of TUSLA with popular stochastic optimizers on real-world datasets, as well as investigate the effect of the key hyperparameters of TUSLA on its performance. -
dc.identifier.bibliographicCitation IMA JOURNAL OF NUMERICAL ANALYSIS, pp. drad038 -
dc.identifier.doi 10.1093/imanum/drad038 -
dc.identifier.issn 0272-4979 -
dc.identifier.uri https://scholarworks.unist.ac.kr/handle/201301/64793 -
dc.identifier.url https://academic.oup.com/imajna/advance-article/doi/10.1093/imanum/drad038/7192418?login=true -
dc.identifier.wosid 001003995900001 -
dc.language English -
dc.publisher OXFORD UNIV PRESS -
dc.title Non-asymptotic estimates for TUSLA algorithm for non-convex learning with applications to neural networks with ReLU activation function -
dc.type Article -
dc.description.isOpenAccess FALSE -
dc.relation.journalWebOfScienceCategory Mathematics, Applied -
dc.relation.journalResearchArea Mathematics -
dc.type.docType Article; Early Access -
dc.description.journalRegisteredClass scie -
dc.description.journalRegisteredClass scopus -
dc.subject.keywordAuthor ReLU activation function -
dc.subject.keywordAuthor taming technique -
dc.subject.keywordAuthor super-linearly growing coefficients -
dc.subject.keywordAuthor discontinuous stochastic gradient -
dc.subject.keywordAuthor nonconvex optimization -
dc.subject.keywordAuthor nonasymptotic estimates -
dc.subject.keywordAuthor artificial neural networks -
dc.subject.keywordPlus GRADIENT LANGEVIN DYNAMICS -
dc.subject.keywordPlus DEPENDENT DATA STREAMS -
dc.subject.keywordPlus CONVERGENCE -
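
The abstract above centres on the tamed unadjusted stochastic Langevin algorithm (TUSLA) of Lovas et al. (2020). As a quick illustration of the taming idea, the following is a minimal Python sketch of a single TUSLA-style update; the exact taming factor, the regularisation term and the default values of r, eta, lam and beta below are illustrative assumptions and may differ in detail from the scheme analysed in the article.

    import numpy as np

    def tusla_step(theta, stoch_grad, lam=1e-2, beta=1e8, r=1.0, eta=1e-4, rng=None):
        """One TUSLA-style update (illustrative sketch, not the article's exact scheme).

        theta      : current parameter vector (np.ndarray)
        stoch_grad : stochastic gradient of the objective at theta on a fresh mini-batch
        lam        : step size (lambda)
        beta       : inverse temperature of the Langevin noise
        r, eta     : taming exponent and regularisation weight (assumed defaults)
        """
        rng = np.random.default_rng() if rng is None else rng
        norm = np.linalg.norm(theta)
        # Tamed drift: the (regularised) gradient is divided by a factor growing
        # polynomially in |theta|, so the update stays bounded even when the raw
        # stochastic gradient grows super-linearly.
        drift = (stoch_grad + eta * theta * norm ** (2 * r)) / (1.0 + np.sqrt(lam) * norm ** (2 * r))
        # Additive Gaussian Langevin noise with variance 2 * lam / beta per coordinate.
        noise = np.sqrt(2.0 * lam / beta) * rng.standard_normal(theta.shape)
        return theta - lam * drift + noise

The taming factor keeps the effective drift bounded regardless of how fast the stochastic gradient grows in theta, which is precisely the super-linear-growth regime that the article's nonasymptotic Wasserstein bounds address.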
