
Full metadata record

DC Field Value Language
dc.citation.title IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS -
dc.contributor.author Lee, Kyungjae -
dc.contributor.author Lim, Sungbin -
dc.date.accessioned 2023-12-21T13:40:55Z -
dc.date.available 2023-12-21T13:40:55Z -
dc.date.created 2022-10-04 -
dc.date.issued 2022-09 -
dc.description.abstract Stochastic multiarmed bandits (stochastic MABs) are a problem of sequential decision-making with noisy rewards, where an agent sequentially chooses actions under unknown reward distributions to minimize cumulative regret. The majority of prior works on stochastic MABs assume that the reward distribution of each action has bounded support or follows a light-tailed, i.e., sub-Gaussian, distribution. However, in a variety of decision-making problems, the reward distributions are heavy-tailed. In this regard, we consider stochastic MABs with heavy-tailed rewards whose pth moment is bounded by a constant v(p) for 1 < p <= 2. First, we provide a theoretical analysis of the suboptimality of existing exploration methods for heavy-tailed rewards, proving that they do not guarantee a minimax optimal regret bound. Second, to achieve minimax optimality under heavy-tailed rewards, we propose a minimax optimal robust upper confidence bound (MR-UCB) by providing a tight confidence bound for a p-robust estimator. Furthermore, we also propose a minimax optimal robust adaptively perturbed exploration (MR-APE), a randomized version of MR-UCB. In particular, unlike existing robust exploration methods, both proposed methods have no dependence on v(p). Third, we provide gap-dependent and gap-independent regret bounds for the proposed methods and prove that both guarantee the minimax optimal regret bound for the heavy-tailed stochastic MAB problem. To the best of our knowledge, the proposed methods are the first algorithms that theoretically guarantee minimax optimality under heavy-tailed reward settings. Finally, we demonstrate the superiority of the proposed methods in simulations with Pareto and Fréchet noise with respect to regret. -
dc.identifier.bibliographicCitation IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS -
dc.identifier.doi 10.1109/TNNLS.2022.3203035 -
dc.identifier.issn 2162-237X -
dc.identifier.uri https://scholarworks.unist.ac.kr/handle/201301/59584 -
dc.identifier.wosid 000854544700001 -
dc.language English -
dc.publisher IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC -
dc.title Minimax Optimal Bandits for Heavy Tail Rewards -
dc.type Article -
dc.description.isOpenAccess TRUE -
dc.relation.journalWebOfScienceCategory Computer Science, Artificial Intelligence; Computer Science, Hardware & Architecture; Computer Science, Theory & Methods; Engineering, Electrical & Electronic -
dc.relation.journalResearchArea Computer Science; Engineering -
dc.type.docType Article; Early Access -
dc.description.journalRegisteredClass scie -
dc.description.journalRegisteredClass scopus -
dc.subject.keywordAuthor Heavy-tailed noise -
dc.subject.keywordAuthor minimax optimality -
dc.subject.keywordAuthor multi-armed bandits (MABs) -
dc.subject.keywordAuthor regret analysis -
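
The abstract above centers on robust upper-confidence-bound exploration for bandits whose rewards have only a bounded pth moment. As a rough, hypothetical illustration of that setting (not the paper's MR-UCB or MR-APE, whose confidence bounds avoid any dependence on v(p)), the Python sketch below simulates a heavy-tailed bandit with Pareto noise and runs a conventional truncated-mean robust-UCB baseline of the kind the paper analyzes; the constants, the noise model, and the helper names truncated_mean and robust_ucb_run are illustrative assumptions, not taken from the article.

```python
import numpy as np

# Illustrative sketch only: a v(p)-dependent robust-UCB-style baseline with a
# truncated-mean estimator, run on heavy-tailed (Pareto) rewards. All constants
# and names here are hypothetical choices for the demo.

def truncated_mean(samples, vp, p, delta):
    """Truncated empirical mean: samples above a growing threshold are zeroed
    out, keeping the estimate stable when only the pth raw moment (<= vp) is
    assumed finite, 1 < p <= 2."""
    x = np.asarray(samples, dtype=float)
    idx = np.arange(1, len(x) + 1)
    # threshold for the i-th sample grows like (vp * i / log(1/delta))^(1/p)
    thresholds = (vp * idx / np.log(1.0 / delta)) ** (1.0 / p)
    return np.mean(np.where(np.abs(x) <= thresholds, x, 0.0))

def robust_ucb_run(means, tail_alpha=1.8, p=1.5, vp=10.0, horizon=5000, seed=0):
    """K-armed bandit with Pareto-type heavy-tailed noise; arms are chosen by
    truncated-mean estimate plus a confidence width of order
    vp^(1/p) * (log t / n)^((p-1)/p)."""
    rng = np.random.default_rng(seed)
    K = len(means)
    pulls = [[] for _ in range(K)]
    regret = 0.0
    for t in range(1, horizon + 1):
        if t <= K:                      # pull each arm once to initialize
            arm = t - 1
        else:
            delta = 1.0 / t ** 2
            ucb = []
            for a in range(K):
                n = len(pulls[a])
                width = 4.0 * vp ** (1.0 / p) * (np.log(1.0 / delta) / n) ** ((p - 1.0) / p)
                ucb.append(truncated_mean(pulls[a], vp, p, delta) + width)
            arm = int(np.argmax(ucb))
        # heavy-tailed reward: centered Pareto (Lomax) noise, finite pth moment
        # because p < tail_alpha
        reward = means[arm] + (rng.pareto(tail_alpha) - 1.0 / (tail_alpha - 1.0))
        pulls[arm].append(reward)
        regret += max(means) - means[arm]  # cumulative pseudo-regret
    return regret

if __name__ == "__main__":
    print("cumulative pseudo-regret:", robust_ucb_run([0.0, 0.5, 1.0]))
```

Per the abstract, replacing the v(p)-dependent confidence width used above with the paper's tight confidence bound for a p-robust estimator is what yields the minimax optimal regret guarantee of MR-UCB (and its randomized variant MR-APE).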


Items in Repository are protected by copyright, with all rights reserved, unless otherwise indicated.