File Download

There are no files associated with this item.

  • Find it @ UNIST can give you direct access to the published full text of this article. (UNISTARs only)
Related Researcher

김지수

Kim, Gi-Soo
Statistical Decision Making

Detailed Information


Full metadata record

DC Field Value Language
dc.citation.conferencePlace US -
dc.citation.conferencePlace Pittsburgh -
dc.citation.endPage 1036 -
dc.citation.startPage 1027 -
dc.citation.title Conference on Uncertainty in Artificial Intelligence -
dc.contributor.author Kang, Minhyun -
dc.contributor.author Kim, Gi-Soo -
dc.date.accessioned 2024-01-09T16:05:10Z -
dc.date.available 2024-01-09T16:05:10Z -
dc.date.created 2023-10-31 -
dc.date.issued 2023-07-31 -
dc.description.abstract Linear bandit algorithms have been extensively studied and have proven successful in sequential decision tasks despite their simplicity. Many algorithms, however, work under the assumption that the reward is the sum of a linear function of observed contexts and a sub-Gaussian error. In practical applications, errors can be heavy-tailed, especially in financial data. In such reward environments, algorithms designed for sub-Gaussian errors may underexplore, resulting in suboptimal regret. In this paper, we relax the reward assumption and propose a novel linear bandit algorithm which works well under heavy-tailed errors as well. The proposed algorithm utilizes Huber regression. When contexts are stochastic with a positive definite covariance matrix and the (1+δ)-th moment of the error is bounded by a constant, we show that the high-probability upper bound of the regret is O(√d T^{1/(1+δ)} (log dT)^{δ/(1+δ)}), where d is the dimension of the context variables, T is the time horizon, and δ ∈ (0, 1]. This bound improves on the state-of-the-art regret bound of the Median of Means and Truncation algorithms by a factor of √(log T) and √d for the case where the time horizon T is unknown. We also remark that when δ = 1, the order is the same as the regret bound of linear bandit algorithms designed for sub-Gaussian errors. We support our theoretical findings with synthetic experiments. -
dc.identifier.bibliographicCitation Conference on Uncertainty in Artificial Intelligence, pp.1027 - 1036 -
dc.identifier.issn 2640-3498 -
dc.identifier.scopusid 2-s2.0-85170056939 -
dc.identifier.uri https://scholarworks.unist.ac.kr/handle/201301/67923 -
dc.identifier.url https://proceedings.mlr.press/v216/kang23a.html -
dc.language English -
dc.publisher ML Research Press -
dc.title Heavy-tailed Linear Bandit with Huber Regression -
dc.type Conference Paper -
dc.date.conferenceDate 2023-07-31 -
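The record above describes a linear bandit method built on Huber regression. As a rough illustration only (not the authors' estimator), the Huber loss replaces the squared loss of ordinary least squares with a penalty that is quadratic near zero and linear in the tails, which is what limits the influence of heavy-tailed errors. The threshold `tau`, the gradient-descent fit, and all names below are assumptions for illustration:

```python
import numpy as np

def huber_loss(r, tau=1.0):
    """Huber loss: quadratic for |r| <= tau, linear beyond."""
    r = np.abs(r)
    return np.where(r <= tau, 0.5 * r**2, tau * (r - 0.5 * tau))

def huber_regression(X, y, tau=1.0, n_iter=100, lr=0.1):
    """Fit theta by gradient descent on the Huber objective.

    The gradient uses the clipped residual (the Huber score) in
    place of the raw residual of least squares, so outlying errors
    contribute at most tau to each gradient term.
    """
    theta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        r = X @ theta - y
        psi = np.clip(r, -tau, tau)  # Huber score function
        theta -= lr * X.T @ psi / len(y)
    return theta
```

For example, on noiseless data `y = 2x` the fit recovers a slope near 2, while a single huge outlier shifts the estimate far less than it would shift the least-squares fit.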

Items in Repository are protected by copyright, with all rights reserved, unless otherwise indicated.