Related Researcher

Kim, Gi-Soo (김지수)
Statistical Decision Making


Heavy-tailed Linear Bandit with Huber Regression

Author(s)
Kang, Minhyun; Kim, Gi-Soo
Issued Date
2023-07-31
URI
https://scholarworks.unist.ac.kr/handle/201301/67923
Fulltext
https://proceedings.mlr.press/v216/kang23a.html
Citation
Conference on Uncertainty in Artificial Intelligence, pp.1027 - 1036
Abstract
Linear bandit algorithms have been extensively studied and have proven successful in sequential decision tasks despite their simplicity. Many algorithms, however, work under the assumption that the reward is the sum of a linear function of the observed contexts and a sub-Gaussian error. In practical applications, errors can be heavy-tailed, especially in financial data. In such reward environments, algorithms designed for sub-Gaussian errors may under-explore, resulting in suboptimal regret. In this paper, we relax the reward assumption and propose a novel linear bandit algorithm that works well under heavy-tailed errors as well. The proposed algorithm utilizes Huber regression. When the contexts are stochastic with a positive definite covariance matrix and the (1 + δ)-th moment of the error is bounded by a constant, we show that the high-probability upper bound of the regret is O(√d · T^{1/(1+δ)} · (log dT)^{δ/(1+δ)}), where d is the dimension of the context variables, T is the time horizon, and δ ∈ (0, 1]. This bound improves on the state-of-the-art regret bound of the Median of Means and Truncation algorithms by factors of √(log T) and √d for the case where the time horizon T is unknown. We also remark that when δ = 1, the order is the same as the regret bound of linear bandit algorithms designed for sub-Gaussian errors. We support our theoretical findings with synthetic experiments.
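The abstract's central idea — replacing least-squares estimation with Huber regression so that heavy-tailed reward noise cannot dominate the parameter estimate — can be illustrated with a minimal sketch. This is not the paper's bandit algorithm: the gradient-descent solver, the threshold value, and the Student-t noise (which has a bounded (1 + δ)-th moment for δ < 1 but infinite variance) are illustrative assumptions for the regression step only.

```python
import numpy as np

def huber_grad(r, tau):
    # Derivative of the Huber loss in the residual: identity for small
    # residuals, clipped to +/- tau for large (outlying) residuals.
    return np.clip(r, -tau, tau)

def huber_regression(X, y, tau=1.0, lr=0.1, n_iter=2000):
    # Minimize the (convex) Huber loss by plain gradient descent.
    n, d = X.shape
    theta = np.zeros(d)
    for _ in range(n_iter):
        residual = y - X @ theta
        grad = -X.T @ huber_grad(residual, tau) / n
        theta -= lr * grad
    return theta

rng = np.random.default_rng(0)
d, n = 5, 2000
theta_star = rng.normal(size=d)          # unknown linear parameter
X = rng.normal(size=(n, d))              # stochastic contexts
eps = rng.standard_t(df=2, size=n)       # heavy-tailed error (infinite variance)
y = X @ theta_star + eps

theta_hat = huber_regression(X, y)
print(np.linalg.norm(theta_hat - theta_star))  # small estimation error
```

Because the gradient contribution of each sample is bounded by tau, a few extreme noise draws cannot pull the estimate far from the truth, which is the robustness property the regret analysis builds on.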
Publisher
ML Research Press
ISSN
2640-3498

Items in Repository are protected by copyright, with all rights reserved, unless otherwise indicated.