

Doubly Robust Thompson Sampling with Linear Payoffs

Author(s)
Kim, Wonyoung; Kim, Gi-Soo; Paik, Myunghee Cho
Issued Date
2021-12
URI
https://scholarworks.unist.ac.kr/handle/201301/67924
Citation
Neural Information Processing Systems, pp. 15830-15840
Abstract
A challenging aspect of the bandit problem is that a stochastic reward is observed only for the chosen arm, while the rewards of the other arms remain missing. The dependence of the arm choice on past context and reward pairs compounds the complexity of regret analysis. We propose a novel multi-armed contextual bandit algorithm called Doubly Robust (DR) Thompson Sampling, which applies the doubly robust estimator used in the missing-data literature to Thompson Sampling with contexts (LinTS). Unlike previous works relying on missing-data techniques (Dimakopoulou et al. [2019], Kim and Paik [2019]), the proposed algorithm is designed to allow a novel additive regret decomposition, leading to an improved regret bound of order $O(\phi^{-2}\sqrt{T})$, where $\phi^2$ is the minimum eigenvalue of the covariance matrix of contexts. This is the first regret bound for LinTS that uses $\phi^2$ without the dimension of the context, $d$. Applying the relationship between $\phi^2$ and $d$, the regret bound of the proposed algorithm is $O(d\sqrt{T})$ in many practical scenarios, improving the bound of LinTS by a factor of $\sqrt{d}$. A benefit of the proposed method is that it utilizes all the context data, chosen or not chosen, which makes it possible to circumvent the technical definition of unsaturated arms used in the theoretical analysis of LinTS. Empirical studies show the advantage of the proposed algorithm over LinTS.
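The abstract's central idea, filling in the missing rewards of unchosen arms with a doubly robust estimator, can be illustrated with a small sketch. The code below uses the generic doubly robust form from the missing-data literature (an imputed reward plus an inverse-probability-weighted correction for the chosen arm); it is an assumption that this matches the paper's estimator in detail, and the function names, contexts, and selection probabilities are hypothetical values chosen for illustration. It is not the full DR Thompson Sampling algorithm.

```python
from typing import List, Sequence


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def dr_pseudo_rewards(
    contexts: Sequence[Sequence[float]],
    chosen: int,
    reward: float,
    probs: Sequence[float],
    beta_hat: Sequence[float],
) -> List[float]:
    """Generic doubly robust pseudo-reward for every arm (illustrative form).

    Each arm gets the imputed reward x_i . beta_hat; the chosen arm also
    gets the residual (reward - imputed) scaled by 1 / probs[chosen].
    """
    pseudo = []
    for i, x in enumerate(contexts):
        imputed = dot(x, beta_hat)
        if i == chosen:
            pseudo.append(imputed + (reward - imputed) / probs[i])
        else:
            pseudo.append(imputed)
    return pseudo


def expected_pseudo_rewards(
    contexts: Sequence[Sequence[float]],
    probs: Sequence[float],
    beta_true: Sequence[float],
    beta_hat: Sequence[float],
) -> List[float]:
    """Average the pseudo-rewards over the arm choice (noise-free rewards)."""
    n = len(contexts)
    totals = [0.0] * n
    for a in range(n):
        reward = dot(contexts[a], beta_true)
        pseudo = dr_pseudo_rewards(contexts, a, reward, probs, beta_hat)
        for i in range(n):
            totals[i] += probs[a] * pseudo[i]
    return totals


# Hypothetical example: three arms, two-dimensional contexts.
contexts = [[1.0, 0.0], [0.5, 0.5], [0.0, 2.0]]
probs = [0.2, 0.5, 0.3]      # assumed known arm-selection probabilities
beta_true = [1.0, -0.5]      # true parameter of the linear payoff
beta_hat = [0.3, 0.3]        # deliberately wrong imputation estimate

true_means = [dot(x, beta_true) for x in contexts]
expected = expected_pseudo_rewards(contexts, probs, beta_true, beta_hat)
```

In this sketch, `expected` agrees with `true_means` for every arm even though `beta_hat` is wrong, because the selection probabilities are correct. This is the sense in which the estimator lets contexts of all arms, chosen or not, contribute to estimation.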
Publisher
Neural Information Processing Systems Foundation
ISSN
1049-5258

