Semi-parametric contextual bandits with graph-Laplacian regularization

ScholarWorks@UNIST

Author(s): Choi, Young-Geun; Kim, Gi-Soo; Paik, Seunghoon; Paik, Myunghee Cho

Abstract: Non-stationarity is ubiquitous in human behavior, and addressing it in contextual bandits is challenging. Several works have addressed the problem by studying semi-parametric contextual bandits and have warned that ignoring non-stationarity can degrade performance. Another prevalent human behavior is social interaction, which has become available in the form of a social network or graph structure. As a result, graph-based contextual bandits have received much attention. In this paper, we propose SemiGraphTS, a novel contextual Thompson-sampling algorithm for a graph-based semi-parametric reward model. Our algorithm is the first to be proposed in this setting. We derive an upper bound on the cumulative regret that can be expressed as the regret order for the semi-parametric model without a graph, multiplied by a factor that depends on the graph structure. We evaluate the proposed and existing algorithms via simulations and a real-data example.
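
The abstract names two ingredients without spelling them out: a graph-Laplacian penalty that ties together the parameters of users who are connected in a social graph, and a semi-parametric reward model in which an unknown, time-varying baseline is added to a linear term. The pure-Python sketch below illustrates each under stated assumptions. It builds the combinatorial Laplacian L = D - A and checks the identity trace(Θ^T L Θ) = Σ_(i,j) w_ij ||θ_i - θ_j||^2, and it centers arm contexts by the policy's arm-selection probabilities, a device used in the semi-parametric bandit literature so that the baseline cancels in expectation. The function names are hypothetical, and how SemiGraphTS actually combines these pieces (including its posterior and sampling steps) is not described in this record.

```python
# Illustrative sketch only: the helper names below are hypothetical and this is
# not the SemiGraphTS implementation; it shows two generic building blocks.
from typing import List, Sequence, Tuple

Edge = Tuple[int, int, float]  # (user i, user j, nonnegative weight w_ij), each edge listed once
Matrix = List[List[float]]


def graph_laplacian(n: int, edges: Sequence[Edge]) -> Matrix:
    """Combinatorial Laplacian L = D - A of an undirected weighted graph on n users."""
    lap = [[0.0] * n for _ in range(n)]
    for i, j, w in edges:
        if i == j or w < 0:
            raise ValueError("expected i != j and a nonnegative weight")
        lap[i][j] -= w  # off-diagonal: minus the adjacency weight
        lap[j][i] -= w
        lap[i][i] += w  # diagonal: weighted degree
        lap[j][j] += w
    return lap


def laplacian_penalty(lap: Matrix, theta: Matrix) -> float:
    """trace(Theta^T L Theta), where row i of theta is user i's d-dimensional parameter."""
    n, d = len(theta), len(theta[0])
    total = 0.0
    for k in range(d):  # one quadratic form x^T L x per coordinate k
        x = [theta[i][k] for i in range(n)]
        total += sum(x[i] * lap[i][j] * x[j] for i in range(n) for j in range(n))
    return total


def edge_penalty(edges: Sequence[Edge], theta: Matrix) -> float:
    """The same quantity written edge by edge: sum of w_ij * ||theta_i - theta_j||^2."""
    return sum(
        w * sum((a - b) ** 2 for a, b in zip(theta[i], theta[j]))
        for i, j, w in edges
    )


def centered_contexts(contexts: Matrix, probs: Sequence[float]) -> Matrix:
    """Subtract the policy-weighted mean context: x_a - sum_b pi(b) x_b.

    probs[a] is the (assumed known) probability that the policy picks arm a.
    Under these weights the centered vectors average to zero, so an additive
    baseline nu(t) that does not depend on the arm drops out in expectation.
    """
    d = len(contexts[0])
    mean = [sum(p * x[k] for p, x in zip(probs, contexts)) for k in range(d)]
    return [[x[k] - mean[k] for k in range(d)] for x in contexts]


if __name__ == "__main__":
    edges = [(0, 1, 1.0), (1, 2, 2.0)]
    theta = [[1.0, 0.0], [0.0, 0.0], [0.0, 1.0]]
    lap = graph_laplacian(3, edges)
    print(laplacian_penalty(lap, theta), edge_penalty(edges, theta))  # 3.0 3.0
    print(centered_contexts([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]], [0.5, 0.25, 0.25]))
    # [[0.25, -0.5], [-0.75, 0.5], [0.25, 0.5]]
```

Adding λ times `laplacian_penalty` to a least-squares loss shrinks the estimates of connected users toward each other; because every row of L sums to zero, the penalty vanishes when all users share the same parameter vector.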

Keyword (Author): Contextual multi-armed bandit, Graph Laplacian, Semi-parametric reward model, Thompson sampling
