| dc.description.abstract |
The bandit algorithm is a reinforcement learning approach where an agent sequentially selects one of several actions in a given environment, observes the reward from that choice, and optimizes its policy to maximize cumulative reward (or, equivalently, minimize regret). When choosing among several actions, contextual information associated with each action can be valuable, particularly when the reward is correlated with the contextual features. However, when certain contextual features are missing, using such data to choose actions and learn the reward model becomes challenging, which limits the algorithm's effectiveness. This study proposes an enhanced bandit algorithm that imputes missing contextual features using statistical estimates under the Missing at Random (MAR) assumption. This approach reduces data loss and enables the use of more complete contextual information. To achieve this, we extend the GLOC (Generalized Linear Online-to-confidence-set Conversion) framework to handle missing covariates effectively. By integrating imputation techniques, the proposed method fills in missing contextual information for each action, improving the model's performance while preserving the strengths of the GLOC framework. The proposed model, termed the Imputation-enhanced Generalized Linear Bandit (IGLB), efficiently uses the available information to address the challenges posed by missing contexts. We evaluate the performance of IGLB through experiments on both synthetic datasets and the real-world Warfarin dataset, comparing its results with those of existing methods. The experimental results show that IGLB achieves improved predictive performance, highlighting the benefits of leveraging imputed context for more comprehensive decision-making. |
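As a rough illustration of the idea described in the abstract, the sketch below imputes missing (NaN) contextual features with column means, a simple estimate consistent with the MAR assumption, and then performs a UCB-style arm selection under a logistic-link generalized linear model. This is a minimal assumed sketch, not the paper's actual IGLB/GLOC implementation: the function names, the mean-imputation choice, and the identity covariance used for the confidence width are all illustrative assumptions.

```python
import numpy as np

def impute_missing(contexts):
    """Fill NaN entries with per-feature column means (simple MAR-style
    imputation; the paper's statistical estimator may differ)."""
    col_means = np.nanmean(contexts, axis=0)
    return np.where(np.isnan(contexts), col_means, contexts)

def select_arm(contexts, theta, alpha=1.0, A_inv=None):
    """Pick the arm maximizing GLM mean reward + confidence width,
    computed on the imputed contexts (a generic GLM-UCB step, used
    here only to illustrate a GLOC-style decision)."""
    X = impute_missing(np.asarray(contexts, dtype=float))
    means = 1.0 / (1.0 + np.exp(-X @ theta))  # logistic-link mean reward
    if A_inv is None:
        A_inv = np.eye(X.shape[1])            # placeholder design matrix inverse
    widths = alpha * np.sqrt(np.einsum('ij,jk,ik->i', X, A_inv, X))
    return int(np.argmax(means + widths))

# Example: arm 0 has a missing second feature, imputed from arm 1.
contexts = np.array([[1.0, np.nan],
                     [0.0, 2.0]])
theta = np.array([1.0, 0.0])  # hypothetical current parameter estimate
chosen = select_arm(contexts, theta)  # -> 0 (higher mean and wider bonus)
```

The key point mirrored from the abstract: instead of discarding rows with missing covariates, the imputed contexts let every arm participate in both selection and model updates.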
- |