| dc.description.abstract |
Interest has grown in discovering causality in the marketing and medical domains. For instance, online retail companies aim to expose the ad whose exposure is most effective, while physicians practice precision medicine by applying a treatment only to patients whose genetic sequences predict the largest improvement in health. Both effectiveness and improvement imply causality, which can be measured by comparing outcomes under intervention versus baseline. In this work, we propose a bandit algorithm that achieves these goals in an online manner from repeated rounds of choosing an arm (an ad or a genetic sequence) and observing the outcome under either intervention or baseline. The reward, i.e., the causal effect, is never fully observed, which presents a new challenge compared to conventional bandit settings. We aim to perform best arm identification and regret minimization simultaneously. We consider two types of regret and propose two algorithms, each using a distinct intervention/baseline allocation policy to minimize one of the two regrets. We show that, with high probability, each algorithm identifies the best arm after a number of iterations that closely matches a known lower bound, and that the high-probability upper bound on the regret also closely matches a known lower bound. |
- |
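The setting described in the abstract can be illustrated with a minimal sketch. This is not the paper's algorithm; it is a hypothetical simulation in which each arm's reward is a causal effect (mean outcome under intervention minus mean outcome under baseline) that is never observed directly. The sketch estimates each effect by alternating an arm's pulls between intervention and baseline, and selects arms with a generic UCB-style rule; the arm parameters, the alternation policy, and the confidence widths are all illustrative assumptions.

```python
import math
import random

def run(num_rounds=4000, seed=0):
    rng = random.Random(seed)
    # Hypothetical arms: (mean outcome under intervention, under baseline).
    # The unobserved rewards (causal effects) are 0.2, 0.6, and 0.05.
    arms = [(0.7, 0.5), (0.9, 0.3), (0.6, 0.55)]
    k = len(arms)
    n_int, n_base = [0] * k, [0] * k
    sum_int, sum_base = [0.0] * k, [0.0] * k

    for t in range(1, num_rounds + 1):
        unexplored = [a for a in range(k) if n_int[a] == 0 or n_base[a] == 0]
        if unexplored:
            # Pull every arm once under each condition before using UCB.
            a = unexplored[0]
        else:
            def ucb(a):
                effect = sum_int[a] / n_int[a] - sum_base[a] / n_base[a]
                # Illustrative confidence width: one term per condition,
                # since the effect estimate combines two sample means.
                width = (math.sqrt(2 * math.log(t) / n_int[a])
                         + math.sqrt(2 * math.log(t) / n_base[a]))
                return effect + width
            a = max(range(k), key=ucb)

        # Simple per-arm alternation between intervention and baseline;
        # the paper's allocation policies are more refined.
        do_intervention = n_int[a] <= n_base[a]
        mu = arms[a][0] if do_intervention else arms[a][1]
        outcome = 1.0 if rng.random() < mu else 0.0  # Bernoulli outcome
        if do_intervention:
            n_int[a] += 1
            sum_int[a] += outcome
        else:
            n_base[a] += 1
            sum_base[a] += outcome

    effects = [sum_int[a] / n_int[a] - sum_base[a] / n_base[a] for a in range(k)]
    return max(range(k), key=lambda a: effects[a])

print(run())  # -> 1, the arm with the largest simulated causal effect
```

The key point the sketch conveys is that each pull yields only one of the two outcomes (intervention or baseline), so the reward itself must be reconstructed from two separate running means, which is what distinguishes this setting from a conventional bandit.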