The optimal reward function associated with the so-called "multiarmed bandit problem" for general Markov-Feller processes is considered. It is shown that this optimal reward function has a simple expression (product form) in terms of individual stopping problems, without requiring any smoothness properties of the optimal reward function, either for the global problem or for the individual stopping problems. Some results for a related problem with switching cost are also obtained.
Numerical Analysis and Computation | Probability
J.-L. Menaldi and M. Robin, On the optimal reward function of the continuous time multiarmed bandit problem, SIAM J. Control Optim., 28 (1990), pp. 97-112. doi: 10.1137/0328005