Document Type
Article
Abstract
The optimal reward function associated with the so-called "multiarmed bandit problem" for general Markov-Feller processes is considered. It is shown that this optimal reward function has a simple expression (product form) in terms of individual stopping problems, without requiring any smoothness of the optimal reward function, either for the global problem or for the individual stopping problems. Some results on a related problem with switching costs are also obtained.
Disciplines
Numerical Analysis and Computation | Probability
Recommended Citation
J.-L. Menaldi and M. Robin, On the optimal reward function of the continuous time multiarmed bandit problem, SIAM J. Control Optim., 28 (1990), pp. 97–112. doi: 10.1137/0328005
Comments
Copyright © 1990 Society for Industrial and Applied Mathematics.