Document Type

Article

Abstract

The optimal reward function associated with the so-called "multiarmed bandit problem" for general Markov-Feller processes is considered. It is shown that this optimal reward function has a simple expression (product form) in terms of individual stopping problems, without requiring any smoothness properties of the optimal reward function, either for the global problem or for the individual stopping problems. Some results for a related problem with switching cost are also obtained.

Disciplines

Numerical Analysis and Computation | Probability

Comments

Copyright © 1990 Society for Industrial and Applied Mathematics.
