Mathematics Faculty Research Publications

On the Optimal Reward Function of the Continuous Time Multiarmed Bandit Problem

José Luis Menaldi, Wayne State University
Maurice Robin, Institut National de Recherche en Informatique et en Automatique, Rocquencourt

Document Type

Article

Abstract

The optimal reward function associated with the so-called "multiarmed bandit problem" for general Markov-Feller processes is considered. It is shown that this optimal reward function has a simple expression (product form) in terms of individual stopping problems, without assuming any smoothness of the optimal reward function, either for the global problem or for the individual stopping problems. Some results on a related problem with switching cost are also obtained.

Disciplines

Numerical Analysis and Computation | Probability

Recommended Citation

J.-L. Menaldi and M. Robin, On the optimal reward function of the continuous time multiarmed bandit problem, SIAM J. Control Optim., 28 (1990), pp. 97-112. doi: 10.1137/0328005

Included in

Numerical Analysis and Computation Commons, Probability Commons
