Optimality of greedy policy for a class of standard reward function of restless multi-armed bandit problem.

Kehao Wang, Quan Liu, Lin Chen

Published in: IET Signal Process. (2012)

Keyphrases

reward function
inverse reinforcement learning
Markov decision processes
optimal policy
reinforcement learning
average reward
policy search
state space
reinforcement learning algorithms
partially observable
Markov decision process
multiple agents
control policies
total reward
dynamic programming
Markov decision problems
state action
optimal control
transition probabilities
average cost
initially unknown
policy iteration
utility function
preference elicitation
data mining
pairwise
hierarchical reinforcement learning
action selection