Optimality of greedy policy for a class of standard reward function of restless multi-armed bandit problem.
Kehao Wang, Quan Liu, Lin Chen
Published in: IET Signal Process. (2012)
Keyphrases
- reward function
- inverse reinforcement learning
- Markov decision processes
- optimal policy
- reinforcement learning
- average reward
- policy search
- state space
- reinforcement learning algorithms
- partially observable
- Markov decision process
- multiple agents
- control policies
- total reward
- dynamic programming
- Markov decision problems
- state action
- optimal control
- transition probabilities
- average cost
- initially unknown
- policy iteration
- utility function
- preference elicitation
- data mining
- pairwise
- hierarchical reinforcement learning
- action selection