On Optimality of Greedy Policy for a Class of Standard Reward Function of Restless Multi-armed Bandit Problem
Quan Liu, Kehao Wang, Lin Chen
Published in: CoRR (2011)
Keyphrases
- reward function
- inverse reinforcement learning
- Markov decision processes
- optimal policy
- reinforcement learning
- state space
- policy search
- multiple agents
- average reward
- reinforcement learning algorithms
- dynamic programming
- partially observable
- Markov decision process
- transition probabilities
- optimal control
- Markov decision problems
- total reward
- search algorithm
- average cost
- state action
- state variables
- policy iteration
- preference elicitation
- data mining
- transition model
- prior knowledge
- initially unknown
- hierarchical reinforcement learning
- infinite horizon