Average Reward Optimization Objective In Partially Observable Domains.
Yuri Grinberg, Doina Precup. Published in: ICML (1) (2013)
Keyphrases
- average reward
- partially observable domains
- markov decision processes
- partially observable markov decision processes
- reinforcement learning
- long run
- optimal policy
- partially observable
- dynamic programming
- policy iteration
- machine learning
- linear programming
- incomplete information
- model free
- action models
- inverse reinforcement learning