An Approximately Optimal Relative Value Learning Algorithm for Averaged MDPs with Continuous States and Actions.
Hiteshi Sharma, Rahul Jain
Published in: Allerton (2019)
Keyphrases
- approximately optimal
- reinforcement learning
- action space
- learning algorithm
- initial state
- perceptual aliasing
- Markov decision processes
- state transitions
- state action
- Markov decision problems
- state and action spaces
- state space
- reinforcement learning algorithms
- partially observable
- continuous action
- situation calculus
- continuous state and action spaces
- action selection
- action sequences
- reward function
- decision theoretic planning
- policy search
- optimal policy
- real time dynamic programming
- continuous state spaces
- Markov decision process
- state transition
- learning agent
- mechanism design
- stochastic domains
- supervised learning
- decision processes
- learning rate
- machine learning
- continuous state
- decision theoretic
- cooperative
- action sets
- factored MDPs
- model free
- transition probabilities
- dynamic programming
- approximation ratio
- function approximation
- temporal difference
- average cost
- policy iteration
- state variables