Imitation Learning in Discounted Linear MDPs without exploration assumptions.
Luca Viano, Stratis Skoulakis, Volkan Cevher
Published in: CoRR (2024)
Keyphrases
- markov decision processes
- imitation learning
- reinforcement learning
- optimal policy
- finite horizon
- average reward
- dynamic programming
- average cost
- state space
- policy iteration
- infinite horizon
- robotic systems
- markov decision process
- maximum margin
- reinforcement learning algorithms
- action selection
- long run
- humanoid robot
- action space
- function approximators
- function approximation
- active learning