Planning with Q-Values in Sparse Reward Reinforcement Learning.
Hejun Lei, Paul Weng, Juan Rojas, Yisheng Guan
Published in: ICIRA (1) (2022)
Keyphrases
- reinforcement learning
- action selection
- partially observable
- state space
- machine learning
- reward function
- Markov decision processes
- macro actions
- temporal difference
- function approximation
- partially observable environments
- partial observability
- reward shaping
- high dimensional
- heuristic search
- complex domains
- learning algorithm
- deterministic domains
- eligibility traces
- sparse data
- dynamic programming
- attribute values
- optimal policy
- AI planning
- model free
- stochastic domains
- reinforcement learning algorithms
- mixed initiative
- blocks world
- single agent
- reinforcement learning problems
- supervised learning
- multi agent
- learning process
- planning domains
- total reward
- planning problems