Reward-free offline reinforcement learning: Optimizing behavior policy via action exploration.
Zhenbo Huang, Shiliang Sun, Jing Zhao
Published in: Knowl. Based Syst. (2024)
Keyphrases
- reinforcement learning
- action selection
- exploration exploitation tradeoff
- reward signal
- agent learns
- state action
- optimal policy
- action space
- total reward
- reward function
- temporal difference
- policy search
- function approximation
- partially observable environments
- partially observable domains
- state space
- discounted reward
- reinforcement learning algorithms
- exploration strategy
- average reward
- eligibility traces
- agent receives
- reward shaping
- inverse reinforcement learning
- selective perception
- policy gradient
- markov decision problems
- control policies
- model free
- dynamic programming
- partially observable
- state and action spaces
- markov decision processes
- real time
- learning algorithm
- transition model
- expected reward
- multi agent
- rl algorithms
- reinforcement learning problems
- policy iteration
- learning agent
- policy evaluation
- continuous state
- markov decision process
- initial state
- real robot
- state transitions
- actor critic
- human users
- machine learning