Diverse Policies Converge in Reward-free Markov Decision Processes.
Fanqi Lin
Shiyu Huang
Weiwei Tu
Published in:
CoRR (2023)
Keyphrases
markov decision
production system
reward function
expected reward
reinforcement learning
optimal policy
wide variety
long run
total reward
database
control policy
case study
real world
markov decision processes
search algorithm
finite horizon
revenue management
bandit problems
discounted reward
neural network