Policy Invariance under Reward Transformations for General-Sum Stochastic Games.
Xiaosong Lu
Howard M. Schwartz
Sidney Nascimento Givigi
Published in:
CoRR (2014)
Keyphrases
partially observable environments
reward function
image transformations
inverse reinforcement learning
reinforcement learning
average reward
policy gradient
expected reward
optimal policy
discounted reward
long run
control policy
agent receives
total reward
policy making
Markov decision processes
bandit problems
data sets
partially observable
invariant features
multiscale