Policy Invariance under Reward Transformations for General-Sum Stochastic Games.
Xiaosong Lu, Howard M. Schwartz, Sidney Nascimento Givigi
Published in: CoRR (2014)
Keyphrases
- partially observable environments
- reward function
- image transformations
- inverse reinforcement learning
- reinforcement learning
- average reward
- policy gradient
- expected reward
- optimal policy
- discounted reward
- long run
- control policy
- agent receives
- total reward
- policy making
- Markov decision processes
- bandit problems
- data sets
- partially observable
- invariant features
- multiscale