Which Experiences Are Influential for Your Agent? Policy Iteration with Turn-over Dropout.
Takuya Hiraoka, Takashi Onishi, Yoshimasa Tsuruoka
Published in: CoRR (2023)
Keyphrases
- policy iteration
- Markov decision processes
- Markov decision process
- discounted reward
- model free
- reinforcement learning
- least squares
- sample path
- optimal policy
- fixed point
- multi agent
- multi agent systems
- temporal difference
- average reward
- finite state
- decision making
- multiagent systems
- policy evaluation
- state space
- multiple agents
- infinite horizon
- partially observable
- optimal control
- linear programming
- action selection
- long run
- reinforcement learning algorithms
- learning agent
- graphical models
- reward function
- decision theoretic
- convergence rate
- decision problems
- sufficient conditions