EXPODE: EXploiting POlicy Discrepancy for Efficient Exploration in Multi-agent Reinforcement Learning.
Yucong Zhang, Chao Yu. Published in: AAMAS (2023)
Keyphrases
- multi-agent reinforcement learning
- multi-agent
- optimal policy
- reinforcement learning
- learning agents
- multi-agent learning
- multi-agent systems
- stochastic games
- action selection
- markov chain
- artificial intelligence
- distributed control
- expert systems
- sufficient conditions
- information processing
- policy iteration
- function approximators