Adaptive Opponent Policy Detection in Multi-Agent MDPs: Real-Time Strategy Switch Identification Using Running Error Estimation.
Mohidul Haque Mridul, Mohammad Foysal Khan, Redwan Ahmed Rizvee, Md. Mosaddek Khan
Published in: CoRR (2024)
Keyphrases
- error estimation
- real time
- multi agent
- optimal policy
- reinforcement learning
- Markov decision processes
- high speed
- adaptive strategies
- model selection
- error estimates
- Markov decision problems
- state space
- Markov decision process
- decision problems
- partially observable
- dynamic programming
- policy search
- finite horizon
- reward function
- optimal strategy
- policy iteration
- control policy
- multi agent systems
- support vector
- multiple agents
- partially observable Markov decision processes
- infinite horizon
- action space
- average reward
- planning under uncertainty
- generalization error
- active learning
- multiagent systems