SubIQ: Inverse Soft-Q Learning for Offline Imitation with Suboptimal Demonstrations.
Huy Hoang, Tien Mai, Pradeep Varakantham
Published in: CoRR (2024)
Keyphrases
- reinforcement learning
- function approximation
- cooperative
- multi agent
- computationally efficient
- state space
- real time
- model free
- optimal policy
- reinforcement learning algorithms
- learning algorithm
- learning rate
- multi agent reinforcement learning
- machine learning
- imitation learning
- computational models
- temporal difference learning
- dynamic environments
- cost function
- continuous state and action spaces