SubIQ: Inverse Soft-Q Learning for Offline Imitation with Suboptimal Demonstrations.

Huy Hoang, Tien Mai, Pradeep Varakantham

Published in: CoRR (2024)

Keyphrases

reinforcement learning
function approximation
cooperative
multi-agent
computationally efficient
state space
real time
model-free
optimal policy
reinforcement learning algorithms
learning algorithm
learning rate
multi-agent reinforcement learning
machine learning
imitation learning
computational models
temporal difference learning
dynamic environments
cost function
continuous state and action spaces