Stealthy Imitation: Reward-guided Environment-free Policy Stealing.
Zhixiong Zhuang
Maria-Irina Nicolae
Mario Fritz
Published in:
CoRR (2024)
Keyphrases
reinforcement learning
real time
partially observable environments
inverse reinforcement learning
reward function
decision processes
agent learns
optimal policy
long run
agent receives
decision making
mobile robot
function approximation
complex environments
markov decision process
expected reward