One-shot Policy Elicitation via Semantic Reward Manipulation.

Aaquib Tabrez, Ryan Leonard, Bradley Hayes

Published in: CoRR (2021)

Keyphrases

inverse reinforcement learning
preference elicitation
reinforcement learning
partially observable environments
optimal policy
reward function
semantic information
natural language
semantic similarity
action selection
semantic knowledge
semantic web
average reward
high level
expected reward
minimax regret
control policy
semantic description
policy gradient
semantic representation
long run
bandit problems
semantic annotation
total reward
domain ontology