One-shot Policy Elicitation via Semantic Reward Manipulation.
Aaquib Tabrez, Ryan Leonard, Bradley Hayes
Published in: CoRR (2021)
Keyphrases
- inverse reinforcement learning
- preference elicitation
- reinforcement learning
- partially observable environments
- optimal policy
- reward function
- semantic information
- natural language
- semantic similarity
- action selection
- semantic knowledge
- semantic web
- average reward
- high level
- expected reward
- minimax regret
- control policy
- semantic description
- policy gradient
- semantic representation
- long run
- bandit problems
- semantic annotation
- total reward
- domain ontology