Aligning Agent Policy with Externalities: Reward Design via Bilevel RL.
Souradip Chakraborty, Amrit Singh Bedi, Alec Koppel, Dinesh Manocha, Huazheng Wang, Furong Huang, Mengdi Wang
Published in: CoRR (2023)
Keyphrases
- reinforcement learning
- multi agent
- reward function
- action selection
- Markov decision processes
- multi agent systems
- expected reward
- optimal policy
- decision making
- policy gradient
- learning agent
- agent oriented
- intelligent agents
- agent learns
- state action
- autonomous agents
- total reward
- multiple agents
- mobile agents
- design process
- user interface
- case study