PARL: A Unified Framework for Policy Alignment in Reinforcement Learning from Human Feedback.
Souradip Chakraborty, Amrit S. Bedi, Alec Koppel, Huazheng Wang, Dinesh Manocha, Mengdi Wang, Furong Huang
Published in: ICLR (2024)
Keyphrases
- reinforcement learning
- optimal policy
- policy search
- Markov decision process
- function approximation
- state space
- Markov decision processes
- action selection
- reward signal
- multi-agent
- human behavior
- continuous state
- Markov decision problems
- function approximators
- actor-critic
- human operators
- machine learning
- temporal difference
- human interaction
- reinforcement learning problems
- motor skills
- partially observable environments
- approximate dynamic programming
- partially observable domains
- reward function
- control policies
- action space
- image alignment
- infinite horizon
- user feedback
- relevance feedback
- decision making
- learning algorithm