VER: Scaling On-Policy RL Leads to the Emergence of Navigation in Embodied Rearrangement.

Erik Wijmans, Irfan Essa, Dhruv Batra

Published in: CoRR (2022)

Keyphrases

optimal policy
reinforcement learning
markov decision process
action selection
state space
markov decision processes
partially observable domains
control policy
multi agent
model free
control policies
policy evaluation
average reward
policy gradient
learning algorithm
policy search
average cost
function approximation
reinforcement learning problems
actor critic
embodied cognition
state and action spaces
markov decision problems
learning agents
state action
policy iteration
temporal difference
information space
long run
sufficient conditions