Hardware as Policy: Mechanical and Computational Co-Optimization using Deep Reinforcement Learning.
Tianjian Chen, Zhanpeng He, Matei T. Ciocarlie
Published in: CoRL (2020)
Keyphrases
- reinforcement learning
- optimal policy
- policy search
- mathematical programming
- optimization algorithm
- partially observable
- computational power
- low cost
- action selection
- real time
- Markov decision process
- reinforcement learning algorithms
- hardware and software
- partially observable environments
- optimization method
- policy gradient
- optimization process
- image processing
- state space
- embedded systems
- optimization problems
- combinatorial optimization
- reinforcement learning problems
- reward function
- function approximation
- global optimization
- hardware implementation
- linear programming
- control policy
- average reward
- temporal difference
- policy evaluation
- infinite horizon
- massively parallel