A Reinforcement Learning-Based Markov-Decision Process (MDP) Implementation for SRAM FPGAs.
Aiwu Ruan, Aokai Shi, Liang Qin, Shiyang Xu, Yifan Zhao
Published in: IEEE Trans. Circuits Syst. II Express Briefs (2020)
Keyphrases
- markov decision process
- reinforcement learning
- markov decision processes
- state space
- optimal policy
- temporal difference learning
- infinite horizon
- finite horizon
- action space
- transition matrices
- transition probabilities
- policy iteration
- function approximation
- hardware implementation
- initial state
- probabilistic planning
- partial observability
- reinforcement learning algorithms
- state action
- markov games