Introduction of Fixed Mode States into Online Reinforcement Learning with Penalties and Rewards and its Application to Biped Robot Waist Trajectory Generation.
Seiya Kuroda, Kazuteru Miyazaki, Hiroaki Kobayashi
Published in: J. Adv. Comput. Intell. Intell. Informatics (2012)
Keyphrases
- reinforcement learning
- biped robot
- perceptual aliasing
- Markov decision processes
- function approximation
- state space
- reinforcement learning algorithms
- Markov decision problems
- optimal policy
- model free
- real time
- machine learning
- action space
- dynamic programming
- transition model
- reward function
- state variables
- partially observable
- fully observable
- artificial intelligence