Neural Rewards Regression for near-optimal policy identification in Markovian and partial observable environments.
Daniel Schneegaß, Steffen Udluft, Thomas Martinetz
Published in: ESANN (2007)
Keyphrases
- optimal policy
- markov decision processes
- reinforcement learning
- reward function
- state space
- decision problems
- finite horizon
- infinite horizon
- state dependent
- finite state
- multistage
- dynamic programming
- expected reward
- network architecture
- total reward
- long run
- average reward
- policy iteration
- average cost
- markov decision process
- initial state
- bayesian reinforcement learning
- sufficient conditions
- lost sales
- control policies
- markov decision problems
- reproducing kernel hilbert space
- reinforcement learning algorithms
- optimal pricing
- serial inventory systems