Stochastic Contextual Bandits with Known Reward Functions.
Pranav Sakulkar, Bhaskar Krishnamachari
Published in: CoRR (2016)
Keyphrases
- reward function
- stochastic systems
- reinforcement learning
- contextual information
- Markov decision processes
- control policies
- state space
- optimal policy
- inverse reinforcement learning
- multi-armed bandit
- policy search
- multiple agents
- state variables
- transition probabilities
- simple examples
- search engine
- Markov decision process
- stochastic models
- decision problems
- Monte Carlo
- Markov chain
- active learning
- image segmentation