Stochastic Contextual Bandits with Long Horizon Rewards.
Yuzhen Qin, Yingcong Li, Fabio Pasqualetti, Maryam Fazel, Samet Oymak
Published in: CoRR (2023)
Keyphrases
- multi-armed bandits
- stochastic systems
- reinforcement learning
- multi-armed bandit
- bandit problems
- contextual information
- Markov decision processes
- stochastic models
- stochastic nature
- learning automata
- stochastic programming
- Monte Carlo
- objective function
- context sensitive
- long term and short term
- credit assignment
- stochastic model
- contextual knowledge
- stochastic optimization
- information retrieval
- context dependent
- decision problems
- Bayesian networks
- website
- e-learning