Hedging using reinforcement learning: Contextual k-Armed Bandit versus Q-learning.
Loris Cannelli, Giuseppe Nuti, Marzio Sala, Oleg Szehr
Published in: CoRR (2020)
Keyphrases
- reinforcement learning
- function approximation
- reinforcement learning algorithms
- state space
- model-free
- contextual information
- optimal control
- state action space
- temporal difference
- reinforcement learning methods
- control problems
- multi-agent reinforcement learning
- learning algorithm
- multi-agent
- action selection
- optimal policy
- eligibility traces
- Markov decision processes
- temporal difference learning
- continuous state and action spaces
- financial markets
- learning agent
- transaction costs
- stochastic approximation
- transfer learning
- dynamic programming
- function approximators
- hierarchical reinforcement learning
- policy search
- machine learning
- continuous state
- partially observable
- learning process
- learning agents
- action space
- reward function
- exploration strategy
- cooperative
- relational reinforcement learning
- risk management
- learning problems