Offline Reinforcement Learning for Optimizing Production Bidding Policies.
Dmytro Korenkevych, Frank Cheng, Artsiom Balakir, Alex Nikulkov, Lingnan Gao, Zhihao Cen, Zuobing Xu, Zheqing Zhu
Published in: CoRR (2023)
Keyphrases
- reinforcement learning
- optimal policy
- policy search
- markov decision process
- optimal production
- reward function
- control policies
- state space
- markov decision processes
- function approximation
- inventory level
- production process
- hierarchical reinforcement learning
- policy gradient methods
- markov decision problems
- dynamic programming
- reinforcement learning algorithms
- partially observable markov decision processes
- learning algorithm
- fitted q iteration
- online auctions
- production system
- model free
- reinforcement learning agents
- machine learning
- policy iteration
- decision problems
- control policy
- electronic marketplaces
- temporal difference
- production planning
- continuous state
- real time
- exploration exploitation tradeoff
- raw material
- infinite horizon
- combinatorial auctions
- multiagent reinforcement learning
- learning process
- multi agent