Profit sharing that can learn deterministic policy for POMDPs environments.
Yohei Takamori, Yuko Osana
Published in: SMC (2011)
Keyphrases
- profit sharing
- reinforcement learning
- uncertain environments
- policy making
- partially observable Markov decision processes
- optimal policy
- function approximators
- partially observable
- policy search
- policy gradient
- Markov decision problems
- joint replenishment
- Markov decision processes
- game theory
- supply chain
- decision problems
- reward function
- Markov decision process
- state space
- dynamic environments
- infinite horizon
- state dependent
- upper bound
- cooperative