Dynamic Adjustment Policy of Search Driver Matching Distance via Markov Decision Process.
Suiming Guo, Pengcheng Zhang, Qianrong Shen
Published in: ICA3PP (1) (2021)
Keyphrases
- markov decision process
- optimal policy
- state space
- finite horizon
- markov decision processes
- reinforcement learning
- infinite horizon
- search algorithm
- search space
- transition matrices
- initial state
- policy iteration
- decision problems
- action space
- partial observability
- reward function
- average cost
- state action
- markov games
- dynamic programming
- transition probabilities
- state variables
- average reward
- stationary policies