Reward Tuning for self-adaptive Policy in MDP based Distributed Decision-Making to ensure a Safe Mission Planning.
Mohand Hamadouche, Catherine Dezan, Kalinka R. L. J. C. Branco
Published in: DSN Workshops (2020)
Keyphrases
- mission planning
- reward function
- optimal policy
- decision making
- average reward
- inverse reinforcement learning
- reinforcement learning
- Markov decision processes
- Markov decision process
- total reward
- expected reward
- discounted reward
- distributed systems
- long run
- partially observable environments
- partially observable
- state space
- policy iteration
- Markov decision problems
- action selection
- infinite horizon
- dynamic programming
- distributed environment
- reinforcement learning algorithms
- state action
- multi agent
- state and action spaces
- model free
- finite state
- decision processes
- finite horizon
- transition probabilities
- partially observable Markov decision processes
- policy making
- policy gradient
- decision process
- real time