An Empirical Study of Reward Structures for Actor-Critic Reinforcement Learning in Air Combat Manoeuvring Simulation.
Budi Kurniawan, Peter Vamplew, Michael Papasimeon, Richard Dazeley, Cameron Foale
Published in: Australasian Conference on Artificial Intelligence (2019)
Keyphrases
- reinforcement learning
- actor critic
- policy gradient
- air combat
- reinforcement learning algorithms
- temporal difference
- average reward
- function approximation
- approximate dynamic programming
- optimal control
- state space
- optimal policy
- reinforcement learning methods
- Markov decision processes
- neuro fuzzy
- learning algorithm
- policy iteration
- machine learning
- model free
- transfer learning
- function approximators
- multi agent
- neural network
- partially observable Markov decision processes
- long run
- control policy
- evaluation function
- stochastic games
- gradient method
- temporal difference learning
- mathematical model
- supervised learning
- dynamic programming
- natural actor critic
- policy gradient methods