Evading malware classifiers using RL agent with action-mask.
Saurabh Pandey, Nitesh Kumar, Anand Handa, Sandeep Kumar Shukla
Published in: Int. J. Inf. Sec. (2023)
Keyphrases
- action selection
- reinforcement learning
- action space
- state action
- multi agent
- agent learns
- action selection mechanism
- multi agent systems
- partial observations
- partially observable domains
- temporal difference
- discounted reward
- machine learning algorithms
- training data
- autonomous learning
- practical reasoning
- state space
- intelligent agents
- autonomous agents
- feature selection
- learning classifier systems
- decision trees
- markov decision processes
- multiagent systems
- internal state
- multiagent reinforcement learning
- reward signal
- multiple agents
- joint action
- learning agent
- support vector
- training set
- markov decision process
- learning agents
- malware detection
- naive bayes
- exploration strategy
- communicative acts
- reward shaping
- training samples
- sensing actions
- single agent
- agent architecture
- machine learning
- partially observable
- plan execution
- software agents
- evaluation function
- optimal policy
- reverse engineering
- dynamic environments
- agent model
- learning algorithm