Improving a sequence-to-sequence NLP model using a reinforcement learning policy algorithm.
Jabri Ismail, Aboulbichr Ahmed, Aziza El-Ouaazizi
Published in: CoRR (2022)
Keyphrases
- probabilistic model
- learning algorithm
- model free
- reinforcement learning
- input data
- long sequences
- cost function
- mathematical model
- theoretical analysis
- objective function
- dynamic programming
- recognition algorithm
- optimal solution
- estimation algorithm
- Kalman filter
- detection algorithm
- NP-hard
- similarity measure
- classification algorithm
- Viterbi algorithm
- expectation maximization
- tree structure
- Monte Carlo
- optimal policy
- EM algorithm
- Markov decision process
- hidden state
- machine learning
- Markov model
- Bayesian framework
- computational complexity
- simulated annealing
- probability distribution