Adding Recurrence to Pretrained Transformers for Improved Efficiency and Context Size
Davis Yoshida, Allyson Ettinger, Kevin Gimpel. Published in: CoRR (2020)