Faster Transformer Decoding: N-gram Masked Self-Attention.
Ciprian Chelba, Mia Xu Chen, Ankur Bapna, Noam Shazeer
Published in: CoRR (2020)
Keyphrases
- image retrieval
- n-gram
- Viterbi algorithm
- language model
- finite state transducers
- language independent
- bag of words
- text retrieval
- text classification
- language modeling
- variable length
- language modelling
- inside outside algorithm
- artificial intelligence
- word segmentation
- part of speech
- retrieval model
- fault diagnosis
- information retrieval