Accelerating Inference and Language Model Fusion of Recurrent Neural Network Transducers via End-to-End 4-bit Quantization.
Andrea Fasoli, Chia-Yu Chen, Mauricio J. Serrano, Swagath Venkataramani, George Saon, Xiaodong Cui, Brian Kingsbury, Kailash Gopalakrishnan
Published in: INTERSPEECH (2022)
Keyphrases
- end to end
- language model
- recurrent neural networks
- language modeling
- n gram
- query expansion
- speech recognition
- neural network
- information retrieval
- probabilistic model
- feed forward
- document retrieval
- language modelling
- test collection
- retrieval model
- ad hoc information retrieval
- artificial neural networks
- recurrent networks
- mixture model
- data fusion
- context sensitive
- congestion control
- smoothing methods
- bayesian networks
- bayesian inference
- information fusion
- translation model
- internet protocol
- relevance model
- dirichlet prior
- fusion method
- hidden markov models
- generative model
- query terms