Discrete Multimodal Transformers with a Pretrained Large Language Model for Mixed-Supervision Speech Processing.
Viet Anh Trinh, Rosy Southwell, Yiwen Guan, Xinlu He, Zhiyong Wang, Jacob Whitehill
Published in: CoRR (2024)
Keyphrases
- language model
- speech processing
- speech recognition
- language modeling
- n-gram
- information retrieval
- document retrieval
- retrieval model
- probabilistic model
- signal processing
- speaker identification
- multimedia systems
- test collection
- query expansion
- multimodal
- automatic speech recognition
- speech signal
- mixture model
- noisy environments
- query terms
- natural language processing
- handwriting recognition
- computer vision
- pseudo relevance feedback
- relevance model
- multimedia
- translation model
- generative model
- english text
- machine learning
- gaussian mixture model
- image processing
- pattern recognition