SpeechCLIP: Integrating Speech with Pre-Trained Vision and Language Model.
Yi-Jen Shih, Hsuan-Fu Wang, Heng-Jui Chang, Layne Berry, Hung-yi Lee, David Harwath
Published in: CoRR (2022)
Keyphrases
- language model
- pre-trained
- speech recognition
- word error rate
- language modeling
- probabilistic model
- automatic speech recognition
- training data
- n-gram
- information retrieval
- retrieval model
- document retrieval
- computer vision
- training examples
- speech signal
- mixture model
- query expansion
- ad hoc information retrieval
- test collection
- control signals
- context sensitive
- multimedia
- translation model
- broadcast news
- smoothing methods
- machine learning