SpeechCLIP: Integrating Speech with Pre-Trained Vision and Language Model.
Yi-Jen Shih, Hsuan-Fu Wang, Heng-Jui Chang, Layne Berry, Hung-yi Lee, David Harwath
Published in: SLT (2022)
Keyphrases
- language model
- pre-trained
- speech recognition
- word error rate
- language modeling
- n-gram
- speech signal
- document retrieval
- training data
- probabilistic model
- information retrieval
- retrieval model
- mixture model
- automatic speech recognition
- computer vision
- training examples
- query expansion
- test collection
- smoothing methods
- control signals
- ad hoc information retrieval
- context sensitive
- data sets
- broadcast news
- feature selection
- learning algorithm
- dirichlet prior