Listen, Look and Deliberate: Visual context-aware speech recognition using pre-trained text-video representations.
Shahram Ghorbani, Yashesh Gaur, Yu Shi, Jinyu Li
Published in: CoRR (2020)
Keyphrases
- context aware
- pre trained
- training data
- contextual information
- spoken dialogue systems
- context awareness
- mobile devices
- ubiquitous computing
- training examples
- ambient intelligence
- video search
- smart home
- speech recognition
- visual information
- mobile users
- visual data
- control signals
- current context
- context aware systems
- audio visual
- context aware services
- context aware ubiquitous learning
- visual features
- ubiquitous learning
- multi modal
- text mining
- multimedia
- small number
- high level
- computer vision