Listen, Look and Deliberate: Visual Context-Aware Speech Recognition Using Pre-Trained Text-Video Representations.
Shahram Ghorbani, Yashesh Gaur, Yu Shi, Jinyu Li
Published in: SLT (2021)
Keyphrases
- context aware
- pre trained
- contextual information
- training data
- spoken dialogue systems
- ubiquitous computing
- context awareness
- video search
- mobile devices
- ambient intelligence
- training examples
- visual features
- ubiquitous learning
- context aware systems
- speech recognition
- current context
- visual data
- smart home
- audio visual
- context aware services
- mobile users
- visual information
- control signals
- context aware ubiquitous learning
- smart spaces
- text mining
- multimedia
- key frames
- user context
- augmented reality
- digital libraries
- video sequences
- high level
- context aware mobile