Using audio-visual information to understand speaker activity: Tracking active speakers on and off screen.
Ken Hoover, Sourish Chaudhuri, Caroline Pantofaru, Ian Sturdy, Malcolm Slaney. Published in: ICASSP (2018)
Keyphrases
- visual information
- audio visual
- visual features
- speech recognition
- visual data
- low level
- visual cues
- video segments
- eye movements
- content based image retrieval systems
- speaker dependent
- real time
- particle filter
- textual information
- image collections
- mean shift
- visual content
- machine learning
- audio features
- object tracking
- eye tracking
- semantic information
- speaker adaptation
- human activities
- visual similarity
- relational databases
- visual input
- high level
- content based image
- information retrieval