Audio-Visual Talker Localization in Video for Spatial Sound Reproduction.
Davide Berghi, Philip J. B. Jackson
Published in: CoRR (2024)
Keyphrases
- audio visual
- sound source
- video summarization
- visual data
- multimedia
- meeting room
- multi modal
- audio features
- spatial and temporal
- audio visual content
- visual information
- video data
- space time
- video sequences
- temporal context
- spatial information
- spatial data
- multimodal fusion
- spatio temporal
- multi stream
- person authentication
- audio visual speech recognition
- video streams
- video frames
- spatial context
- spatial relationships
- contextual information
- audio signal
- spatial relations
- multimedia databases
- human actions
- video content
- multimedia data
- hidden Markov models
- high dimensional
- image sequences