Listen Then See: Video Alignment with Speaker Attention.
Aviral Agrawal, Carlos Mateo Samudio Lezcano, Iqui Balam Heredia-Marin, Prabhdeep Singh Sethi
Published in: CoRR (2024)
Keyphrases
- video data
- video sequences
- video content
- multimedia
- video streams
- video frames
- real time video
- key frames
- video database
- real time
- video clips
- speaker verification
- video processing
- image alignment
- audio visual
- video analysis
- video retrieval
- video surveillance
- visual attention
- automatic speech recognition
- sequence alignment
- video search
- visual saliency
- space time
- speaker recognition
- feature extraction
- combining information from multiple