Improving On-Screen Sound Separation for Open Domain Videos with Audio-Visual Self-attention.
Efthymios Tzinis, Scott Wisdom, Tal Remez, John R. Hershey
Published in: CoRR (2021)
Keyphrases
- audio visual
- open domain
- sound source
- video summarization
- audio features
- multi modal
- visual data
- information extraction
- visual information
- question answering
- multimedia
- video sequences
- audio visual speech recognition
- multi stream
- video content
- passage retrieval
- video data
- low level
- question answering systems
- image retrieval
- machine learning