Towards Long Form Audio-visual Video Understanding.
Wenxuan Hou, Guangyao Li, Yapeng Tian, Di Hu
Published in: CoRR (2023)
Keyphrases
- audio visual
- video summarization
- visual data
- multi modal
- multimedia
- meeting room
- audio visual content
- temporal context
- video data
- audio features
- multi stream
- visual information
- multimodal fusion
- video sequences
- sports video
- audio visual speech recognition
- person authentication
- video streams
- video frames
- video content
- multimedia data
- human computer interaction
- face recognition
- three dimensional