Audio-Visual LLM for Video Understanding.
Fangxun Shu, Lei Zhang, Hao Jiang, Cihang Xie
Published in: CoRR (2023)
Keyphrases
- audio visual
- video summarization
- visual data
- multimedia
- meeting room
- multi modal
- audio visual content
- audio features
- temporal context
- sports video
- visual information
- multi stream
- person authentication
- multimodal fusion
- video sequences
- video data
- video content
- temporal information
- video frames
- video streams
- spatio temporal
- audio visual speech recognition
- multimedia data
- contextual information
- domain knowledge
- image sequences