M³AV: A Multimodal, Multigenre, and Multipurpose Audio-Visual Academic Lecture Dataset.
Zhe Chen, Heyang Liu, Wenyi Yu, Guangzhi Sun, Hongcheng Liu, Ji Wu, Chao Zhang, Yu Wang, Yanfeng Wang
Published in: CoRR (2024)
Keyphrases
- audio-visual
- multimodal
- multimedia
- visual information
- visual data
- multi-stream
- video summarization
- multimodal fusion
- emotion recognition
- temporal context
- person authentication
- computer vision
- audio-visual speech recognition
- feature extraction
- co-occurrence
- three-dimensional
- human-computer interaction
- audio features