MM-Pyramid: Multimodal Pyramid Attentional Network for Audio-Visual Event Localization and Video Parsing.
Jiashuo Yu, Ying Cheng, Rui-Wei Zhao, Rui Feng, Yuejie Zhang
Published in: CoRR (2021)
Keyphrases
- audio visual
- video summarization
- visual data
- multi modal
- multimedia
- multimodal fusion
- audio features
- audio visual content
- temporal context
- visual information
- multi stream
- sports video
- multiscale
- event detection
- video sequences
- video data
- audio visual speech recognition
- human actions
- video streams
- input image
- video content
- natural language processing
- natural language
- video analysis
- human activities
- search engine
- human computer interaction
- visual features
- data analysis
- high level
- feature selection