Multi-Modal and Multi-Scale Temporal Fusion Architecture Search for Audio-Visual Video Parsing.
Jiayi Zhang, Weixin Li
Published in: ACM Multimedia (2023)
Keyphrases
- multi modal
- audio visual
- video search
- multimodal fusion
- temporal context
- video summarization
- multiscale
- multi modality
- person authentication
- audio features
- semantic concepts
- spatial and temporal
- spatio temporal
- multiple modalities
- video data
- space time
- visual data
- single modality
- cross modal
- image annotation
- video retrieval
- multimedia
- video content
- temporal information
- video streams
- video sequences
- video frames
- multi stream
- image database
- co occurrence
- high dimensional
- image segmentation