Improving Audio-Visual Video Parsing with Pseudo Visual Labels.
Jinxing Zhou, Dan Guo, Yiran Zhong, Meng Wang
Published in: CoRR (2023)
Keyphrases
- audio visual
- visual data
- visual information
- video summarization
- meeting room
- multimedia
- video data
- visual features
- audio visual content
- sports video
- video sequences
- audio features
- multi modal
- person authentication
- temporal context
- temporal segmentation
- multimodal fusion
- multi stream
- multimedia data
- video content
- human actions
- training data
- natural language
- image sequences
- audio visual speech recognition
- eye movements
- key frames
- image data
- video frames
- video streams
- temporal information
- contextual information
- natural language processing
- space time
- high dimensional
- data analysis
- human motion