Integrating Both Visual and Audio Cues for Enhanced Video Caption.
Wang-Li Hao, Zhaoxiang Zhang, He Guan
Published in: AAAI (2018)
Keyphrases
- visual cues
- visual information
- visual data
- visual features
- news video
- video indexing and retrieval
- story segmentation
- multimedia
- content based video retrieval
- audio video
- mid level
- video retrieval
- video shots
- video data
- video database
- low level
- audio visual
- multimodal fusion
- cross modal
- video content
- digital video
- video sequences
- visual content
- lecture videos
- audio features
- scene change detection
- visual analysis
- video indexing
- audio files
- multimedia processing
- broadcast news
- video content analysis
- video search
- video frames
- video streams
- visual patterns
- concept detectors
- multimedia data
- key frames
- multi modal
- digital audio
- video clips
- semantic concepts
- temporal information
- multiple modalities
- multimodal information
- video material
- video recordings
- video signals
- video analysis
- multimedia information
- audio signals
- lifelog
- video scene
- audio stream
- signal processing
- high level
- temporal segmentation
- audio visual content
- image retrieval
- video segments
- image classification
- human actions
- eye movements
- low level features
- soccer video
- event detection