Audio-Visual Interpretable and Controllable Video Captioning.
Yapeng Tian, Chenxiao Guan, Justin Goodman, Marc Moore, Chenliang Xu
Published in: CVPR Workshops (2019)
Keyphrases
- audio visual
- video summarization
- visual data
- multimedia
- meeting room
- multi modal
- audio features
- audio visual content
- sports video
- visual information
- temporal context
- video sequences
- multi stream
- video data
- video content
- person authentication
- multimodal fusion
- video streams
- video retrieval
- visual features
- video frames
- audio visual speech recognition
- space time
- multimedia data
- data processing
- key frames
- event detection
- metadata