Multimodal Pretraining for Dense Video Captioning.
Gabriel Huang, Bo Pang, Zhenhai Zhu, Clara Rivera, Radu Soricut
Published in: AACL/IJCNLP (2020)
Keyphrases
- multimedia
- video data
- video sequences
- multimodal
- real time
- video content
- story segmentation
- video streams
- multimedia data
- multimodal information
- real time video
- video processing
- digital video
- audio visual
- video clips
- video retrieval
- video surveillance
- video frames
- space time
- video database
- online video
- computer vision
- event detection
- human actions
- temporal information
- video analysis
- human computer interaction
- low level
- video images
- event recognition
- spatio temporal
- computational complexity