Leveraging auxiliary image descriptions for dense video captioning.
Emre Boran, Aykut Erdem, Nazli Ikizler-Cinbis, Erkut Erdem, Pranava Madhyastha, Lucia Specia
Published in: Pattern Recognit. Lett. (2021)
Keyphrases
- image data
- input image
- image classification
- multiscale
- image features
- single image
- natural language descriptions
- image representation
- image content
- image analysis
- high resolution
- low level
- image retrieval
- video images
- image description
- image matching
- image pixels
- image frames
- weakly labeled
- spatial information
- segmentation algorithm
- feature points
- test images
- video content
- video retrieval
- pre-trained
- image segmentation
- visual cues
- displacement field
- static images
- image processing
- segmentation method
- multimedia
- textual descriptions
- high level
- object motion
- image sequences
- video sequences
- pixel values
- video clips
- video surveillance
- key frames