Enhancing the alignment between target words and corresponding frames for video captioning.
Yunbin Tu, Chang Zhou, Junjun Guo, Shengxiang Gao, Zhengtao Yu
Published in: Pattern Recognit. (2021)
Keyphrases
- video frames
- key frames
- temporal filtering
- video segments
- video data
- temporal coherence
- video sequences
- successive frames
- input video
- single frame
- video images
- word alignment
- video streams
- video segmentation
- video content
- image frames
- motion features
- multimedia
- space time
- video analysis
- word level
- real time
- video retrieval
- pre trained
- video clips
- keywords
- spatial and temporal
- shot boundary detection
- multi frame
- image alignment
- temporal order
- video signals
- super resolution
- semantic labels
- reference frame
- frame rate
- video summaries
- parallel texts
- temporal domain
- video scene
- word segmentation
- video summarization
- video shots
- video surveillance
- machine translation
- text documents
- n gram
- visual features
- moving objects