CLIP4VideoCap: Rethinking Clip for Video Captioning with Multiscale Temporal Fusion and Commonsense Knowledge.
Tanvir Mahmud, Feng Liang, Yaling Qing, Diana Marculescu
Published in: ICASSP (2023)
Keyphrases
- video clips
- commonsense knowledge
- multiscale
- temporal information
- video data
- key frames
- video frames
- video content
- commonsense reasoning
- video streams
- language understanding
- information gathering
- video sequences
- temporal data
- information fusion
- machine learning
- temporal reasoning
- automated reasoning
- data fusion
- temporal databases
- knowledge representation