Towards accurate unsupervised video captioning with implicit visual feature injection and explicit.
Yunjie Zhang, Tianyang Xu, Xiaoning Song, Xuefeng Zhu, Zhen-Hua Feng, Xiaojun Wu
Published in: Pattern Recognit. Lett. (2024)
Keyphrases
- visual features
- content based video retrieval
- key frames
- semantic concepts
- visual data
- video shots
- motion features
- visual information
- video data
- image classification
- video content
- content based retrieval
- human actions
- video sequences
- video retrieval
- visual content
- low level features
- low level
- video database
- multimedia
- multimedia data
- image retrieval
- video streams
- image annotation
- automatic image annotation
- image search
- web images
- keywords
- video clips
- multi modal
- higher level
- video analysis
- image data
- object recognition
- video frames
- semi supervised
- image collections
- computer vision
- semantic content
- action recognition
- video summarization
- space time
- visual descriptors