Vision-Text Cross-Modal Fusion for Accurate Video Captioning.
Kaouther Ouenniche, Ruxandra Tapu, Titus B. Zaharia
Published in: IEEE Access (2023)
Keyphrases
- text data
- cross modal
- visual data
- text mining
- text classification
- multi modal
- text documents
- multiple modalities
- multimedia retrieval
- image retrieval
- keywords
- video data
- multimedia
- video frames
- visual recognition
- semantic concepts
- web pages
- video content
- video analysis
- video streams
- multimedia data
- space time
- computer vision
- video clips
- visual similarity
- video sequences
- video retrieval
- event detection
- information retrieval