MiniGPT4-Video: Advancing Multimodal LLMs for Video Understanding with Interleaved Visual-Textual Tokens.
Kirolos Ataallah, Xiaoqian Shen, Eslam Abdelrahman, Essam Sleiman, Deyao Zhu, Jian Ding, Mohamed Elhoseiny
Published in: CoRR (2024)
Keyphrases
- multimedia
- video data
- video sequences
- visual analysis
- video content
- real time
- video streams
- video retrieval
- visual cues
- video frames
- spatial and temporal
- video database
- digital video
- real time video
- broadcast news
- news video
- video images
- story segmentation
- video search
- visual data
- multi modal
- video analysis
- video indexing
- multimedia data
- visual features
- online video