One For All: Video Conversation is Feasible Without Video Instruction Tuning.
Ruyang Liu, Chen Li, Yixiao Ge, Ying Shan, Thomas H. Li, Ge Li
Published in: CoRR (2023)
Keyphrases
- optimal solution
- multimedia
- video sequences
- video data
- objective function
- video streams
- video frames
- video content
- real time
- spatial and temporal
- online video
- video surveillance
- video images
- video segmentation
- dynamic textures
- video clips
- real-time video
- event detection
- multi-agent
- video retrieval
- key frames
- video analysis
- video database
- spatio-temporal
- natural language
- video processing
- event recognition
- neural network
- data sets