Vista-LLaMA: Reliable Video Narrator via Equal Distance to Visual Tokens.
Fan Ma, Xiaojie Jin, Heng Wang, Yuchen Xian, Jiashi Feng, Yi Yang
Published in: CoRR (2023)
Keyphrases
- visual data
- video data
- video sequences
- visual information
- visual cues
- visual analysis
- video frames
- multimedia
- distance measure
- video database
- video streams
- video indexing
- video indexing and retrieval
- video content
- video clips
- video search
- distance function
- content based video retrieval
- news video
- visual input
- visual features
- real time
- computer vision
- space time
- video analysis
- key frames
- temporal information
- visual similarity
- euclidean distance
- low level
- video shots
- multimedia data
- spatial and temporal
- event recognition
- distance metric
- general purpose
- high level