Fewer Tokens and Fewer Videos: Extending Video Understanding Abilities in Large Vision-Language Models.
Shimin Chen, Yitian Yuan, Shaoxiang Chen, Zequn Jie, Lin Ma
Published in: CoRR (2024)
Keyphrases
- language model
- video sequences
- language modeling
- video content
- video frames
- probabilistic model
- video data
- n-gram
- video analysis
- statistical language models
- video database
- video clips
- document retrieval
- key frames
- speech recognition
- computer vision
- multimedia
- retrieval model
- query expansion
- test collection
- language modelling
- video streams
- human actions
- context sensitive
- video retrieval
- error rate
- hidden markov models
- video search
- smoothing methods
- feature vectors