Video-LLaMA: An Instruction-tuned Audio-Visual Language Model for Video Understanding.
Hang Zhang
Xin Li
Lidong Bing
Published in:
CoRR (2023)
Keyphrases
language model
audio visual
multimedia
visual data
video data
video sequences
language modeling
video content
video retrieval
document retrieval
n gram
information retrieval
spatio temporal
speech recognition
probabilistic model
image retrieval
feature space
natural language
machine learning
relevance model