Video-LLaMA: An Instruction-tuned Audio-Visual Language Model for Video Understanding.
Hang Zhang
Xin Li
Lidong Bing
Published in:
EMNLP (Demos) (2023)
Keyphrases
language model
audio visual
multimedia
visual data
language modeling
video data
video sequences
n gram
speech recognition
information retrieval
probabilistic model
document retrieval
passage retrieval
key frames
mixture model
multi modal
retrieval model
space time
relational databases