VideoLLaMA 2: Advancing Spatial-Temporal Modeling and Audio Understanding in Video-LLMs.
Zesen Cheng
Sicong Leng
Hang Zhang
Yifei Xin
Xin Li
Guanzheng Chen
Yongxin Zhu
Wenqi Zhang
Ziyang Luo
Deli Zhao
Lidong Bing
Published in:
CoRR (2024)
Keyphrases
spatial temporal
video shots
spatio temporal
spatial and temporal
multimedia
temporal information
action recognition
spatial information
human actions
audio video
text classification
space time
video retrieval
video analysis