Login / Signup
Goldfish: Vision-Language Understanding of Arbitrarily Long Videos.
Kirolos Ataallah
Xiaoqian Shen
Eslam Abdelrahman
Essam Sleiman
Mingchen Zhuge
Jian Ding
Deyao Zhu
Jürgen Schmidhuber
Mohamed Elhoseiny
Published in:
CoRR (2024)
Keyphrases
</>
language understanding
natural language understanding
language processing
contextual constraints
spoken dialogue systems
vision system
semantic interpretation
dialogue system
natural language
video analysis
general knowledge
multi agent
low level
video data
human activities