CREMA: Multimodal Compositional Video Reasoning via Efficient Modular Adaptation and Fusion.
Shoubin Yu
Jaehong Yoon
Mohit Bansal
Published in:
CoRR (2024)
Keyphrases
video sequences
multi modal
video data
video streams
spatio temporal
knowledge representation
motion estimation
space time
data fusion
qualitative reasoning