CREMA: Multimodal Compositional Video Reasoning via Efficient Modular Adaptation and Fusion.

Shoubin Yu, Jaehong Yoon, Mohit Bansal
Published in: CoRR (2024)
Keyphrases
  • video sequences
  • multimodal
  • video data
  • video streams
  • spatio temporal
  • knowledge representation
  • motion estimation
  • space time
  • data fusion
  • qualitative reasoning