OLViT: Multi-Modal State Tracking via Attention-Based Embeddings for Video-Grounded Dialog.
Adnen Abdessaied
Manuel von Hochmeister
Andreas Bulling
Published in:
CoRR (2024)
Keyphrases
multi modal
video search
semantic concepts
multi modality
audio visual
high dimensional
video sequences
cross modal
video frames
video content
multiple modalities
image annotation
object tracking
state space
appearance model
video analysis
medical imaging
low dimensional
image classification
image analysis