Vision-Language Integration in Multimodal Video Transformers (Partially) Aligns with the Brain.
Dota Tianai Dong, Mariya Toneva
Published in: CoRR (2023)
Keyphrases
- real time
- multimedia
- video sequences
- natural language
- video data
- video streams
- human brain
- video content
- image processing
- story segmentation
- closed world
- data integration
- computer vision
- video clips
- vision system
- video frames
- video surveillance
- programming language
- multi modal
- video database
- multimodal interfaces
- object oriented
- multimedia data
- language learning
- space time
- magnetic resonance images
- video search
- brain tumors
- multimodal interaction
- external world