SViTT-Ego: A Sparse Video-Text Transformer for Egocentric Video.
Hector A. Valdez, Kyle Min, Subarna Tripathi
Published in: CoRR (2024)
Keyphrases
- video sequences
- video data
- multimedia
- real time
- video content
- video segments
- video database
- video retrieval
- video clips
- video analysis
- text mining
- video streams
- multimedia data
- video frames
- online video
- natural language descriptions
- spatial and temporal
- key frames
- space time
- multimedia documents
- spatio-temporal
- video summarization
- news video
- closed captions