FILS: Self-Supervised Video Feature Prediction In Semantic Language Space.
Mona Ahmadian, Frank Guerin, Andrew Gilbert
Published in: CoRR (2024)
Keyphrases
- space time
- natural language
- prediction accuracy
- semantic concepts
- video data
- multimedia
- high level
- video sequences
- programming language
- video streams
- semantically equivalent
- video event
- semantic representations
- semantic space
- conceptual graphs
- semantic information
- context dependent
- spatio temporal
- search space
- event detection
- semantic annotation
- language learning
- video content
- semantic similarity
- domain specific
- linguistic analysis
- real time
- similarity measure
- semantic labels
- feature space
- semantic structure
- prediction model
- video clips
- semantic web