Visual Grounding in Video for Unsupervised Word Translation.
Gunnar A. Sigurdsson, Jean-Baptiste Alayrac, Aida Nematzadeh, Lucas Smaira, Mateusz Malinowski, João Carreira, Phil Blunsom, Andrew Zisserman
Published in: CoRR (2020)
Keyphrases
- visual data
- visual cues
- unsupervised manner
- visual analysis
- statistical machine translation
- video sequences
- multimedia
- machine translation system
- content-based video retrieval
- parallel corpus
- semi-supervised
- video analysis
- visual perception
- video content
- video data
- visual information
- video streams
- multimedia data
- video search
- low level
- co-occurrence
- translation model
- unsupervised learning
- target language
- pointwise mutual information
- visual features
- machine translation
- video clips
- real time
- English words
- lexical semantics
- video frames
- key frames
- video retrieval
- video database
- event detection
- word sense disambiguation
- n-gram
- visual concepts
- visual input
- keywords
- high level
- video shots
- syntactic categories
- word meanings