Bridging Vision and Language from the Video-to-Text Perspective: A Comprehensive Review.
Jesus Perez-Martin, Benjamin Bustos, Silvio Jamil Ferzoli Guimarães, Ivan Sipiran, Jorge Pérez, Grethel Coello Said
Published in: CoRR (2021)
Keyphrases
- language generation
- real time
- english text
- computational linguistics
- natural language descriptions
- text to speech synthesis
- video content
- key topics
- text detection
- text to speech
- video search
- video data
- multimedia
- information retrieval
- human language
- video segments
- computer vision
- database
- linguistic analysis
- image processing
- english language
- video database
- language learning
- video frames
- video streams
- keywords
- video sequences
- native language
- multimedia search
- programming language
- text generation
- text retrieval
- news video
- multimedia data
- natural language generation
- video clips
- natural language
- language specific
- vision system
- multimedia documents
- text information
- language processing
- visual features
- space time
- video surveillance
- audio content
- key frames
- syntactic categories
- video retrieval