VLTinT: Visual-Linguistic Transformer-in-Transformer for Coherent Video Paragraph Captioning.
Kashu Yamazaki, Khoa Vo, Sang Truong, Bhiksha Raj, Ngan Le
Published in: CoRR (2022)
Keyphrases
- fuzzy logic
- power transformers
- visual cues
- video sequences
- fault diagnosis
- power system
- real time
- high voltage
- visual information
- multimedia
- distribution network
- video content
- incipient fault
- partial discharge
- visual data
- video data
- visual analysis
- video streams
- temporal information
- low level
- video retrieval
- linguistic features
- visual features
- visual perception
- video search
- human activities
- high level
- content based video retrieval
- natural language processing
- event recognition
- video surveillance
- video frames
- video database
- text classification
- visual input
- video clips
- multimedia data
- artificial intelligence