VLTinT: Visual-Linguistic Transformer-in-Transformer for Coherent Video Paragraph Captioning.
Kashu Yamazaki, Khoa Vo, Quang Sang Truong, Bhiksha Raj, Ngan Le
Published in: AAAI (2023)
Keyphrases
- visual cues
- visual data
- video sequences
- visual information
- video streams
- linguistic features
- multimedia
- video data
- video content
- video analysis
- visual analysis
- real time
- video indexing
- natural language
- video database
- natural language processing
- video retrieval
- video search
- video indexing and retrieval
- visual features
- semantic labels
- content based video retrieval
- video clips
- visual perception
- digital video
- key frames
- space time
- image quality
- neural network