Multimodal Video Captioning using Object-Auditory Information Fusion with Transformers.
Berkay Selbes, Mustafa Sert
Published in: NarSUM@MM (2023)
Keyphrases
- information fusion
- data fusion
- multimedia
- fusion algorithm
- information gathering
- soft computing
- video sequences
- multi source
- decision level
- multi modal
- fusion model
- video data
- cross modal
- multi sensor information fusion
- real time
- fusion method
- video frames
- event extraction
- fusion methods
- intelligent systems
- artificial neural networks
- artificial intelligence