Multimodal graph neural network for video procedural captioning.
Lei Ji, Rong-Cheng Tu, Kevin Lin, Lijuan Wang, Nan Duan
Published in: Neurocomputing (2022)
Keyphrases
- neural network
- multimedia
- video content
- graph representation
- back propagation
- genetic algorithm
- video streams
- video frames
- video data
- video analysis
- video sequences
- random walk
- multi modal
- structured data
- real time
- graph structure
- graph theory
- neural nets
- video clips
- multimodal information
- directed graph
- artificial neural networks
- key frames
- temporal information
- neural network model
- graph matching
- spatial and temporal
- space time
- directed acyclic graph
- spanning tree
- network architecture
- graph model
- computer vision
- story segmentation
- multimodal interaction
- graph partitioning
- audio visual
- visual data
- human computer interaction
- self organizing maps
- bp neural network
- human activities
- recurrent neural networks