DialogMCF: Multimodal Context Flow for Audio Visual Scene-Aware Dialog.
Zhe Chen, Hongcheng Liu, Yu Wang
Published in: IEEE ACM Trans. Audio Speech Lang. Process. (2024)
Keyphrases
- visual scene
- visual information
- audio visual
- multimedia
- object recognition
- vision system
- complex scenes
- natural scenes
- multi modal
- spatial relations
- natural language
- contextual information
- computer vision
- three dimensional
- visual attention
- visual features
- natural images
- spatial information
- context aware
- real time
- higher order
- low level
- xml documents
- machine learning