D3Net: A Unified Speaker-Listener Architecture for 3D Dense Captioning and Visual Grounding.
Dave Zhenyu Chen, Qirui Wu, Matthias Nießner, Angel X. Chang
Published in: ECCV (32) (2022)
Keyphrases
- visual information
- management system
- speech recognition
- layered architecture
- real time
- audio visual
- software architecture
- pattern recognition
- database systems
- network architecture
- multi agent systems
- visual features
- feature extraction
- computer vision
- visual perception
- stereo correspondence
- speaker verification
- speaker recognition
- data sets