D3Net: A Speaker-Listener Architecture for Semi-supervised Dense Captioning and Visual Grounding in RGB-D Scans.
Dave Zhenyu Chen, Qirui Wu, Matthias Nießner, Angel X. Chang
Published in: CoRR (2021)
Keyphrases
- semi-supervised
- management system
- visual information
- semi-supervised learning
- three-dimensional
- unlabeled data
- computer vision
- active learning
- co-training
- audio-visual
- software architecture
- speech recognition
- unsupervised learning
- labeled data
- real-time
- multi-view
- supervised learning
- pairwise
- visual features
- low-level
- subspace learning