WildRefer: 3D Object Localization in Large-scale Dynamic Scenes with Multi-modal Visual Data and Natural Language.
Zhenxiang Lin, Xidong Peng, Peishan Cong, Yuenan Hou, Xinge Zhu, Sibei Yang, Yuexin Ma
Published in: CoRR (2023)
Keyphrases
- multi modal
- dynamic scenes
- 3D objects
- visual data
- multi view
- multiple views
- video sequences
- high dimensional
- visual information
- object recognition
- space time
- pose estimation
- three dimensional
- visual features
- video data
- image sequences
- viewpoint
- image data
- image content
- background subtraction
- human motion
- visual content
- range images
- moving objects
- motion segmentation
- machine learning
- multimedia data
- contextual information
- human actions
- motion estimation
- computer vision
- data mining