MiKASA: Multi-Key-Anchor & Scene-Aware Transformer for 3D Visual Grounding.
Chun-Peng Chang, Shaoxiang Wang, Alain Pagani, Didier Stricker
Published in: CoRR (2024)
Keyphrases
- observed scene
- visual data
- spatial relations
- visual scene
- visual appearance
- three dimensional
- scene analysis
- real scenes
- single image
- scene categorization
- 3D scene
- spatial layout
- visual perception
- scene understanding
- ground plane
- real world objects
- dynamic scenes
- visual information
- high level
- visual features
- fuzzy logic
- visual environment
- neural network
- high voltage
- complex scenes
- multiple images
- fault diagnosis
- video sequences
- image sequences
- computer vision