MiKASA: Multi-Key-Anchor & Scene-Aware Transformer for 3D Visual Grounding.

Chun-Peng Chang, Shaoxiang Wang, Alain Pagani, Didier Stricker

Published in: CoRR (2024)

Keyphrases

observed scene
visual data
spatial relations
visual scene
visual appearance
three-dimensional
scene analysis
real scenes
single image
scene categorization
3D scene
spatial layout
visual perception
scene understanding
ground plane
real world objects
dynamic scenes
visual information
high level
visual features
fuzzy logic
visual environment
neural network
high voltage
complex scenes
multiple images
fault diagnosis
video sequences
image sequences
computer vision