Audio-Visual Grounding Referring Expression for Robotic Manipulation.
Yefei Wang
Kaili Wang
Yi Wang
Di Guo
Huaping Liu
Fuchun Sun
Published in:
CoRR (2021)
Keyphrases
audio visual
manipulation tasks
multi modal
visual information
visual data
temporal context
emotion recognition
multi stream
video summarization
person authentication
multimedia
audio visual speech recognition
feature extraction
low level
activity recognition