Incorporating Scene Graphs into Pre-trained Vision-Language Models for Multimodal Open-vocabulary Action Recognition.
Chao Wei
Zhidong Deng
Published in:
ICRA (2024)
Keyphrases
action recognition
language model
pre-trained
computer vision
spoken term detection
human actions
n-gram
probabilistic model
information retrieval
bag of words
training data
3D scene
speech recognition
context sensitive
atomic actions
body parts
video sequences
three-dimensional
training examples
control signals
audio-visual
image sequences
input image
visual data
multimodal
single image
face recognition
real scenes
object detection
visual words
moving objects
principal component analysis
pose estimation