Audio-Visual Praise Estimation for Conversational Video based on Synchronization-Guided Multimodal Transformer
Nobukatsu Hojo
Saki Mizuno
Satoshi Kobashikawa
Ryo Masumura
Mana Ihori
Hiroshi Sato
Tomohiro Tanaka
Published in: INTERSPEECH 2023
Keyphrases
audio visual
multi modal
multi stream
audio visual speech recognition
visual information
temporal context
visual data
person authentication
multimodal fusion
multimedia
emotion recognition
audio features
natural language processing
image classification
WordNet
pose estimation