Answering Diverse Questions via Text Attached with Key Audio-Visual Clues.
Qilang Ye
Zitong Yu
Xin Liu
Published in:
CoRR (2024)
Keyphrases
audio visual
multi modal
audio visual speech recognition
multi stream
visual information
multimedia
visual data
audio features
database
temporal context
information retrieval
keywords
text documents
visual features
high dimensional
person authentication
high level