AVicuna: Audio-Visual LLM with Interleaver and Context-Boundary Alignment for Temporal Referential Dialogue.
Yunlong Tang
Daiki Shimada
Jing Bi
Chenliang Xu
Published in:
CoRR (2024)
Keyphrases
audio visual
temporal context
multi modal
visual information
meeting room
spatio temporal
spatial context
contextual information
multimedia
temporal information
spatial and temporal
multi stream
visual data
video summarization
person authentication
audio visual speech recognition
emotion recognition