HEAR: Hearing Enhanced Audio Response for Video-grounded Dialogue.
Sunjae Yoon, Dahyun Kim, Eunseop Yoon, Hee Suk Yoon, Junyeong Kim, Chang Dong Yoo
Published in: EMNLP (Findings) (2023)
Keyphrases
- multimedia
- audio video
- scene change detection
- multimedia processing
- visual data
- digital video
- video content analysis
- audio signals
- multimedia information
- audio features
- video recordings
- video content
- video data
- video streams
- digital audio
- video files
- video sequences
- video material
- audio files
- content based video retrieval
- video analysis
- video copy detection
- audio visual
- media streams
- audio stream
- broadcast news
- video signals
- real time
- video scene
- signal processing
- space time
- soccer video
- lecture videos
- audio visual content
- sign language
- online video
- video database
- natural language
- visual features
- multimedia data
- video frames
- metadata
- closed captions
- audio content
- visual information
- event detection
- content analysis
- video surveillance
- human machine
- dialogue system