Fusing Multi-Level Features from Audio and Contextual Sentence Embedding from Text for Interview-Based Depression Detection.
Junqi Xue, Ruihan Qin, Xinxu Zhou, Honghai Liu, Min Zhang, Zhiguo Zhang
Published in: ICASSP (2024)
Keyphrases
- multimedia
- contextual features
- co-occurrence
- text graphics
- text summarization
- audio visual
- false positives
- semantic information
- image features
- feature extraction
- feature set
- object detection
- detection algorithm
- feature vectors
- feature space
- lexical features
- text generation
- sentence level
- text-to-speech
- information retrieval
- discourse structure
- emotion recognition
- noun phrases
- keywords
- sentiment classification
- contextual information