Transformer-based multimodal feature enhancement networks for multimodal depression detection integrating video, audio and remote photoplethysmograph signals.
Huiting Fan, Xingnan Zhang, Yingying Xu, Jiangxiong Fang, Shiqing Zhang, Xiaoming Zhao, Jun Yu
Published in: Inf. Fusion (2024)
Keyphrases
- multimedia
- story segmentation
- multimodal information
- audio visual
- multimodal fusion
- video data
- cepstral features
- audio signals
- cross modal
- multi modal
- audio video
- multimodal interaction
- broadcast news
- real time
- event detection
- signal processing
- video sequences
- news video
- soccer video
- video files
- scene change detection
- audio features
- detection algorithm
- image processing
- video recordings
- low signal to noise ratio
- video signals
- multi stream
- digital video
- digital audio
- mouth region
- multiple modalities
- signal detection
- video analysis
- object detection
- image features
- face detection and tracking
- feature vectors