MAVD: The First Open Large-Scale Mandarin Audio-Visual Dataset with Depth Information.
Jianrong Wang, Yuchen Huo, Li Liu, Tianyi Xu, Qi Li, Sen Li
Published in: INTERSPEECH (2023)
Keyphrases
- depth information
- audio visual
- emotion recognition
- multi modal
- depth map
- rgbd images
- stereo vision
- multi stream
- visual information
- visual data
- depth recovery
- multimedia
- depth images
- audio visual speech recognition
- three dimensional
- speech recognition
- human computer interaction
- dynamic programming
- hidden markov models