A multi-purpose audio-visual corpus for multi-modal Persian speech recognition: The Arman-AV dataset.
Javad Peymanfard, Samin Heydarian, Ali Lashini, Hossein Zeinali, Mohammad Reza Mohammadi, Nasser Mozayani
Published in: Expert Syst. Appl. (2024)
Keyphrases
- audio visual
- multi modal
- speech recognition
- audio visual speech recognition
- multi stream
- language model
- hidden markov models
- noisy environments
- pattern recognition
- speech signal
- speaker verification
- automatic speech recognition
- digit recognition
- audio features
- image annotation
- text classification
- video search
- sound source
- feature set
- similarity measure
- information retrieval