A Multi-Purpose Audio-Visual Corpus for Multi-Modal Persian Speech Recognition: the Arman-AV Dataset.
Javad Peymanfard, Samin Heydarian, Ali Lashini, Hossein Zeinali, Mohammad Reza Mohammadi, Nasser Mozayani
Published in: CoRR (2023)
Keyphrases
- audio visual
- multi modal
- speech recognition
- audio visual speech recognition
- multi stream
- hidden markov models
- language model
- pattern recognition
- automatic speech recognition
- audio features
- digit recognition
- speaker identification
- speech signal
- video search
- high dimensional
- feature selection
- probabilistic model
- image annotation
- feature set
- text classification
- neural network