YODAS: Youtube-Oriented Dataset for Audio and Speech.
Xinjian Li, Shinnosuke Takamichi, Takaaki Saeki, William Chen, Sayaka Shiota, Shinji Watanabe
Published in: CoRR (2024)
Keyphrases
- audio stream
- audio visual
- audio signals
- text to speech
- broadcast news
- speaker identification
- digital audio
- cepstral features
- audio recordings
- speech processing
- emotion recognition
- multimedia
- speech recognition
- multi modal
- video search
- automatic transcription
- audio video
- action recognition
- acoustic signals
- prosodic features
- multi stream
- audio features
- visual information
- visual data
- language model
- linear predictive coding
- acoustic features
- digital video
- audio signal
- human actions
- signal processing
- speech signal
- automatic speech recognition