YODAS: Youtube-Oriented Dataset for Audio and Speech.
Xinjian Li, Shinnosuke Takamichi, Takaaki Saeki, William Chen, Sayaka Shiota, Shinji Watanabe
Published in: ASRU (2023)
Keyphrases
- audio visual
- audio stream
- broadcast news
- audio signals
- speaker identification
- text to speech
- emotion recognition
- cepstral features
- audio features
- multimedia
- speech recognition
- speech processing
- speech music discrimination
- video search
- digital audio
- social media
- visual information
- feature set
- linear predictive coding
- benchmark datasets
- acoustic signals
- low level
- action recognition
- audio recordings
- signal processing
- speech signal
- content based video retrieval
- speech synthesis
- human actions
- web videos
- feature selection
- hidden markov models
- multi modal
- spoken documents
- image retrieval
- multimodal interfaces
- automatic transcription
- audio video
- concept detection
- video streams
- user generated
- digital video
- visual data