Unspoken Sound: Identifying Trends in Non-Speech Audio Captioning on YouTube.
Lloyd May, Keita Ohshiro, Khang Dang, Sripathi Sridhar, Jhanvi Pai, Magdalena Fuentes, Sooyeon Lee, Mark Cartwright
Published in: CHI (2024)
Keyphrases
- audio visual
- audio stream
- audio signal
- acoustic features
- audio signals
- speaker identification
- audio features
- emotion recognition
- cepstral features
- text to speech
- broadcast news
- digital audio
- speech processing
- speech recognition
- audio content
- speech signal
- prosodic features
- sound source
- acoustic signals
- multi modal
- social media
- multi stream
- speech music discrimination
- audio recordings
- user generated
- visual information
- speaker verification
- automatic speech recognition systems
- linear predictive coding
- action recognition
- gaussian mixture model
- visual data
- multimedia
- spoken documents
- recognition engine
- video streams
- visual speech
- visual features
- spontaneous speech
- speaker diarization
- voice activity detection
- automatic speech recognition
- feature set
- music information retrieval