All-in-One Transformer: Unifying Speech Recognition, Audio Tagging, and Event Detection.
Niko Moritz, Gordon Wichern, Takaaki Hori, Jonathan Le Roux
Published in: INTERSPEECH (2020)
Keyphrases
- event detection
- speech recognition
- speaker identification
- speech processing
- speech recognition technology
- soccer video
- audio visual speech recognition
- hidden markov models
- language model
- speech signal
- video event detection
- video analysis
- event recognition
- cepstral coefficients
- activity recognition
- speech recognizer
- speech synthesis
- pattern recognition
- multimedia
- noisy environments
- sports video
- automatic speech recognition
- mel frequency cepstral coefficients
- metadata
- audio visual
- speech recognition systems
- visual data
- audio signal
- broadcast news
- signal processing
- visual information
- active database management systems