SPGISpeech: 5,000 Hours of Transcribed Financial Audio for Fully Formatted End-to-End Speech Recognition.
Patrick K. O'Neill, Vitaly Lavrukhin, Somshubra Majumdar, Vahid Noroozi, Yuekai Zhang, Oleksii Kuchaiev, Jagadeesh Balam, Yuliya Dovzhenko, Keenan Freyberg, Michael D. Shulman, Boris Ginsburg, Shinji Watanabe, Georg Kucsko
Published in: Interspeech (2021)
Keyphrases
- end to end
- speech recognition
- speaker identification
- speech processing
- speech recognition technology
- automatic speech recognition
- hidden markov models
- language model
- broadcast news
- speech signal
- spontaneous speech
- multimedia
- speech recognizer
- audio visual speech recognition
- speech synthesis
- cepstral coefficients
- speaker dependent
- congestion control
- visual information
- audio visual
- noisy environments
- pattern recognition
- speaker independent
- machine learning
- speaker recognition
- speech recognition systems
- signal processing
- audio signal
- audio features
- multi stream
- gaussian mixture model