SPGISpeech: 5,000 hours of transcribed financial audio for fully formatted end-to-end speech recognition.
Patrick K. O'Neill, Vitaly Lavrukhin, Somshubra Majumdar, Vahid Noroozi, Yuekai Zhang, Oleksii Kuchaiev, Jagadeesh Balam, Yuliya Dovzhenko, Keenan Freyberg, Michael D. Shulman, Boris Ginsburg, Shinji Watanabe, Georg Kucsko
Published in: CoRR (2021)
Keyphrases
- end to end
- speech recognition
- speaker identification
- speech processing
- speech recognition technology
- hidden Markov models
- audio visual speech recognition
- speech synthesis
- language model
- speech signal
- speech recognizer
- pattern recognition
- automatic speech recognition
- cepstral coefficients
- signal processing
- congestion control
- multimedia
- noisy environments
- mel frequency cepstral coefficients
- audio visual
- neural network
- speaker recognition
- visual information
- feature extraction
- speaker independent
- speaker dependent
- computer vision
- machine learning