Replacing Human Audio with Synthetic Audio for On-device Unspoken Punctuation Prediction.
Daria Soboleva, Ondrej Skopek, Márius Sajgalík, Victor Carbune, Felix Weissenberger, Julia Proskurnia, Bogdan Prisacari, Daniel Valcarce, Justin Lu, Rohit Prabhavalkar, Balint Miklos
Published in: CoRR (2020)
Keyphrases
- multimedia
- prediction accuracy
- signal processing
- audio stream
- human language
- audio signals
- audio visual
- visual information
- visual data
- cepstral features
- audio video
- cross modal
- multimedia information
- genetic algorithm
- digital video
- music scores
- high level
- emotion recognition
- digital audio
- audio files
- text to speech
- video sequences
- artificial intelligence