TIMIT-TTS: a Text-to-Speech Dataset for Multimodal Synthetic Media Detection.
Davide Salvi, Brian Hosler, Paolo Bestagini, Matthew C. Stamm, Stefano Tubaro
Published in: CoRR (2022)
Keyphrases
- text to speech
- multimodal interaction
- speech synthesis
- text to speech synthesis
- prosodic features
- object detection
- detection method
- hidden Markov models
- false alarms
- multimedia
- speech corpus
- word processing
- programming tool
- English text
- detection algorithm
- object detectors
- detection rate
- detection accuracy
- neural network
- automatic detection
- false positives
- multi modal
- real world
- digital media