SPEECH-COCO: 600k Visually Grounded Spoken Captions Aligned to MSCOCO Data Set.
William Havard, Laurent Besacier, Olivier Rosec. Published in: CoRR (2017)
Keyphrases
- speech recognition
- data sets
- spoken language
- automatic speech recognition
- broadcast news
- spoken words
- speech signal
- visual features
- spoken documents
- conversational speech
- training data
- hidden markov models
- spontaneous speech
- speech synthesis
- language understanding
- dialogue system
- real world
- language model
- language processing
- benchmark data sets
- news video
- speaker identification
- spoken document retrieval
- original data
- human computer interaction
- speech retrieval
- speech sounds
- database
- information retrieval
- spoken dialogue systems
- recognition engine
- pattern recognition