Self-Supervised Speech Representations are More Phonetic than Semantic.
Kwanghee Choi, Ankita Pasad, Tomohiko Nakamura, Satoru Fukayama, Karen Livescu, Shinji Watanabe
Published in: CoRR (2024)
Keyphrases
- speech recognition
- semantic representations
- spoken term detection
- intermediate representations
- higher level
- semantic information
- spoken document retrieval
- speech synthesis
- speech recognizer
- semantic representation
- semantic annotation
- speech sounds
- natural language
- semantic web
- emotion recognition
- emotional speech
- automatic speech recognition
- speaker identification
- speech signal
- domain-specific
- semantic similarity
- content-based video retrieval
- semantic knowledge
- co-occurrence
- hidden Markov models
- recognition engine
- speaker-independent
- endpoint detection
- symbolic representation
- text-to-speech
- metadata
- semantic relationships
- semantic analysis
- semantic network
- visual features
- language model
- similarity measure
- high level