Trilingual Semantic Embeddings of Visually Grounded Speech with Self-Attention Mechanisms.
Yasunori Ohishi, Akisato Kimura, Takahito Kawanishi, Kunio Kashino, David Harwath, James R. Glass
Published in: ICASSP (2020)
Keyphrases
- speech recognition
- natural language
- semantic information
- computational models
- vector space
- semantic similarity
- endpoint detection
- semantic knowledge
- semantic search
- dialogue system
- automatic speech recognition
- speech signal
- high level
- semantic web
- low dimensional
- content based video retrieval
- recognition engine
- speaker identification
- broadcast news
- speech synthesis
- spoken language
- semantic analysis
- semantic description
- noisy environments
- focus of attention
- audio visual
- domain ontology
- higher level
- domain specific
- high dimensional
- pattern recognition