The Shape of Word Embeddings: Recognizing Language Phylogenies through Topological Data Analysis.
Ondrej Draganov, Steven Skiena
Published in: CoRR (2024)
Keyphrases
- data analysis
- linguistic knowledge
- topological properties
- English text
- language learning
- topological features
- language specific
- natural language
- data collection
- shape model
- high dimensional data
- programming language
- word order
- word sense disambiguation
- Chinese text retrieval
- parallel corpus
- co-occurrence
- shape representation
- lexical information
- shape descriptors
- n-gram
- word meanings
- character n-grams
- target language
- word meaning
- topological information
- syntactic categories
- machine translation system
- data mining
- manifold learning
- shape analysis
- vector space
- business intelligence
- word recognition
- recognizing objects
- cross language
- Indian languages
- closure operator
- data processing
- Reeb graph
- information retrieval