Exploring Word Embeddings and Character N-Grams for Author Clustering.
Yunita Sari, Mark Stevenson. Published in: CLEF (Working Notes) (2016)
Keyphrases
- character n-grams
- n-gram
- variable-length
- clustering algorithm
- cross-language information retrieval
- clustering method
- high-dimensional data
- cross-language
- k-means
- optical character recognition
- co-occurrence
- dimensionality reduction
- language-specific
- natural language processing
- language model
- machine translation
- digital libraries
- word segmentation
- search engine