Building Wikipedia N-grams with Apache Spark.
Armin Esmaeilzadeh, Jorge Ramón Fonseca Cacho, Kazem Taghva, Mina Esmail Zadeh Nojoo Kambar, Mahdi Hajiali
Published in: SAI (2) (2022)
Keyphrases
- n gram
- language model
- open source
- text classification
- bag of words
- variable length
- part of speech
- language independent
- language modelling
- open source software
- language modeling
- query expansion
- semantic relations
- viterbi algorithm
- knowledge base
- character n grams
- databases
- document representation
- text categorization
- image classification
- information retrieval systems
- neural network