Building Wikipedia N-grams with Apache Spark.

Armin Esmaeilzadeh, Jorge Ramón Fonseca Cacho, Kazem Taghva, Mina Esmail Zadeh Nojoo Kambar, Mahdi Hajiali

Published in: SAI (2) (2022)

Keyphrases

n-gram
language model
open source
text classification
bag of words
variable length
part of speech
language independent
language modelling
open source software
language modeling
query expansion
semantic relations
Viterbi algorithm
knowledge base
character n-grams
databases
document representation
text categorization
image classification
information retrieval systems
neural network