Building a Wikipedia N-GRAM Corpus.
Jorge Ramón Fonseca Cacho, Ben Cisneros, Kazem Taghva
Published in: IntelliSys (2) (2020)
Keyphrases
- n-gram
- language model
- bag-of-words
- language independent
- text classification
- variable length
- Wikipedia articles
- language modeling
- named entity disambiguation
- Viterbi algorithm
- language modelling
- part of speech
- natural language text
- knowledge base
- semantic relations
- web documents
- out-of-vocabulary
- statistical language modeling
- neural network
- question answering
- word segmentation
- WordNet
- word level
- text mining
- inside-outside algorithm