SentencePiece: A simple and language independent subword tokenizer and detokenizer for Neural Text Processing.
Taku Kudo, John Richardson
Published in: EMNLP (Demonstration) (2018)
Keyphrases
- language independent
- text processing
- n-gram
- natural language processing
- text mining
- machine translation
- text classification
- word level
- neural network
- text retrieval
- information extraction
- parallel corpora
- Chinese text retrieval
- information retrieval systems
- cross lingual
- cross language
- databases
- language specific
- question answering
- language model
- feature selection
- field of natural language processing