ThaiLMCut: Unsupervised Pretraining for Thai Word Segmentation.
Suteera Seeha, Ivan Bilan, Liliana Mamani Sánchez, Johannes Huber, Michael Matuschek, Hinrich Schütze
Published in: LREC (2020)
Keyphrases
- word segmentation
- POS tagging
- Chinese text retrieval
- n-gram
- word recognition
- handwriting recognition
- Chinese word segmentation
- Chinese text
- unsupervised learning
- language-independent
- semi-supervised
- document analysis
- cross-lingual
- language modeling
- text classification
- semi-supervised learning
- word-level
- unknown words
- supervised learning
- part of speech
- language model
- neural network