Towards Tokenization and Part-of-Speech Tagging for Khmer: Data and Discussion.
Hour Kaing, Chenchen Ding, Masao Utiyama, Eiichiro Sumita, Sethserey Sam, Sopheap Seng, Katsuhito Sudoh, Satoshi Nakamura
Published in: ACM Trans. Asian Low Resour. Lang. Inf. Process. (2021)