Automatic Chinese Text Classification Using Character-Based and Word-Based Approach.
Xi Luo, Wataru Ohyama, Tetsushi Wakabayashi, Fumitaka Kimura
Published in: ICDAR (2013)
Keyphrases
- text classification
- word segmentation
- n-gram
- Chinese text
- term frequency
- training corpus
- Chinese word segmentation
- word recognition
- Chinese characters
- text categorization
- bag of words
- distributional clustering
- unknown words
- text data
- sentiment analysis
- data cleaning
- labeled data
- machine learning
- English-Chinese
- neural network
- language modeling
- feature selection
- keywords
- text mining
- co-occurrence
- text classifiers
- text documents
- writing style
- statistical machine translation
- printed text
- writing styles