Text Classification in the Wild: a Large-scale Long-tailed Name Normalization Dataset.
Jiexing Qi, Shuhao Li, Zhixin Guo, Yusheng Huang, Chenghu Zhou, Weinan Zhang, Xinbing Wang, Zhouhan Lin. Published in: CoRR (2023)
Keyphrases
- text classification
- text classification tasks
- text categorization
- feature selection
- bag of words
- small scale
- text mining
- benchmark datasets
- machine learning
- sentiment analysis
- real life
- text documents
- naive Bayes
- preprocessing
- n-gram
- multi-label
- labeled data
- text classifiers
- real world
- text data
- data cleaning
- database
- decision trees
- maximum likelihood
- feature set
- web scale
- normalization method
- million images