Text classification with word embedding regularization and soft similarity measure.
Vít Novotný, Eniafe Festus Ayetiran, Michal Stefánik, Petr Sojka
Published in: CoRR (2020)
Keyphrases
- text classification
- similarity measure
- n-gram
- training corpus
- term frequency
- bag of words
- distributional clustering
- word similarity
- feature selection
- machine learning
- mutual information
- semantic features
- text data
- text mining
- text categorization
- sentiment analysis
- data cleaning
- text classifiers
- semantic similarity
- naive bayes
- text documents
- labeled data
- co-occurrence
- edit distance
- similarity search
- kNN
- distance measure
- multi label
- similarity function
- vector space
- information extraction
- feature vectors
- regularization parameter
- data hiding
- labeled and unlabeled data
- pairwise
- feature set