Supporting Trainset Annotation for Text Classification of Incoming Enterprise Documents.
Juris Rats, Inguna Pede
Published in: DATA (2022)
Keyphrases
- text classification
- text documents
- document classification
- labeled documents
- text classifiers
- text data
- training documents
- document categorization
- term frequency
- unlabeled documents
- metadata
- text categorization
- bag of words
- training corpus
- metadata creation
- text mining
- feature engineering
- classify documents
- topic discovery
- naive bayes
- feature selection
- n gram
- information retrieval
- image annotation
- semantic features
- automatic text classification
- machine learning
- active learning
- distributional clustering
- document collections
- document representation
- sentiment analysis
- information retrieval systems
- web documents
- multi label
- semantic annotation
- knn
- document clustering
- relevant documents
- information management
- data cleaning
- labeled data
- keywords
- image retrieval
- sentiment classification
- enterprise search
- xml documents
- document retrieval
- semi supervised
- language modeling
- wordnet
- effective retrieval
- named entities
- vector space model