Word-based Compression Methods for Large Text Documents.
Jiri Dvorský, Jaroslav Pokorný, Václav Snásel
Published in: Data Compression Conference (1999)
Keyphrases
- text documents
- text corpus
- keywords
- term frequency
- text classification
- text mining
- latent topics
- WordNet
- co-occurrence
- topic models
- text categorization
- information extraction
- n-gram
- bag of words
- document classification
- news articles
- named entities
- tf-idf
- document clustering
- text data
- automatic text categorization
- document representation
- text corpora
- word sense disambiguation
- extraction patterns
- machine learning
- text collections
- natural language text
- object recognition
- information extraction systems