Representation of Hypertext Documents Based on Terms, Links and Text Compressibility.
Julian Szymanski, Wlodzislaw Duch
Published in: ICONIP (1) (2010)
Keyphrases
- text representation
- text documents
- text collections
- information retrieval
- web documents
- document analysis
- related documents
- plain text
- index terms
- document set
- free text
- vector space model
- text mining
- keywords
- digital documents
- text retrieval
- text data
- wikipedia articles
- text content
- document collections
- text segments
- document representation
- semantically related
- document content
- metadata
- related words
- document level
- text information
- textual content
- textual features
- natural language text
- bag of words
- textual data
- document type
- semantic representation
- information extraction
- semantic information
- linguistic information
- text corpus
- term frequency
- controlled vocabulary
- plagiarism detection
- document categorization
- textual descriptions
- text summarization
- electronic documents
- text categorization
- latent semantic analysis
- multimedia documents
- link analysis
- xml documents
- stop words
- word frequency
- co-occurrence
- information retrieval systems
- wordnet
- automatic summarization
- topic models
- document retrieval