Using Sequences of Words for Non-Disjoint Grouping of Documents.
Chiheb-Eddine Ben N'cir, Nadia Essoussi
Published in: Int. J. Pattern Recognit. Artif. Intell. (2015)
Keyphrases
- text documents
- word spotting
- keywords
- document representation
- word frequencies
- related words
- text corpus
- multiword
- index terms
- semantic relationships
- document content
- time-stamped
- document collections
- word frequency
- topic hierarchy
- information retrieval
- linguistic information
- text corpora
- hidden Markov models
- person names
- document level
- text mining
- document retrieval
- document clustering
- information retrieval systems
- related documents
- training documents
- word pairs
- pairwise
- printed documents
- metadata
- relevant documents
- web documents
- n-gram
- sentiment polarity
- pre-classified
- textual features
- historical documents
- semantically related
- XML documents
- natural language text
- word segmentation
- latent topics
- keyword extraction
- term frequency
- bag of words
- word co-occurrence
- topic models
- training corpus
- Wikipedia articles
- retrieval systems
- stop words
- user queries
- semantic information
- text classification
- information extraction