Knowledge Discovery in Documents by Extracting Frequent Word Sequences.
Helena Ahonen
Published in: Libr. Trends (1999)
Keyphrases
- knowledge discovery
- word spotting
- word frequencies
- keywords
- information retrieval
- natural language text
- word pairs
- unstructured documents
- time-stamped
- printed documents
- text corpus
- information retrieval systems
- word frequency
- latent topics
- frequency counts
- related words
- concept space
- document collections
- linguistic information
- word similarity
- sentence level
- term weighting
- xml documents
- page layout
- spoken documents
- co-occurrence
- hidden markov models
- data mining techniques
- web documents
- document retrieval
- text documents
- multiword
- relevant documents
- user queries
- text mining
- topic models
- n-gram
- training corpus
- retrieval systems
- document analysis
- word recognition
- term frequency
- data mining
- stop words
- event sequences
- related documents
- character recognition
- document images
- handwritten documents
- word segmentation
- semantic information
- association rules
- document space
- wordnet
- query terms
- data analysis
- vector space model