Unsupervised Text Normalization Approach for Morphological Analysis of Blog Documents.
Kazushi Ikeda, Tadashi Yanagihara, Kazunori Matsumoto, Yasuhiro Takishima
Published in: Australasian Conference on Artificial Intelligence (2009)
Keyphrases
- morphological analysis
- text documents
- digital documents
- free text
- information retrieval
- web documents
- plagiarism detection
- keywords
- text retrieval
- text collections
- latent semantic analysis
- document analysis
- improve retrieval effectiveness
- document content
- document categorization
- textual content
- unknown words
- text mining
- character n-grams
- document level
- electronic documents
- text data
- document collections
- multimedia documents
- natural language text
- topic modeling
- multiword
- printed documents
- relevant documents
- information extraction
- text corpus
- document retrieval
- text categorization
- document clustering
- writing style
- sentence level
- text classifiers
- information retrieval systems
- text classification
- related words
- text representation
- digital libraries
- xml documents
- natural language processing
- index terms
- semi-supervised
- blog entries
- news stories
- document representation
- syntactic categories
- search engine
- test collection
- web pages
- handwritten documents
- social media
- blog posts
- news articles
- retrieval systems