Applicability of Text-representing Centroids for Thai Language Documents.
Sureeporn Nualnim, Nirach Romyen, Maleerat Sodanil
Published in: NLPIR (2019)
Keyphrases
- text documents
- linguistic analysis
- information retrieval
- digital documents
- multilingual documents
- indian languages
- web documents
- text analysis
- keywords
- text information
- text data
- free text
- computational linguistics
- language generation
- text mining
- textual content
- newspaper articles
- document content
- document processing
- text retrieval
- latent semantic analysis
- document analysis
- plagiarism detection
- textual data
- text content
- natural language text
- electronic documents
- character n-grams
- text collections
- textual information
- english text
- natural language
- automatic categorization
- source language
- semantic information
- document clustering
- multimedia documents
- k-means
- document collections
- document categorization
- text corpus
- xml documents
- information retrieval systems
- metadata
- handwritten documents
- printed documents
- arabic language
- key concepts
- information extraction
- multiword
- scientific literature
- word segmentation
- sentence level
- text corpora
- page layout
- handwritten text
- clustering algorithm
- document level
- document representation
- text classification
- natural language processing
- logical structure
- relevant documents
- text classifiers
- data points
- digital libraries
- language model
- wordnet
- cross language
- text categorization
- machine translation
- document retrieval
- document images