Persica: A Persian corpus for multi-purpose text mining and natural language processing.
Hamid Eghbalzadeh, Behrooz Hosseini, Shahram Khadivi, Ali Khodabakhsh
Published in: IST (2012)
Keyphrases
- text mining
- natural language processing
- text classification
- broad coverage
- text data
- word sense disambiguation
- information extraction
- text corpora
- text documents
- computational linguistics
- text processing
- entity extraction
- wordnet
- genia corpus
- textual data
- language processing
- reference resolution
- natural language text
- sentiment analysis
- machine learning
- textual documents
- natural language
- artificial intelligence
- semantic analysis
- question answering
- biomedical literature
- coreference resolution
- word sense
- linguistic knowledge
- machine translation
- named entities
- scientific literature
- training corpus
- information retrieval
- free text
- web mining
- topic models
- text clustering
- computational biology
- knowledge discovery
- knowledge representation
- part of speech
- databases
- text retrieval
- manually annotated
- test set
- data analysis
- feature engineering
- data mining