A Dictionary- and Corpus-Independent Statistical Lemmatizer for Information Retrieval in Low Resource Languages.
Aki Loponen, Kalervo Järvelin
Published in: CLEF (2010)
Keyphrases
- information retrieval
- machine readable dictionaries
- wide coverage
- statistical machine translation
- training corpus
- bilingual dictionaries
- cross lingual
- expressive power
- multi lingual
- parallel corpus
- statistical analysis
- text corpus
- lexical knowledge
- machine translation
- text mining
- language independent
- language modeling
- information retrieval systems
- text corpora
- search engine
- parallel corpora
- text collections
- document collections
- information access
- resource allocation
- computational linguistics
- multilingual information retrieval
- sparse representation
- statistical models
- manually annotated
- vector space model
- document retrieval
- wordnet
- topic detection and tracking
- document corpus
- language model
- information extraction
- probabilistic model