Ubiq: a platform for crawling, analyzing and exploiting corpora (Ubiq : une plateforme de collecte, analyse et valorisation des corpus) [in French].
François-Régis Chaumartin
Published in: TALN (3) (2014)
Keyphrases
- text corpora
- annotated corpus
- wide coverage
- web pages
- text data
- parallel corpus
- statistical machine translation
- document corpus
- search engine
- text corpus
- real time
- topic segmentation
- text collections
- web mining
- relation extraction
- named entities
- manually annotated
- training corpus
- text mining
- information retrieval
- linguistic patterns
- word frequencies
- specific domains
- web crawlers
- comparable corpora
- noun phrases
- test set
- natural language processing
- machine learning