Linguistic complexity: English vs. Polish, text vs. corpus
Jaroslaw Kwapien, Stanislaw Drozdz, Adam Orczyk
Published in: CoRR (2010)
Keyphrases
- broad coverage
- linguistic information
- natural language text
- open domain
- linguistic analysis
- linguistic features
- natural language processing
- link grammar
- english words
- natural language
- syntactic analysis
- english text
- linguistic patterns
- english language
- multiword
- training corpus
- text data
- mono lingual
- machine translation system
- supervised machine learning
- sentence level
- language generation
- word sense
- text to speech
- text mining
- unknown words
- person names
- text corpus
- text corpora
- machine translation
- stop words
- information retrieval
- wide coverage
- text retrieval
- cross lingual
- relation extraction
- information extraction
- word sense disambiguation
- named entity disambiguation
- native language
- natural language generation
- spontaneous speech
- keywords
- reference resolution
- document corpus
- text documents
- recognizing textual entailment
- text generation
- parse tree
- named entities
- language identification
- parallel corpus