Are the existing training corpora unnecessarily large?
Miguel Ballesteros, Jesús Herrera, Virginia Francisco, Pablo Gervás
Published in: Proces. del Leng. Natural (2012)
Keyphrases
- training corpora
- training corpus
- training data
- text summarization
- parallel corpora
- text classification
- pos taggers
- part of speech
- named entity recognition
- cross language information retrieval
- natural language processing
- machine translation
- data sets
- translation model
- statistical machine translation
- language independent
- information extraction
- data mining