Billions of Parallel Words for Free: Building and Using the EU Bookshop Corpus.
Raivis Skadins, Jörg Tiedemann, Roberts Rozis, Daiga Deksne
Published in: LREC (2014)
Keyphrases
- english words
- word frequencies
- text corpora
- multiword
- text corpus
- training corpus
- word pairs
- noun phrases
- document level
- parallel processing
- shared memory
- person names
- textual features
- n-gram
- linguistic information
- related words
- lexical features
- stop words
- word frequency
- unknown words
- information retrieval
- semantic roles
- parallel algorithm
- world knowledge
- parallel corpus
- text documents
- word recognition
- text categorization
- spontaneous speech
- manually annotated
- word co-occurrence
- language model