The Century of Prose Corpus: A half-million word historical data base.
Louis T. Milic
Published in: Comput. Humanit. (1995)
Keyphrases
- news corpus
- word frequencies
- text corpus
- word sense
- sentence level
- English words
- word pairs
- multiword
- database
- training corpus
- unknown words
- noun phrases
- co-occurrence
- lexical features
- natural language text
- statistical machine translation
- word co-occurrence
- news articles
- historical manuscripts
- parallel corpus
- named entities
- spontaneous speech
- word frequency
- n-gram
- text corpora
- historical data
- George Washington
- sentence pairs
- linguistic information
- word sense disambiguation
- databases
- parallel corpora
- stop words
- machine translation system
- document level
- word segmentation
- ambiguous words
- part of speech
- conversational speech
- st century
- cross-lingual
- text mining