Processing Internet-derived Text - Creating a Corpus of Usenet Messages.
Sebastian Hoffmann
Published in: Lit. Linguistic Comput. (2007)
Keyphrases
- text mining
- electronic mail
- world wide
- text processing
- text messaging
- broad coverage
- newspaper articles
- text data
- text corpora
- supervised machine learning
- sentence level
- real time
- open domain
- scientific papers
- text messages
- recognizing textual entailment
- bulletin board
- natural language text
- data processing
- keywords
- topic segmentation
- lexical features
- instant messaging
- text corpus
- information retrieval
- database
- text collections
- text retrieval
- named entity disambiguation
- linguistic patterns
- textual data
- plain text
- world knowledge
- multiword
- textual features
- communication channels
- internet users
- free text
- word sense
- spontaneous speech
- text documents
- short messages
- information extraction systems
- web documents
- document corpus
- natural language processing
- conversational speech