The Long Road Home: conversion and transformation of the Text Creation Partnership corpus.
James Cummings, Sebastian Rahtz
Published in: DH (2013)
Keyphrases
- open domain
- text data
- supervised machine learning
- broad coverage
- lexical features
- natural language text
- free text
- recognizing textual entailment
- document level
- sentence level
- information extraction systems
- text processing
- linguistic information
- text corpora
- text collections
- information retrieval
- newspaper articles
- plain text
- document corpus
- text mining
- textual features
- english words
- scientific papers
- training corpus
- text corpus
- information extraction
- road network
- linguistic patterns
- public private
- anaphora resolution
- web pages
- world knowledge
- multiword
- manually annotated
- text retrieval
- keywords
- temporal expressions
- vehicle detection
- information and communication technologies
- spontaneous speech
- topic segmentation
- entity extraction
- text documents
- information sharing
- text classification
- natural language
- road surface