TSeg - A Text Segmenter for Corpus Annotation.
Felipe Rodrigues, Richard Semolini, Norton Trevisan Roman, Ana María Monteiro
Published in: SBSI (2012)
Keyphrases
- supervised machine learning
- broad coverage
- text data
- natural language text
- plain text
- open domain
- text retrieval
- information retrieval
- annotated corpus
- English words
- scientific papers
- text corpus
- newspaper articles
- semi-automatically
- text documents
- lexical features
- active learning
- text collections
- temporal expressions
- sentence level
- keywords
- manually annotated
- recognizing textual entailment
- named entity disambiguation
- document corpus
- linguistic information
- document level
- cross media
- text mining
- image annotation
- free text
- noun phrases
- word pairs
- training corpus
- linguistic features
- metadata
- information extraction systems
- entity extraction
- anaphora resolution
- world knowledge
- document analysis
- text corpora
- multiword
- writing style
- hand crafted
- spontaneous speech
- automatic image annotation
- natural language processing
- information extraction