InTeReC: In-text Reference Corpus for Applying Natural Language Processing to Bibliometrics.
Marc Bertin, Iana Atanassova
Published in: BIR@ECIR (2018)
Keyphrases
- natural language processing
- broad coverage
- free text
- information extraction
- text mining
- text processing
- computational linguistics
- open domain
- text summarization
- information extraction systems
- natural language text
- text understanding
- plain text
- natural language
- entity extraction
- text corpora
- text analysis
- supervised machine learning
- text data
- linguistic analysis
- textual data
- machine translation
- natural language generation
- word sense disambiguation
- reference resolution
- question answering
- machine learning
- named entity recognition
- text retrieval
- linguistic patterns
- information retrieval
- text corpus
- sentence level
- anaphora resolution
- coreference resolution
- word sense
- wordnet
- recognizing textual entailment
- lexical features
- text classification
- manually annotated
- text documents
- artificial intelligence
- knowledge representation
- spontaneous speech
- scientific papers
- reference set
- semantic relations
- english words
- named entity disambiguation
- multiword
- document level
- relation extraction
- newspaper articles
- temporal expressions
- named entities
- semantic analysis
- lexical resources
- world knowledge
- writing style
- text collections
- noun phrases
- linguistic information
- training corpus
- semantic parsing
- statistical natural language processing
- sentiment analysis