Corpus Linguistics for Establishing The Natural Language Content of Digital Library Documents.
Robert P. Futrelle, Xiaolan Zhang, Yumiko Sekiya
Published in: DL (1994)
Keyphrases
- natural language
- digital libraries
- metadata
- natural language text
- electronic documents
- digital collections
- digital documents
- web documents
- textual content
- digital objects
- effective retrieval
- digital library systems
- newspaper articles
- document content
- multimedia documents
- scientific papers
- plain text
- document collections
- multimedia
- digital content
- person names
- logical structure
- semantic information
- natural language processing
- semantic analysis
- word frequencies
- information retrieval
- written in natural language
- structured documents
- knowledge representation
- content and structure
- information extraction
- linguistic analysis
- semantic content
- text corpora
- keywords
- document level
- training corpus
- writing style
- user queries
- machine learning
- relevant content
- textual features
- question answering
- natural language questions
- information access
- document representation
- text collections
- multiword
- search interface
- retrieval systems
- text documents
- document retrieval
- text data
- question answering systems
- user generated content