Generating Hypermedia Documents from Transcriptions of Television Programs Using Parallel Text Alignment.
David C. Gibbon. Published in: RIDE (1998)
Keyphrases
- spoken documents
- text documents
- free text
- information retrieval
- web documents
- digital documents
- text retrieval
- document processing
- textual content
- textual data
- keywords
- document analysis
- text content
- text analysis
- document collections
- text clustering
- latent semantic analysis
- plagiarism detection
- newspaper articles
- text files
- text information
- digital libraries
- text collections
- document content
- text mining
- electronic documents
- multimedia documents
- automatic categorization
- spoken document retrieval
- document clustering
- natural language text
- textual information
- document categorization
- information retrieval systems
- topic segmentation
- information extraction
- page layout
- printed documents
- journal articles
- handwritten text
- related documents
- text categorization
- word level
- semantic information
- multimedia
- document level
- relevant documents
- multiword
- database
- XML documents
- metadata
- natural language processing
- co-occurrence
- text corpus
- scientific literature
- handwritten documents
- document retrieval
- text classifiers
- text data
- document structure
- broadcast news
- word pairs
- text classification
- user queries
- query terms
- key concepts
- document representation