A New Methodology for Speech Corpora Definition from Internet Documents.
Dominique Vaufreydaz, Carole Bergamini, Jean-François Serignat, Laurent Besacier, Mohammad Akbar
Published in: LREC (2000)
Keyphrases
- document collections
- document repositories
- information retrieval
- text corpora
- information retrieval systems
- speech recognition
- data collections
- text data
- spoken documents
- web information
- metadata
- topic segmentation
- text documents
- XML documents
- retrieval systems
- text corpus
- keywords
- parallel corpus
- word frequency
- parallel corpora
- text collections
- document clustering
- conceptual model
- document retrieval
- audio visual
- vector space model
- document analysis
- web documents
- digital libraries
- natural language processing
- document classification
- relevant documents
- CD-ROMs
- bibliographic databases
- automatic summarization
- text to speech
- text categorization
- automatic speech recognition
- query terms