Discovery of Language Resources on the Web: Information Extraction from Heterogeneous Documents.
Viktor Pekar, Richard Evans
Published in: Lit. Linguistic Comput. (2007)
Keyphrases
- language resources
- web information extraction
- web data
- parallel corpora
- information extraction
- metadata
- machine translation
- information retrieval
- web documents
- cross language information retrieval
- document collections
- text documents
- knowledge discovery
- web pages
- document retrieval
- semi structured
- html documents
- retrieval systems
- structured documents
- xml documents
- document clustering
- web mining
- natural language processing
- broadcast news
- text retrieval
- keywords
- cross lingual
- information retrieval systems
- text categorization
- data mining
- semantic relationships
- web content
- databases
- relevant documents
- speech recognition