Information Extraction from HTML: Combining XML and Standard Techniques for IE from the Web.
Luo Xiao, Dieter Wissmann, Michael Brown, Stefan Jablonski
Published in: IEA/AIE (2001)
Keyphrases
- information extraction
- data interchange
- semi-structured
- web documents
- extensible markup language
- web mining
- textual data
- xml documents
- semi-structured data
- web pages
- data extraction
- structured data
- web data
- html documents
- xml data
- precision and recall
- website
- xml schema
- text mining
- free text
- natural language processing
- xml technology
- named entity recognition
- lingua franca
- data exchange
- standard for data exchange
- information retrieval
- machine learning
- question answering
- web applications
- data model
- xml files
- xml databases
- semistructured data
- xml queries
- html pages
- interchange format
- relational data
- topic maps
- database
- web content
- text documents
- data integration
- databases
- structured documents
- keyword queries
- web resources
- xml format
- document-centric
- semantic web
- information retrieval systems
- natural language
- metadata