HTML-LSTM: Information Extraction from HTML Tables in Web Pages Using Tree-Structured LSTM.
Kazuki Kawamura, Akihiro Yamamoto
Published in: DS (2021)
Keyphrases
- information extraction
- web pages
- html documents
- web documents
- semi-structured
- structured data
- web information extraction
- data extraction
- website
- web browser
- dom tree
- unstructured text
- free text
- recurrent neural networks
- html pages
- xml files
- web search engines
- web search
- conditional random fields
- hierarchical structure
- keywords
- search engine
- information retrieval
- dynamic content
- relation extraction
- database
- precision and recall
- named entity recognition
- question answering
- web page classification
- plain text
- web server
- tree structures
- natural language processing
- rooted trees
- machine learning
- data records
- text mining
- natural language
- text summarization
- tree-structured data
- link analysis
- web data
- tree structure
- document object model