Handling tree-structured text: parsing directory pages.
Sarang Shrivastava, Afreen Shaikh, Shivani Shrivastava, Chung Ming Ho, Pradeep Reddy, Vijay Saraswat
Published in: CoRR (2021)
Keyphrases
- page layout
- keywords
- textual content
- content features
- linguistic analysis
- hierarchical structure
- printed text
- web pages
- unstructured text
- anchor text
- website
- structured data
- tree structure
- information retrieval
- web documents
- search engine
- syntactic categories
- text mining
- html pages
- tree structured data
- parse tree
- natural language processing
- syntactic analysis
- text retrieval
- syntactic structures
- natural language
- information extraction
- metadata
- semantic parsing
- web users
- database
- broad coverage
- textual information