Title extraction from bodies of HTML documents and its application to web page retrieval.
Yunhua Hu, Guomao Xin, Ruihua Song, Guoping Hu, Shuming Shi, Yunbo Cao, Hang Li
Published in: SIGIR (2005)
Keyphrases
- web page retrieval
- HTML documents
- automatic extraction
- web documents
- web search
- semantic information
- web pages
- language model
- information extraction
- semi structured
- web content
- structured documents
- XML documents
- semistructured data
- machine learning
- natural language
- text classification
- database
- regular expressions
- multimedia