Information extraction from multimedia web documents: an open-source platform and testbed.
David Paul Dupplaw, Michael Matthews, Richard Johansson, Giulia Boato, Andrea Costanzo, Marco Fontani, Enrico Minack, Elena Demidova, Roi Blanco, Thomas Griffiths, Paul H. Lewis, Jonathon S. Hare, Alessandro Moschitti
Published in: Int. J. Multim. Inf. Retr. (2014)
Keyphrases
- web documents
- information extraction
- multimedia
- semi-structured
- natural language processing
- web search engines
- information retrieval
- text mining
- HTML documents
- named entity recognition
- unstructured documents
- multimedia data
- named entities
- document classification
- relation extraction
- question answering
- web data
- multimedia content
- unstructured text
- machine learning
- metadata
- web content
- text documents
- structured data
- document representation
- focused crawling
- vector space model
- textual data
- natural language
- web information extraction
- textual information
- link structure
- digital libraries
- data extraction
- web directories
- databases