Kitten: a tool for normalizing HTML and extracting its textual content.
Mathieu-Henri Falco
Véronique Moriceau
Anne Vilnat
Published in: LREC (2012)
Keyphrases
textual content
web pages
news pages
news articles
information extraction
keywords
multimedia
textual information
web search
information retrieval
search engine
image database