A Hadoop based platform for natural language processing of web pages and documents.
Paolo Nesi, Gianni Pantaleo, Gianmarco Sanesi
Published in: J. Vis. Lang. Comput. (2015)
Keyphrases
- natural language processing
- web pages
- web documents
- free text
- information extraction
- keywords
- textual content
- google search engine
- plain text
- linguistic analysis
- textual data
- information retrieval
- search engine
- web data
- portuguese language
- website
- natural language
- xml documents
- web search engines
- computational linguistics
- machine learning
- html pages
- wordnet
- textual information
- information retrieval systems
- structured information
- text mining
- text documents
- document collections
- text processing
- open source
- artificial intelligence
- cloud computing
- web page classification
- structured data
- vector space model
- document retrieval
- semi structured
- semantic relations
- topic specific
- distributed systems
- web content
- relevant documents
- knowledge representation
- semantic information
- dynamically generated
- textual contents
- named entities
- question answering
- co occurrence
- named entity recognition
- web search
- web graph
- ranking list
- part of speech
- link analysis