Language Identifications of Arabic Script Web Documents Using Independent Component Analysis.
Ali Selamat, Zhi-Sam Lee. Published in: Asia International Conference on Modelling and Simulation (2008)
Keyphrases
- web documents
- information extraction
- semi structured
- document classification
- arabic language
- web search engines
- natural language
- web pages
- script language
- language identification
- web content
- vector space model
- html documents
- textual information
- web data
- keywords
- indian languages
- document representation
- link structure
- structured documents
- n gram
- focused crawling
- search engine
- dynamically generated
- text mining
- unstructured documents