Building Large Scale Text Corpus for Tibetan Natural Language Processing by Extracting Text from Web Pages.
Huidan Liu, Minghua Nuo, Jian Wu, Yeping He
Published in: ALR@COLING (2012)
Keyphrases
- text corpus
- natural language processing
- text corpora
- named entities
- computational linguistics
- web pages
- text mining
- text documents
- information extraction
- keywords
- text analysis
- free text
- machine learning
- search engine
- textual data
- web documents
- wordnet
- information retrieval
- artificial intelligence
- question answering
- web search engines
- link structure
- web search
- machine translation
- web mining
- link analysis
- text collections
- wikipedia articles
- knowledge representation
- training corpus
- natural language
- document classification
- text summarization
- information retrieval systems