Mining local gazetteers of literary Chinese with CRF and pattern based methods for biographical information in Chinese history.
Chao-Lin Liu, Chih-Kai Huang, Hongsu Wang, Peter K. Bol
Published in: IEEE BigData (2015)
Keyphrases
- benchmark datasets
- preprocessing
- computational cost
- information extraction
- information sources
- prior knowledge
- knowledge discovery
- data mining techniques
- maximum entropy
- web corpora
- data sets
- multiple sources
- data mining methods
- named entities
- conditional random fields
- contextual information
- information processing
- text mining
- search engine