Machine Learning for High-Quality Tokenization: Replicating Variable Tokenization Schemes
Murhaf Fares, Stephan Oepen, Yi Zhang. Published in: CICLing (1) (2013)
Keyphrases
- machine learning
- high quality
- named entities
- biomedical text
- biomedical information retrieval
- information extraction
- natural language processing
- learning systems
- pattern recognition
- feature selection
- text mining
- ground truth
- learning algorithm
- active learning
- natural language
- systematic evaluation
- computer vision
- explanation-based learning
- low quality
- genetic algorithm
- character n-grams
- machine learning algorithms
- neural network
- artificial intelligence
- knowledge representation
- image quality
- high resolution
- knowledge base
- decision trees
- Bayesian networks
- supervised machine learning
- data analysis
- machine learning approaches
- computational biology
- semantic network
- machine learning methods
- knowledge discovery
- database
- query expansion
- knowledge acquisition