A document classifier for medicinal chemistry publications trained on the ChEMBL corpus.
George Papadatos, Gerard J. P. van Westen, Samuel Croset, Rita Santos, Simone Trubian, John P. Overington
Published in: J. Cheminformatics (2014)
Keyphrases
- training set
- training documents
- svm classifier
- document corpus
- text classifiers
- training process
- document collections
- test set
- supervised training
- text corpus
- document level
- dependent features
- classification algorithm
- text categorization
- support vector
- training data
- text classification
- scientific papers
- class labels
- document classification
- support vector machine
- multi layer perceptron
- feature selection
- text documents
- retrieval systems
- document images
- information retrieval
- feature set
- information retrieval systems
- keywords
- digital libraries
- feature space
- noun phrases
- document clustering
- lexical features
- vector space model
- similar documents
- word sense
- learning algorithm
- decision trees
- multiword
- text corpora
- training samples
- support vector machine svm
- text collections
- web documents
- text data
- semantic information
- multilayer perceptron
- document retrieval