An Annotated Corpus of Crime-Related Portuguese Documents for NLP and Machine Learning Processing.
Gonçalo Carnaz, Mário Antunes, Vítor Beires Nogueira
Published in: Data (2021)
Keyphrases
- machine learning
- annotated corpus
- natural language processing
- named entities
- information extraction
- named entity recognition
- text processing
- free text
- text documents
- text mining
- natural language
- information retrieval
- document collections
- information retrieval systems
- hidden Markov models
- metadata
- feature selection
- learning algorithm
- document clustering
- xml documents
- automatic annotation
- text classification
- wordnet
- co-occurrence
- machine learning algorithms
- low level