UlyssesNER-Br: A Corpus of Brazilian Legislative Documents for Named Entity Recognition.
Hidelberg Oliveira Albuquerque, Rosimeire Costa, Gabriel Silvestre, Ellen Souza, Nádia Félix F. da Silva, Douglas Vitório, Gyovana Moriyama, Lucas Martins, Luiza Soezima, Augusto Nunes, Felipe Siqueira, João Pedro M. Tarrega, João Vitor P. Beinotti, Marcio Dias, Matheus Silva, Miguel de Mattos Gardini, Vinícius Adolfo Pereira da Silva, André C. P. L. F. de Carvalho, Adriano L. I. Oliveira
Published in: PROPOR (2022)
Keyphrases
- named entity recognition
- annotated corpus
- information extraction
- named entities
- natural language processing
- natural language text
- text documents
- relation extraction
- linguistic features
- semi-supervised
- reference resolution
- named entity disambiguation
- maximum entropy
- information retrieval
- conditional random fields
- text summarization
- genia corpus
- proper names
- information retrieval systems
- noun phrases
- document clustering
- document collections
- text mining
- sentence level
- multiword
- relevant documents
- maximum entropy classifier
- classifier ensemble
- pos tagging
- tf-idf
- hidden markov models
- co-occurrence
- supervised learning
- semantic information
- keywords