Tafsir Dataset: A Novel Multi-Task Benchmark for Named Entity Recognition and Topic Modeling in Classical Arabic Literature.
Sajawel Ahmed, Rob van der Goot, Misbahur Rehman, Carl Kruse, Ömer Özsoy, Alexander Mehler, Gemma Roig
Published in: COLING (2022)
Keyphrases
- named entity recognition
- topic modeling
- multi-task
- named entities
- information extraction
- text mining
- topic models
- natural language processing
- text summarization
- learning tasks
- maximum entropy
- text documents
- latent Dirichlet allocation
- conditional random fields
- semi-supervised
- transfer learning
- text classification
- multi-class
- collaborative filtering
- artificial intelligence
- feature selection
- co-occurrence
- data sets
- training set
- learning algorithm
- machine learning
- real world
- supervised learning
- knowledge representation
- image segmentation
- information retrieval