Detecting Offensive Language in English, Hindi, and Marathi using Classical Supervised Machine Learning Methods and Word/Char N-grams.
Yaakov HaCohen-Kerner, Moshe Uzan
Published in: FIRE (Working Notes) (2021)
Keyphrases
- machine learning methods
- n gram
- source language
- language specific
- target language
- character n grams
- machine translation
- machine learning
- language model
- statistical machine translation
- word level
- language independent
- cross language information retrieval
- language modeling
- cross language
- machine learning algorithms
- comparable corpora
- indian languages
- text classification
- cross lingual
- machine learning approaches
- variable length
- bag of words
- learning algorithm
- optical character recognition
- word order
- query translation
- parallel corpus
- out of vocabulary
- word segmentation
- natural language
- query terms
- language modelling
- english text
- language identification
- statistical language modeling
- spoken language
- word pairs
- data mining
- information extraction
- foreign language
- machine translation system
- knowledge discovery
- bilingual dictionaries
- semi supervised
- co occurrence
- part of speech
- word sense disambiguation
- retrieval model
- question answering
- document retrieval