CWID-hi: A Dataset for Complex Word Identification in Hindi Text.
Gayatri Venugopal, Dhanya Pramod, Ravi Shekhar. Published in: LREC (2022)
Keyphrases
- keywords
- Indian languages
- sentence level
- English words
- natural language text
- word pairs
- related words
- noun phrases
- database
- text corpus
- statistical machine translation
- string matching
- information retrieval
- text input
- text segments
- English text
- text retrieval
- named entity recognition
- machine translation system
- concept space
- language identification
- word level
- machine translation
- syntactic information
- conditional random fields
- semantic information
- n-gram
- lexical features
- topic models
- text mining
- Chinese text
- syntactic categories