ViLexNorm: A Lexical Normalization Corpus for Vietnamese Social Media Text.
Thanh-Nhi Nguyen, Thanh-Phong Le, Kiet Nguyen
Published in: EACL (1) (2024)
Keyphrases
- lexical features
- social media
- recognizing textual entailment
- natural language text
- linguistic information
- text corpus
- word pairs
- syntactic features
- word sense
- broad coverage
- textual entailment
- open domain
- text corpora
- wordnet
- word frequency
- named entity recognition
- supervised machine learning
- keywords
- sentence level
- natural language processing
- information extraction
- syntactic information
- syntactic categories
- semantic network
- text mining
- information retrieval
- linguistic analysis
- text data
- text collections
- document corpus
- relation extraction
- lexical information
- newspaper articles
- world knowledge
- natural language
- semantic roles
- semantic relations
- english words
- social media content
- named entity disambiguation
- plain text
- document level
- semantic role labeling
- noun phrases
- topic models
- domain specific
- co-occurrence
- social networks