ViLexNorm: A Lexical Normalization Corpus for Vietnamese Social Media Text.
Thanh-Nhi Nguyen, Thanh-Phong Le, Kiet Van Nguyen. Published in: CoRR (2024)
Keyphrases
- lexical features
- social media
- recognizing textual entailment
- natural language text
- linguistic information
- text corpus
- word pairs
- broad coverage
- word sense
- syntactic features
- text data
- textual entailment
- natural language processing
- supervised machine learning
- open domain
- keywords
- unknown words
- chinese text
- plain text
- text corpora
- text mining
- information extraction
- linguistic analysis
- newspaper articles
- information retrieval
- multiword
- text documents
- co occurrence
- sentence level
- lexical information
- computational linguistics
- word frequency
- noun phrases
- semantic network
- semantic relations
- english words
- syntactic information
- world knowledge
- social networks
- semantic information
- natural language
- training corpus
- word sense disambiguation
- text collections
- text retrieval
- text classification
- linguistic features
- domain specific
- textual content
- free text
- lexical chains
- named entity recognition
- spontaneous speech
- semantic features
- user generated content
- document representation