Synthetic Data for English Lexical Normalization: How Close Can We Get to Manually Annotated Data?
Kelly Dekker, Rob van der Goot
Published in: LREC (2020)
Keyphrases
- synthetic data
- manually constructed
- lexical information
- wordnet
- real world
- real image data
- english language
- linguistic analysis
- data sets
- automatically generated
- natural language
- bilingual dictionaries
- domain specific
- word sense disambiguation
- machine translation
- word sense
- normalization method
- unknown words
- parse tree
- synthetic datasets
- language learning
- semantic roles
- keywords
- cross language
- mri data
- language independent
- cross lingual
- semantic network
- natural language processing
- query translation
- semantic relations
- co occurrence
- chinese text