Domain adaptation challenges of BERT in tokenization and sub-word representations of Out-of-Vocabulary words.
Anmol Nayak, Hariprasad Timmapathini, Karthikeyan Ponnalagu, Vijendran Gopalan Venkoparao
Published in: Insights (2020)
Keyphrases
- out of vocabulary
- domain adaptation
- n gram
- named entities
- word segmentation
- named entity recognition
- language model
- spoken document retrieval
- cross language information retrieval
- broadcast news
- semi supervised
- language specific
- cross lingual
- labeled data
- sentiment classification
- cross domain
- parallel corpora
- text classification
- query terms
- transfer learning
- information extraction
- machine translation
- semi supervised learning
- word level
- language modeling
- natural language processing
- part of speech
- sentiment analysis
- co occurrence
- target domain
- unlabeled data
- language independent
- word recognition
- bag of words
- term frequency
- document analysis
- test collection
- question answering
- active learning
- search engine