Tabula nearly rasa: Probing the Linguistic Knowledge of Character-Level Neural Language Models Trained on Unsegmented Text.
Michael Hahn, Marco Baroni
Published in: CoRR (2019)
Keyphrases
- language model
- linguistic knowledge
- document level
- language modeling
- information retrieval
- document retrieval
- retrieval model
- probabilistic model
- text retrieval
- query expansion
- natural language processing
- n gram
- natural language
- word sense
- test collection
- word sense disambiguation
- text mining
- text processing
- noun phrases
- semantic knowledge
- relevance model
- sentence level
- free text
- dialogue system
- pseudo relevance feedback
- vector space model
- cross lingual
- low level
- query terms
- cross language
- natural language text
- document collections
- training set
- information extraction
- training data
- machine learning