Tabula nearly rasa: Probing the linguistic knowledge of character-level neural language models trained on unsegmented text.
Michael Hahn, Marco Baroni
Published in: Trans. Assoc. Comput. Linguistics (2019)
Keyphrases
- language model
- linguistic knowledge
- document level
- information retrieval
- language modeling
- probabilistic model
- n gram
- document retrieval
- natural language
- retrieval model
- natural language processing
- query expansion
- query terms
- test collection
- noun phrases
- pseudo relevance feedback
- machine learning
- word sense disambiguation
- free text
- dialogue system
- word sense
- word segmentation
- text retrieval
- vector space model
- keywords
- training set
- data mining
- recommender systems
- relevance model
- semantic knowledge
- sentence level
- translation model
- semantic information