Anonymization of German financial documents using neural network-based language models with contextual word representations.
David Biesner, Rajkumar Ramamurthy, Robin Stenzel, Max Lübbering, Lars Patrick Hillebrand, Anna Ladi, Maren Pielka, Rüdiger Loitz, Christian Bauckhage, Rafet Sifa
Published in: Int. J. Data Sci. Anal. (2022)
Keyphrases
- language model
- word clouds
- document retrieval
- n gram
- context sensitive
- language modeling
- document level
- multiword
- ad hoc information retrieval
- information retrieval
- query terms
- vector space model
- probabilistic model
- language modeling approaches
- document ranking
- cross language
- speech recognition
- translation model
- statistical language models
- relevance model
- retrieval model
- test collection
- document representation
- query expansion
- query specific
- out of vocabulary
- statistical language modeling
- term weighting
- text retrieval
- pseudo relevance feedback
- retrieved documents
- term dependencies
- smoothing methods
- document length
- information retrieval systems
- expert search
- web documents
- pseudo feedback
- word segmentation
- language independent