Self-tuning hyper-parameters for unsupervised cross-lingual tokenization.
Anton Kolonin
Published in: CoRR (2023)
Keyphrases
- cross-lingual
- hyperparameters
- model selection
- cross-validation
- closed form
- bayesian inference
- bayesian framework
- machine translation
- random sampling
- language modeling
- support vector
- maximum likelihood
- prior information
- noise level
- EM algorithm
- sample size
- cross-language
- unsupervised learning
- incremental learning
- text classification
- maximum a posteriori
- incomplete data
- language model
- supervised learning
- named entities
- semi-supervised
- transfer learning
- news articles
- character n-grams
- probabilistic model
- document clustering
- missing values
- query translation
- document retrieval
- latent variables
- parameter space
- data sets
- co-occurrence
- natural language processing
- information extraction
- high dimensional
- clustering algorithm
- information retrieval