Extracting representative subset from extensive text data for training pre-trained language models.
Jun Suzuki, Heiga Zen, Hideto Kazawa
Published in: Inf. Process. Manag. (2023)
Keyphrases
- language model
- text data
- pre-trained
- representative subset
- training examples
- text mining
- text classification
- n-gram
- probabilistic model
- document collections
- information retrieval
- training data
- speech recognition
- query expansion
- high dimensional data
- training set
- structured data
- high dimensional
- text documents
- data sets
- supervised learning
- data reduction
- machine learning
- real world
- knowledge discovery
- question answering
- active learning
- action recognition
- named entities