A Dataset and Strong Baselines for Classification of Czech News Texts.
Hynek Kydlíček, Jindřich Libovický
Published in: CoRR (2023)
Keyphrases
- benchmark datasets
- pattern recognition
- machine learning
- automatic classification
- feature selection
- decision trees
- pattern classification
- classification method
- classification accuracy
- image classification
- training set
- feature extraction
- supervised learning
- feature set
- text classification
- document classification
- classification scheme
- classification algorithm
- keywords
- data sets
- feature space
- short texts
- classification systems
- fold cross validation
- training dataset
- uci datasets
- information retrieval
- preprocessing
- cross language
- automatically generated
- text retrieval
- news articles
- training data
- decision rules
- multi class
- natural language
- social media