Improved Implementation for Finding Text Similarities in Large Sets of Data - Notebook for PAN at CLEF 2011.
Ján Grman, Rudolf Ravas
Published in: CLEF (Notebook Papers/Labs/Workshop) (2011)
Keyphrases
- data sets
- database
- data collection
- data analysis
- training data
- data quality
- raw data
- textual data
- information retrieval
- probability distribution
- text classification
- data processing
- computer systems
- synthetic data
- web documents
- text data
- text retrieval
- original data
- text documents
- spatial data
- missing data
- question answering
- high dimensional data
- machine learning
- input data
- knowledge discovery
- data points
- data structure
- high quality