An Information Theory-Based Feature Selection Framework for Big Data Under Apache Spark.
Sergio Ramírez-Gallego, Héctor Mouriño-Talín, David Martínez-Rego, Verónica Bolón-Canedo, José Manuel Benítez, Amparo Alonso-Betanzos, Francisco Herrera
Published in: IEEE Trans. Syst. Man Cybern. Syst. (2018)
Keyphrases
- information theory
- big data
- feature selection
- information theoretic
- mutual information
- statistical learning
- cloud computing
- open source
- information geometry
- database
- jensen shannon divergence
- massive data
- database systems
- databases
- data processing
- text mining
- text categorization
- social media
- information systems
- information retrieval
- mdl principle
- machine learning
- massive datasets
- conditional entropy
- real world
- shannon entropy