Web-AM: An Efficient Boilerplate Removal Algorithm for Web Articles.
Naseer Aslam, Bilal Tahir, Hafiz Muhammad Shafiq, Muhammad Amir Mehmood
Published in: FIT (2019)
Keyphrases
- improved algorithm
- web applications
- website
- detection algorithm
- learning algorithm
- experimental evaluation
- optimization algorithm
- dynamic programming
- NP-hard
- k-means
- semantic web
- cost function
- segmentation algorithm
- preprocessing
- similarity measure
- web pages
- theoretical analysis
- web technologies
- particle swarm optimization
- input data
- web documents
- simulated annealing
- information retrieval
- probabilistic model
- significant improvement
- search space
- lower bound
- computational complexity
- highly efficient
- recognition algorithm
- matching algorithm
- worst case
- clustering method
- computationally efficient
- information sources
- computational cost
- genetic algorithm