An unsupervised perplexity-based method for boilerplate removal.
Marcos Fernández-Pichel, Manuel de Prada Corral, David E. Losada, Juan Carlos Pichel, Pablo Gamallo
Published in: Nat. Lang. Eng. (2024)
Keyphrases
- high precision
- high accuracy
- image processing
- computational complexity
- preprocessing
- optimization algorithm
- theoretical analysis
- unsupervised learning
- cost function
- mutual information
- fully automatic
- synthetic data
- support vector machine
- experimental evaluation
- neural network
- denoising
- input data
- significant improvement
- prior knowledge
- support vector machine (SVM)
- artificial neural networks
- clustering method
- detection method
- error rate
- multiscale
- data sets