Perplexed by Quality: A Perplexity-based Method for Adult and Harmful Content Detection in Multilingual Heterogeneous Web Data.

Tim Jansen, Yangling Tong, Victoria Zevallos, Pedro Ortiz Suarez
Published in: CoRR (2022)
Keyphrases
  • web data
  • detection method
  • information retrieval
  • database
  • databases
  • machine learning
  • digital libraries
  • relational databases
  • language model
  • web mining
  • data sets
  • metadata
  • multimedia
  • semi-structured