Pre-training Data Quality and Quantity for a Low-Resource Language: New Corpus and BERT Models for Maltese.

Kurt Micallef, Albert Gatt, Marc Tanti, Lonneke van der Plas, Claudia Borg
Published in: CoRR (2022)