TNT: Text Normalization based Pre-training of Transformers for Content Moderation.
Fei Tan, Yifan Hu, Changwei Hu, Keqian Li, Kevin Yen
Published in: EMNLP (1) (2020)
Keyphrases
- text content
- textual content
- cross media
- text information
- web documents
- semantic content
- financial news
- content and structure
- plain text
- metadata
- keywords
- training set
- semantic information
- web content
- training process
- supervised learning
- training examples
- multimedia data
- text retrieval
- textual features
- document content
- web images
- online learning
- document structure
- multimedia documents
- user generated
- textual information
- content features
- audio content
- scientific papers
- user generated content
- database
- key concepts
- free text
- text documents
- text mining
- active learning
- search engine