TDDC: Timely Disclosure Documents Corpus.
Nobushige Doi, Yusuke Oda, Toshiaki Nakazawa
Published in: LREC (2020)
Keyphrases
- word frequencies
- newspaper articles
- person names
- document collections
- similar documents
- information retrieval systems
- text data
- document level
- multiword
- information retrieval
- text corpus
- web documents
- document corpus
- text documents
- training corpus
- text collections
- manually annotated
- XML documents
- document classification
- relevant documents
- linguistic information
- keywords
- text corpora
- word frequency
- information loss
- training documents
- document retrieval
- natural language text
- privacy protection
- free text
- document clustering
- user queries
- semantic information
- metadata
- Wikipedia articles
- plain text
- parallel corpus
- LDA model
- co-occurrence
- word co-occurrence