Robust Document Representations using Latent Topics and Metadata.
Natraj Raman, Armineh Nourbakhsh, Sameena Shah, Manuela Veloso
Published in: CoRR (2020)
Keyphrases
- document representation
- latent topics
- metadata
- bag of words
- text documents
- topic models
- digital libraries
- document clustering
- knowledge representation
- topic modeling
- document collections
- latent Dirichlet allocation
- n-gram
- vector space model
- latent variables
- high dimensional data
- web documents
- information extraction
- multiscale