LINA: Identifying Comparable Documents from Wikipedia.
Emmanuel Morin, Amir Hazem, Florian Boudin, Elizaveta Loginova Clouet
Published in: BUCC@ACL/IJCNLP (2015)
Keyphrases
- document collections
- wikipedia pages
- wikipedia articles
- information retrieval
- document representation
- information retrieval systems
- relevant documents
- natural language text
- document retrieval
- text retrieval
- document clustering
- text documents
- xml documents
- knowledge base
- document classification
- document analysis
- database
- structured documents
- link structure
- test collection
- probabilistic topic models
- metadata
- keywords
- vector space model
- wordnet
- web documents
- semantic information
- free text
- latent semantic indexing
- semantic relations
- retrieved documents
- query terms
- ad hoc retrieval
- external knowledge
- user queries
- wikipedia categories