Terminology-based Text Embedding for Computing Document Similarities on Technical Content.
Hamid Mirisaee, Éric Gaussier, Cédric Lagnier, Agnès Guerraz
Published in: CoRR (2019)
Keyphrases
- textual content
- text content
- document content
- web documents
- multimedia documents
- pdf files
- content and structure
- scientific papers
- text documents
- content similarity
- related documents
- keywords
- semantic information
- electronic documents
- information retrieval
- document structure
- digital documents
- web pages
- document processing
- textual information
- digital libraries
- structured documents
- document analysis
- semantic content
- multimedia
- text information
- text corpus
- relevant content
- metadata
- textual features
- document clustering
- scientific documents
- xml documents
- effective retrieval
- semantic structure
- printed documents
- similarity measure
- text mining
- text retrieval
- html pages
- relevant documents
- search engine
- document type
- logical structure
- document level
- retrieval systems
- text lines
- user generated content
- document representation
- multimedia content
- news articles
- multimedia data
- vector space
- document images
- domain specific
- information extraction
- wikipedia pages
- cross references