Terminology-based Text Embedding for Computing Document Similarities on Technical Content.
Hamid Mirisaee, Éric Gaussier, Cédric Lagnier, Agnès Guerraz
Published in: PFIA (TIA) (2019)
Keyphrases
- textual content
- text content
- document content
- web documents
- multimedia documents
- content and structure
- scientific papers
- pdf files
- semantic information
- text documents
- content similarity
- document structure
- information retrieval
- related documents
- electronic documents
- relevant content
- web pages
- document representation
- keywords
- digital documents
- textual information
- semantic content
- document processing
- document analysis
- vector space
- metadata
- multimedia
- text corpus
- html pages
- text information
- domain specific
- structured documents
- semantic structure
- scientific documents
- information retrieval systems
- news articles
- textual features
- automatic text summarization
- keyword extraction
- search engine
- user generated content
- document clustering
- document images
- similarity measure
- printed documents
- document set
- relevant documents
- document collections
- text fragments
- text mining
- information extraction
- digital libraries
- wikipedia pages